Building MiuBot - A Personal AI Assistant From Nanobot to Production

After months of studying agentic AI patterns in theory - agent loops, tool calling, multi-agent coordination - I wanted to build something real. Not another tutorial project, but an AI assistant I could actually use daily, connected to the chat platforms I already live on. That’s how MiuBot started - forked from Nanobot, then reshaped into something quite different.

MiuBot

Why Nanobot as a Starting Point

I wasn’t starting from scratch. I’d spent months learning agent concepts through Python frameworks - LangChain, LangGraph, and the patterns covered in my agentic AI series. But I wanted a standalone assistant - something that runs as its own process, connects to chat platforms, and handles real conversations with tool use.

Nanobot caught my attention: a lightweight Python assistant with a clean agent loop, tool calling, and chat integrations for Telegram, Discord, and several other platforms. File-based storage, single-process, JSON sessions - simple and readable. More importantly, it was small enough to understand completely in a few hours. That matters when you plan to rip it apart and rebuild it.

I also studied OpenClaw (Node.js/TypeScript) which runs as a WebSocket RPC gateway with multi-channel messaging. Its architecture gave me ideas about how a personal assistant should be structured - not as a library you import, but as a gateway that sits between users and LLMs. I adopted OpenClaw’s SKILL.md format for the skills system and even changed MiuBot’s default port to 18790 to avoid conflicting with OpenClaw’s port.

From Fork to Rebuild

The fork started on February 1, 2026. Within three weeks, almost nothing from the original Nanobot remained. Here’s the transformation:

| Aspect | Nanobot | MiuBot |
|---|---|---|
| Storage | JSON files | PostgreSQL + pgvector (30+ tables) |
| Durability | Event-driven, in-memory | Temporal workflows (distributed, durable) |
| Tenancy | Single-user | Multi-tenant with Gateway/Worker split |
| Memory | Basic conversation history | 3-tier BASB (Active/Reference/Archive) |
| Scalability | Single process | Horizontal: K8s HPA, worker pools |
| Observability | Logs only | OpenTelemetry tracing, metrics, cost tracking |
| IDs | Mixed formats | UUIDv7 everywhere |

The Agent Loop

Nanobot had a straightforward loop: receive message, call LLM, execute tools, respond. MiuBot keeps the same Think-Act-Observe cycle from Anatomy of an AI Agent, but adds production guardrails:

```python
class AgentLoop:
    async def _run_agent_loop(self, initial_messages, channel, chat_id, ...):
        messages = initial_messages
        iteration = 0

        while iteration < self.max_iterations:
            iteration += 1

            try:
                response = await asyncio.wait_for(
                    self.provider.chat(
                        messages=messages,
                        tools=self.tools.get_definitions(),
                        model=self.model,
                    ),
                    timeout=180,
                )
            except asyncio.TimeoutError:
                break  # Don't hang forever

            if response.has_tool_calls:
                # Execute tools, append results, continue loop
                for tool_call in response.tool_calls:
                    result = await self.tools.execute(
                        tool_call.name, tool_call.arguments
                    )
                    messages = self.context.add_tool_result(
                        messages, tool_call.id, tool_call.name, result
                    )
            else:
                final_content = response.content
                break

        return final_content, tools_used, total_usage, messages
```

The key additions: 180-second timeout per LLM call, per-session locks to prevent race conditions on tool routing state, and usage tracking that accumulates tokens across iterations. None of these exist in a tutorial implementation.
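The usage-tracking part sounds trivial but is easy to get wrong when tool calls span several loop iterations. A minimal sketch of an accumulator, with illustrative field names (not MiuBot's actual types):

```python
from dataclasses import dataclass

@dataclass
class Usage:
    """Token counts for a single LLM call (field names are illustrative)."""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, other: "Usage") -> "Usage":
        # Accumulate usage across agent-loop iterations
        return Usage(
            self.prompt_tokens + other.prompt_tokens,
            self.completion_tokens + other.completion_tokens,
        )

total = Usage()
# Three loop iterations, each with its own LLM call
for call in [Usage(120, 40), Usage(250, 80), Usage(310, 55)]:
    total = total.add(call)

print(total.prompt_tokens, total.completion_tokens)  # 680 175
```

The point is that per-conversation cost only becomes visible when every iteration's usage is summed before returning.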

Per-Session Concurrency

The original Nanobot processed messages sequentially: one user waiting on a long tool execution blocked everyone else. MiuBot dispatches each session to its own asyncio task with a dedicated queue:

```python
# Per-session processing: one concurrent task per session key
self._session_tasks: dict[str, asyncio.Task] = {}
self._session_queues: dict[str, asyncio.Queue] = {}
self._session_locks: dict[str, asyncio.Lock] = {}
```

Each chat session (identified by channel + chat ID) gets its own worker task. Messages queue up per-session, so Telegram user A and Discord user B run concurrently, but rapid messages from the same user serialize correctly.
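The dispatch pattern can be sketched end-to-end like this (all names hypothetical, with a no-op standing in for the agent loop):

```python
import asyncio

class SessionDispatcher:
    """Sketch: one queue and one worker task per session key, so distinct
    sessions run concurrently while messages within a session stay ordered."""

    def __init__(self) -> None:
        self._queues: dict[str, asyncio.Queue] = {}
        self._tasks: dict[str, asyncio.Task] = {}
        self.processed: list[tuple[str, str]] = []

    async def dispatch(self, channel: str, chat_id: str, text: str) -> None:
        key = f"{channel}:{chat_id}"  # session key = channel + chat ID
        if key not in self._queues:
            self._queues[key] = asyncio.Queue()
            self._tasks[key] = asyncio.create_task(self._worker(key))
        await self._queues[key].put(text)

    async def _worker(self, key: str) -> None:
        queue = self._queues[key]
        while True:
            text = await queue.get()
            await asyncio.sleep(0)  # stand-in for running the agent loop
            self.processed.append((key, text))
            queue.task_done()

    async def drain(self) -> None:
        # Wait until every session queue is fully processed
        for queue in self._queues.values():
            await queue.join()

async def main() -> list[tuple[str, str]]:
    d = SessionDispatcher()
    await d.dispatch("telegram", "a", "hi")
    await d.dispatch("telegram", "a", "again")
    await d.dispatch("discord", "b", "hello")
    await d.drain()
    return d.processed

results = asyncio.run(main())
print(results)
```

The FIFO queue gives the in-session ordering for free; concurrency across sessions comes from each key having its own task.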

Side-Effect Dedup and Tool Loop Detection

A problem I hit early: the LLM sometimes calls the same tool twice in a loop, or retries a side-effect tool (like sending a message) that already succeeded. MiuBot detects this with SHA256 hashing of (tool_name, args) for state-changing tools:

  • Tools prefixed with create_, update_, delete_, send_ are tracked as side-effects
  • Identical calls within the same loop iteration are deduplicated
  • A per-tool cap (MAX_SAME_TOOL_CALLS=3) prevents infinite loops
  • Read-only tools (list, get, search) are exempt

This saved real money during early testing when the agent got stuck in a loop calling the same web search 10 times.
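The rules above can be combined into a small guard. This is a sketch under the stated assumptions (prefix-based side-effect detection, canonical-JSON hashing), not MiuBot's exact code:

```python
import hashlib
import json

SIDE_EFFECT_PREFIXES = ("create_", "update_", "delete_", "send_")
MAX_SAME_TOOL_CALLS = 3

def call_fingerprint(tool_name: str, args: dict) -> str:
    # Canonical JSON so {"a": 1, "b": 2} and {"b": 2, "a": 1} hash the same
    payload = json.dumps([tool_name, args], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class SideEffectGuard:
    """Sketch: dedupe repeated side-effect calls and cap per-tool call counts."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self._counts: dict[str, int] = {}

    def allow(self, tool_name: str, args: dict) -> bool:
        self._counts[tool_name] = self._counts.get(tool_name, 0) + 1
        if self._counts[tool_name] > MAX_SAME_TOOL_CALLS:
            return False  # loop detection: same tool called too many times
        if not tool_name.startswith(SIDE_EFFECT_PREFIXES):
            return True  # read-only tools are exempt from dedup
        fp = call_fingerprint(tool_name, args)
        if fp in self._seen:
            return False  # identical side-effect call already executed
        self._seen.add(fp)
        return True

guard = SideEffectGuard()
print(guard.allow("send_message", {"to": "alice", "text": "hi"}))  # True
print(guard.allow("send_message", {"to": "alice", "text": "hi"}))  # False (duplicate)
print(guard.allow("list_files", {"path": "/tmp"}))                 # True (read-only)
```

Sorting keys before hashing matters: LLMs don't emit arguments in a stable order, so a naive hash of the raw JSON would miss duplicates.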

10 Chat Channels

This was the biggest investment. Nanobot already supported several platforms. MiuBot connects to 10:

| Channel | Transport | Public IP Required |
|---|---|---|
| Telegram | Long polling | No |
| Discord | WebSocket gateway | No |
| WhatsApp | Node.js bridge (WebSocket) | No |
| Feishu | WebSocket long connection | No |
| DingTalk | Stream mode | No |
| Slack | Socket mode | No |
| Email | IMAP polling + SMTP | No |
| QQ | botpy SDK (WebSocket) | No |
| Zalo | ZCA-CLI WebSocket bridge | No |
| Mochat | Socket.IO + msgpack | No |

A deliberate design choice: no channel requires a public IP. Every platform uses either long polling, WebSocket, or a bridge process. This means MiuBot runs behind a NAT, on a home server, or inside a corporate network without port forwarding.

Each channel implements a base interface with platform-specific formatting rules. Zalo doesn’t support Markdown, so the context builder injects formatting constraints into the system prompt:

```python
_ZALO_FORMATTING_RULES = (
    "\n\nFORMATTING RULES (MANDATORY):"
    "\n- Zalo does NOT support markdown. NEVER use: ## headings, "
    "**bold**, *italic*, `code`, tables, or > quotes."
    "\n- Use plain text only: VIET HOA for headings, bullet '-' for lists."
)
```

This is a pattern I didn’t see in any tutorial: channel-aware prompt injection - the system prompt adapts based on where the message came from.
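The injection itself is a small conditional step when building the system prompt. A hedged sketch (the rule strings and helper name are made up for illustration):

```python
# Hypothetical channel-aware prompt injection: the system prompt gains
# extra formatting constraints depending on the originating channel.
_CHANNEL_RULES = {
    "zalo": "\n\nFORMATTING RULES: plain text only, no markdown.",
    "email": "\n\nFORMATTING RULES: full sentences, greeting and sign-off.",
}

def build_system_prompt(base_prompt: str, channel: str) -> str:
    # Channels without special rules get the base prompt unchanged
    return base_prompt + _CHANNEL_RULES.get(channel, "")

print(build_system_prompt("You are MiuBot.", "zalo"))
print(build_system_prompt("You are MiuBot.", "telegram"))  # no extra rules
```

Because the constraint lives in the prompt rather than in a post-processing step, the model formats correctly on the first try instead of having markdown stripped afterward.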

3-Tier Memory: BASB-Inspired

In Agent State and Memory, I explored how agents maintain context across conversations. MiuBot implements a memory system inspired by Building a Second Brain (BASB) with three tiers:

Active - Short-term memories extracted from daily conversations. Things like “user prefers Vietnamese for casual chat” or “working on the API migration project.”

Reference - Weekly consolidation compresses daily notes into durable insights. An LLM reviews the week’s active memories and produces structured reference entries.

Archive - Monthly consolidation compresses reference memories further. Long-term storage for context that rarely changes.

```mermaid
flowchart LR
    A["Daily\nConversations"] --> B["Daily\nConsolidation"]
    B --> C["Active\nMemories"]
    C --> D["Weekly\nConsolidation"]
    D --> E["Reference\nMemories"]
    E --> F["Monthly\nConsolidation"]
    F --> G["Archive\nMemories"]

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    class A,B blueClass
    class C,D orangeClass
    class E,F,G greenClass
```

Each consolidation job uses a distributed advisory lock in PostgreSQL to prevent duplicate processing when multiple workers run:

```python
lock_key = int.from_bytes(
    hashlib.sha256(f"daily:{ws.id}".encode()).digest()[:8],
    "big", signed=True,
)
locked = await conn.fetchval(
    "SELECT pg_try_advisory_lock($1)", lock_key
)
if not locked:
    return {"status": "skipped", "reason": "locked"}
```

The consolidation runs as a Temporal scheduled workflow - not a cron job. Temporal gives us exactly-once execution, automatic retry on failure, and visibility into what happened and when.

Durable Workflows with Temporal

This is where MiuBot diverges most from typical Python assistant projects. Instead of processing messages in a single async loop, MiuBot uses Temporal for durable workflow orchestration:

```python
@workflow.defn
class BotSessionWorkflow:
    """Durable per-session workflow. One instance per (workspace, session)."""

    @workflow.signal
    async def new_message(self, msg: dict) -> None:
        self._pending_messages.append(msg)

    @workflow.run
    async def run(self, session_info: dict) -> None:
        while True:
            await workflow.wait_condition(
                lambda: len(self._pending_messages) > 0
            )
            msg = self._pending_messages.pop(0)
            result = await workflow.execute_activity(
                "process_message_activity",
                args=[msg, session_info],
                start_to_close_timeout=timedelta(minutes=10),
                heartbeat_timeout=timedelta(minutes=5),
            )
```

Each conversation session is a long-running Temporal workflow. Incoming messages are signals, processing happens as activities. If the worker crashes mid-response, Temporal retries the activity automatically. The workflow survives process restarts.

This also enables the gateway/worker split:

```mermaid
flowchart LR
    A["Telegram"] --> GW["Gateway\n(routing)"]
    B["Discord"] --> GW
    C["Zalo"] --> GW
    GW --> T["Temporal\nServer"]
    T --> W1["Worker 1\n(bot: assistant)"]
    T --> W2["Worker 2\n(bot: researcher)"]

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    class A,B,C blueClass
    class GW,T orangeClass
    class W1,W2 greenClass
```

The gateway handles channel connections and message routing. Workers process messages and call LLMs. You can scale workers independently, filter by bot name, and deploy different workers with different resource allocations.

Multi-Tenant Workspaces

What started as a personal project became multi-tenant. MiuBot supports isolated workspaces - each with its own bots, providers, channels, and memory:

```yaml
# bots.yaml - multi-bot configuration
bots:
  assistant:
    soul: /path/to/soul.md
    provider:
      model: anthropic/claude-sonnet-4-5-20250929
      api_key_env: ANTHROPIC_API_KEY
    channels:
      telegram:
        token_env: ASSISTANT_TG_TOKEN
    jobs:
      morning_briefing:
        schedule: "0 8 * * *"
        timezone: "Asia/Saigon"
        prompt: "Summarize top news"
        targets:
          - channel: telegram
            chat_id_env: NEWS_CHAT_ID
```

Each bot gets its own identity (“soul” file), provider configuration, channel tokens, and scheduled jobs. The workspace resolver routes incoming messages to the correct bot based on channel token matching.

MCP Integration

Model Context Protocol (MCP) support was straightforward since the config format is compatible with Claude Desktop and Cursor:

```json
{
  "tools": {
    "mcpServers": {
      "filesystem": {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"]
      }
    }
  }
}
```

MCP servers connect lazily on first message and register their tools into the same registry as built-in tools. The agent doesn’t distinguish between native tools and MCP tools - they all implement the same interface.
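The "same interface" idea can be sketched with a Protocol: native tools and MCP proxies both satisfy it, so the registry and agent loop stay agnostic. All class names here are hypothetical:

```python
import asyncio
from typing import Any, Protocol

class Tool(Protocol):
    name: str
    async def execute(self, args: dict[str, Any]) -> str: ...

class EchoTool:
    """A native, built-in tool."""
    name = "echo"

    async def execute(self, args: dict[str, Any]) -> str:
        return args["text"]

class MCPToolProxy:
    """Wraps a tool exposed by an MCP server behind the same interface."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def execute(self, args: dict[str, Any]) -> str:
        # A real implementation would forward the call over the MCP session
        return f"mcp:{self.name}({args})"

registry: dict[str, Tool] = {}
for tool in (EchoTool(), MCPToolProxy("read_file")):
    registry[tool.name] = tool  # one registry, two tool origins

print(asyncio.run(registry["echo"].execute({"text": "hi"})))  # hi
```

Lazy connection then just means the `MCPToolProxy` defers opening its server session until `execute` is first called.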

13+ LLM Providers

Rather than building provider-specific clients, MiuBot uses a registry pattern with LiteLLM as the underlying abstraction:

```python
ProviderSpec(
    name="openrouter",
    keywords=("openrouter",),
    env_key="OPENROUTER_API_KEY",
    display_name="OpenRouter",
    litellm_prefix="openrouter",
    skip_prefixes=("openrouter/",),
)
```

Adding a new provider is two steps: add a ProviderSpec and a config field. Everything else - environment variables, model prefixing, status display - works automatically. This supports OpenRouter, Anthropic, OpenAI, DeepSeek, Groq, Gemini, MiniMax, DashScope, Moonshot, Zhipu, AIHubMix, custom endpoints, and local vLLM servers.
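To illustrate what `litellm_prefix` and `skip_prefixes` buy you, here is a sketch of the model-name resolution step (a simplified stand-in for the registry's behavior, reusing only the field names shown above):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProviderSpec:
    name: str
    litellm_prefix: str
    skip_prefixes: tuple[str, ...]

def to_litellm_model(spec: ProviderSpec, model: str) -> str:
    # Avoid double-prefixing models that already carry the provider prefix
    if model.startswith(spec.skip_prefixes):
        return model
    return f"{spec.litellm_prefix}/{model}"

spec = ProviderSpec("openrouter", "openrouter", ("openrouter/",))
print(to_litellm_model(spec, "anthropic/claude-sonnet-4"))
print(to_litellm_model(spec, "openrouter/anthropic/claude-sonnet-4"))
```

Both calls print the same fully-prefixed name, which is the point: config authors can write model names either way and the registry normalizes them before handing off to LiteLLM.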

Subagent System

Following the patterns from Multi-Agent Architecture, MiuBot supports spawning subagents for background tasks:

```python
class SubagentManager:
    """Lightweight agent instances that run in the background."""

    async def spawn(self, task: str, label: str | None = None, ...):
        # Creates isolated context with focused system prompt
        # Shares LLM provider but gets restricted tool set
        # Results are delivered back via the message bus
```
Subagents share the LLM provider but get isolated context and a reduced tool set. They can’t spawn further subagents (no infinite delegation). Results flow back through the message bus, so the parent agent or the user gets notified when the background task completes.
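The "no infinite delegation" rule is easy to express as a depth check at spawn time. A sketch with assumed names (not the real `SubagentManager`):

```python
import asyncio

class Subagent:
    def __init__(self, task: str, depth: int) -> None:
        self.task = task
        self.depth = depth

class SubagentSpawner:
    """Sketch of the delegation rule: parents may spawn subagents,
    subagents may not spawn further subagents."""

    MAX_DEPTH = 1

    async def spawn(self, task: str, depth: int = 0) -> Subagent:
        if depth >= self.MAX_DEPTH:
            raise RuntimeError("subagents cannot spawn further subagents")
        return Subagent(task, depth + 1)

async def demo() -> tuple[bool, bool]:
    spawner = SubagentSpawner()
    child = await spawner.spawn("summarize inbox")  # ok: spawned by parent
    try:
        await spawner.spawn("nested task", depth=child.depth)
        nested_allowed = True
    except RuntimeError:
        nested_allowed = False
    return child.depth == 1, nested_allowed

print(asyncio.run(demo()))  # (True, False)
```

A depth counter is cheaper than tracking a full delegation tree and is enough to make runaway fan-out structurally impossible.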

Skills System

Adopted from OpenClaw’s SKILL.md format, MiuBot’s skills system uses a 5-tier hierarchy:

  1. Workspace - team-level skills shared across bots
  2. Project - project-specific skills
  3. Personal - per-user skills
  4. Global - system-wide defaults
  5. Builtin - hardcoded essentials

Skills are Markdown files with optional YAML frontmatter. The system uses BM25 keyword search to automatically inject relevant skills into the agent’s context based on the user’s message - no manual tagging or routing required. A file watcher with 500ms debounce enables hot-reloading during development.
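For concreteness, here is a minimal self-contained BM25 scorer over a dict of skill files. The scoring formula is standard BM25; the tokenizer and data are illustrative, not MiuBot's implementation:

```python
import math
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Score each skill document against the user's message with BM25."""
    tokenized = {name: tokenize(body) for name, body in docs.items()}
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized.values()) / n
    scores: dict[str, float] = {}
    for name, toks in tokenized.items():
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for t in tokenized.values() if term in t)
            if df == 0:
                continue  # term appears in no skill at all
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = toks.count(term)
            score += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(toks) / avgdl)
            )
        scores[name] = score
    return scores

skills = {
    "git.md": "git commit branch rebase merge push",
    "cooking.md": "recipe ingredients oven bake",
}
scores = bm25_scores("how do I rebase my branch", skills)
best = max(scores, key=scores.get)
print(best)  # git.md
```

Keyword search is a deliberate trade-off here: unlike embedding retrieval, it needs no model call and no index rebuild when a skill file changes, which is what makes the 500ms hot-reload practical.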

Bot Identity

Each bot in a workspace has a rich identity system through context files:

| File | Purpose |
|---|---|
| SOUL.md | Core personality, values, communication style |
| USER.md | Knowledge about the bot's user/owner |
| AGENTS.md | Available subagents and delegation rules |
| IDENTITY.md | Name, role, backstory |
| TOOLS.md | Tool usage guidelines and restrictions |
| MEMORY.md | Memory management instructions |
| HEARTBEAT.md | Proactive wake-up rules |
| BOOTSTRAP.md | First-run ritual (auto-deleted after 3 turns) |

The BOOTSTRAP.md pattern is interesting - it runs a first-run onboarding conversation where the bot learns about its user, then deletes the file so it never triggers again. The knowledge gets saved to memory instead.

What I Learned

Building MiuBot taught me things that studying patterns in isolation doesn’t:

Channel diversity is a user experience problem, not just a technical one. Each platform has different formatting rules, message length limits, media handling, and user expectations. The system prompt needs to adapt per-channel.

Per-session concurrency is harder than per-request. HTTP APIs process independent requests. Chat sessions have state, ordering requirements, and the possibility of a user sending 5 messages while the agent processes the first one.

Memory consolidation needs infrastructure. Running an LLM to summarize yesterday’s conversations sounds simple. Doing it reliably across multiple workspaces with exactly-once guarantees requires Temporal or something equivalent.

Python’s async works well enough for this scale. For a personal assistant serving a handful of users across multiple channels, Python’s asyncio handles the concurrency fine. The bottleneck is always the LLM API call, not the framework.

That last point is important. Python works well for this use case. But as I started thinking about scaling beyond personal use - many users, concurrent WebSocket sessions, streaming responses - I started looking at what a Go implementation would look like. That led me to GoClaw, which takes the same gateway concept much further with Go’s concurrency primitives. More on that in the next post.

Key Takeaways

  1. Start with something that works, then reshape it. Nanobot gave me a running agent loop in hours. The rebuild happened incrementally over three weeks while the bot was already useful.
  2. Chat platforms are the real integration challenge. The agent loop is 50 lines. Getting Zalo, WhatsApp, and Telegram to work reliably with proper formatting is 10x more code.
  3. Temporal changes how you think about reliability. Once message processing is a durable workflow, crashes and restarts stop being scary. The workflow picks up where it left off.
  4. BASB-inspired memory consolidation works. Daily/weekly/monthly compression keeps long-term context manageable without unbounded growth.
  5. MCP makes tool integration trivial. Compatible config format with Claude Desktop means you can copy-paste server configs and they just work.
  6. Python is the right choice for a personal AI assistant. The ecosystem (LiteLLM, Temporal SDK, asyncio, Pydantic) is mature and well-documented. The performance ceiling is high enough for personal/small-team use.

MiuBot is open source at GitHub. If you’re interested in how these concepts translate to a Go-based production gateway, see the next post: From Theory to Gateway - Why Go Makes Sense for Production AI Agents.
