Architecture
agent-intelligence is a Go-first agentic platform: a single-binary CLI + runtime server, with optional Python sidecar services for graph-heavy workloads. Agents communicate over A2A, MCP, and REST, and connect to graph backends locally (Kuzu) or in the cloud (Neo4j Aura).
System Overview
External layer
  ai (CLI): run · serve · graph
  IDE / Desktop: MCP stdio client
  REST / curl: A2A peers
  (all reach the runtime over HTTP · MCP stdio · A2A JSON-RPC)
Agent runtime (Go · net/http · :8080 / :8081)
  A2A Server · MCP Server :8081 · REST /api/* · Web UI :8888
  Agent Loop: context window · token budget · multi-turn
  MCP Client (modelcontextprotocol/go-sdk), via MCP stdio · HTTP → genai-toolbox :15000 (Go) · CypherMCP :15001 (Go)
  Model Router (Anthropic SDK · OpenAI-compat), via HTTPS streaming → Anthropic / OpenAI: claude-opus-4-6 · gpt-4o · others
Graph backend (reached over Neo4j Bolt)
  Kuzu: local · embedded · CGO · Cypher-compatible
  Neo4j Aura: cloud · Bolt · production · managed
Agent Runtime
The runtime is a single Go binary. One goroutine per session — 500+ concurrent sessions per instance. The agent loop manages context window, token budget, and multi-turn tool use.
A2A Server :8080
JSON-RPC 2.0 over HTTP + SSE streaming. Accepts tasks from CLI, other agents, and REST clients.
Exposes /.well-known/agent.json for agent discovery.
MCP Server :8081
Exposes agent_run, agent_list as MCP tools. Skills registered as MCP Prompts.
Supports stdio, SSE, and StreamableHTTP transports. Used by Claude Desktop, Cursor, etc.
Agent Loop
Core reasoning loop: receives task → assembles context → calls model → dispatches tool calls → injects results → repeats until done. Manages token budget with warn / compact / abort thresholds.
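The loop above can be sketched in a few lines of Go. This is a minimal illustration with a stand-in model function and a crude token estimate; the names, thresholds, and token accounting are assumptions for the sketch, not the runtime's actual API.

```go
package main

import "fmt"

// step is one model response: either a tool call to dispatch or a final answer.
type step struct {
	toolCall string // non-empty: dispatch this tool and loop again
	answer   string // non-empty: done
}

// runLoop sketches the core reasoning loop: assemble context, call the model,
// dispatch tool calls, inject results, repeat until done or the budget aborts.
func runLoop(task string, model func(ctx []string) step, budget int) string {
	ctx := []string{"task: " + task}
	used := 0
	for {
		used += len(ctx) * 10 // crude per-turn token estimate (illustrative)
		if used >= budget {
			return "aborted: token budget exhausted"
		}
		s := model(ctx)
		if s.answer != "" {
			return s.answer
		}
		// dispatch the tool call and inject its result into the context
		ctx = append(ctx, "tool:"+s.toolCall, "result:ok")
	}
}

func main() {
	calls := 0
	model := func(ctx []string) step {
		calls++
		if calls < 3 {
			return step{toolCall: "graph_query"}
		}
		return step{answer: "done"}
	}
	fmt.Println(runLoop("summarize repo", model, 1000)) // prints "done"
}
```

The real loop also applies the warn / compact thresholds before aborting; this sketch only shows the abort path.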
Model Router
Routes to Anthropic SDK (direct, preferred — preserves stop_reason) or OpenAI-compat
endpoint. Supports fallback chains: primary model → cheaper fallback → local model.
MCP Client
Connects outbound to MCP tool servers. Uses modelcontextprotocol/go-sdk v1.0.0.
Manages tool server subprocesses (genai-toolbox, CypherMCP). Per-session tool filtering via middleware.
Code Sandboxes
Tier 1: QuickJS→WASM via wazero (<5 ms cold start, all platforms). Tier 2: CPython→WASM for stdlib. Tier 3: Firecracker microVM for packages + shell (Linux/KVM only).
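The tier selection implied above can be written as a small decision function. This is a sketch: the signature is invented for illustration, and the fallback when packages are needed but KVM is unavailable (dropping down to a WASM tier) is an assumption, not documented behavior.

```go
package main

import "fmt"

// pickSandbox chooses an execution tier from the workload's needs.
func pickSandbox(needsPackages, needsPython, linuxKVM bool) string {
	switch {
	case needsPackages && linuxKVM:
		return "firecracker" // Tier 3: microVM, packages + shell (Linux/KVM only)
	case needsPython:
		return "cpython-wasm" // Tier 2: Python stdlib only
	default:
		return "quickjs-wasm" // Tier 1: <5 ms cold start, all platforms
	}
}

func main() {
	fmt.Println(pickSandbox(false, false, false)) // prints "quickjs-wasm"
	fmt.Println(pickSandbox(true, true, true))    // prints "firecracker"
}
```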
Local Mode
ai run agent.toml — single binary, zero infra. Kuzu embedded graph, sidecars launched on demand.
Your machine
  ai binary: <25 MB · <200 ms cold start · CGO_ENABLED=0
    Agent Runtime + MCP Client + Model Router
      subprocess spawn → CypherMCP :15001 · genai-toolbox :15000 · Graph Build :8090 · GraphRAG :8091 · Memory :8092 · Eval Bridge :8093
      Bolt → Kuzu embedded: CGO · local .kuzu/ directory
  Anthropic / OpenAI API calls go outbound over HTTPS.
Cloud Mode
ai deploy — packages the runtime into a Docker image (<50 MB) and deploys to Fly.io.
Neo4j Aura replaces Kuzu. Python sidecars run as companion containers.
Cloudflare: DNS · CDN · TLS termination
  ↓ HTTPS
Fly.io machine (per-region · shared-cpu-1x · 256 MB RAM)
  ai runtime container (FROM scratch · <50 MB): Agent Runtime · A2A :8080 · MCP :8081 · REST /api/*
  Python sidecar container (optional companion process): GraphRAG :8091 · Memory :8092 · Eval :8093
Outbound: Neo4j Aura (managed · multi-region) over Bolt (neo4j+s) · Anthropic / OpenAI APIs (claude-opus-4-6) over HTTPS streaming
CLI Command Flows
How each command routes through the system.
Protocols
A2A — Agent-to-Agent
JSON-RPC 2.0 over HTTP + SSE streaming. Agents discover each other via
/.well-known/agent.json. Task lifecycle: submit → working → done / failed.
Supports multi-turn delegation between agents.
MCP — Model Context Protocol
Bidirectional: the runtime is both an MCP client (calls tool servers)
and an MCP server (exposes agent capabilities). Three transports:
stdio, SSE :8081, StreamableHTTP.
REST Management API
GET /api/agents, PUT /api/agents/:id, GET /api/health,
GET /openapi.json. Used by the Web UI and external orchestrators.
OTel spans emitted for every request.
Neo4j Bolt
Cypher over Bolt protocol to Kuzu (local) or Neo4j Aura (neo4j+s://).
Managed by CypherMCP and Python sidecar services. genai-toolbox uses HTTP transport
(no native Bolt from its tool runner).
Python Sidecars
Optional heavyweight services managed by the Go CLI via subprocess lifecycle.
Each sidecar exposes a local HTTP API; the Go runtime polls /health
every 250 ms and sends SIGTERM on shutdown (5 s deadline, then SIGKILL).
Go runtime (spawn · health-check · SIGTERM)
  HTTP → Graph Build :8090 (llm-graph-builder)
  HTTP → GraphRAG :8091 (neo4j-graphrag)
  HTTP → Memory :8092 (agent-memory)
  HTTP → Eval Bridge :8093 (Opik / Arize bridge)
All four sidecars reach Kuzu / Neo4j Aura over Neo4j Bolt.
Key ADRs
ADR-001 — Go framework
Custom assembly chosen over google/adk-go. Rationale: adk-go is immature;
custom MCP + A2A gives full control with no hidden abstractions.
ADR-003 — Anthropic SDK direct
Use anthropic-sdk-go directly, not OpenAI-compat shim.
Shim loses stop_reason precision needed for reliable tool-use detection.
ADR-005 — Split MCP libraries
modelcontextprotocol/go-sdk v1.0.0 for the MCP client (stable API).
mark3labs/mcp-go v0.45 for the MCP server (more mature server API).
ADR-006 — Dual-role MCP
Runtime is simultaneously MCP client + server in one process. Per-session tool authorization via middleware (ToolFilterFunc is static — no context).
ADR-004 — Custom CypherMCP
genai-toolbox has no native Neo4j/Cypher source. Custom CypherMCP
server (:15001) built with mark3labs/mcp-go to bridge Cypher queries.
Language strategy
Go for all runtime, CLI, and tool servers. Python for graph-heavy workloads (GraphRAG, graph construction, memory) — run as managed HTTP sidecar processes.