Architecture

agent-intelligence is a Go-first agentic platform: a single-binary CLI + runtime server, with optional Python sidecar services for graph-heavy workloads. Agents communicate over A2A, MCP, and REST, and connect to graph backends either locally (Kuzu) or in the cloud (Neo4j Aura).

System Overview

┌──────────────────────────────────────────────────────────────┐
│ EXTERNAL LAYER                                               │
│   ai (CLI: run · serve · graph)     IDE / Desktop (MCP stdio)│
│   REST / curl clients               A2A peers                │
└──────────────────────────────┬───────────────────────────────┘
                               │ HTTP · MCP stdio · A2A JSON-RPC
┌──────────────────────────────▼───────────────────────────────┐
│ AGENT RUNTIME   Go · net/http                 :8080 / :8081  │
│   A2A Server :8080  ·  MCP Server :8081                      │
│   REST /api/*  ·  Web UI :8888                               │
│   Agent Loop     context window · token budget · multi-turn  │
│   MCP Client     modelcontextprotocol/go-sdk                 │
│   Model Router   Anthropic SDK · OpenAI-compat               │
└───────────────┬──────────────────────────────┬───────────────┘
                │ MCP stdio · HTTP             │ HTTPS streaming
┌───────────────▼─────────────┐ ┌──────────────▼───────────────┐
│ TOOL SERVERS  (Go)          │ │ MODEL PROVIDERS              │
│   genai-toolbox  :15000     │ │   Anthropic · OpenAI         │
│   CypherMCP      :15001     │ │   claude-opus-4-6 · gpt-4o   │
└───────────────┬─────────────┘ └──────────────────────────────┘
                │ Neo4j Bolt
┌───────────────▼──────────────────────────────────────────────┐
│ GRAPH BACKEND                                                │
│   Kuzu         local · embedded · CGO · Cypher-compatible    │
│   Neo4j Aura   cloud · Bolt · production · managed           │
└──────────────────────────────────────────────────────────────┘

request flow (down) · response flow (up) · model API calls (HTTPS) · graph queries (Bolt)

Agent Runtime

The runtime is a single Go binary. Each session runs in its own goroutine, so one instance handles 500+ concurrent sessions. The agent loop manages the context window, token budget, and multi-turn tool use.

A2A Server :8080

JSON-RPC 2.0 over HTTP + SSE streaming. Accepts tasks from CLI, other agents, and REST clients. Exposes /.well-known/agent.json for agent discovery.

MCP Server :8081

Exposes agent_run, agent_list as MCP tools. Skills registered as MCP Prompts. Supports stdio, SSE, and StreamableHTTP transports. Used by Claude Desktop, Cursor, etc.

Agent Loop

Core reasoning loop: receives task → assembles context → calls model → dispatches tool calls → injects results → repeats until done. Manages token budget with warn / compact / abort thresholds.
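The loop above can be sketched in a few lines of Go. All names here (Turn, Model, Budget, runLoop) are illustrative, not the runtime's actual API; only the control flow mirrors the description:

```go
package main

import (
	"errors"
	"fmt"
)

type ToolCall struct{ Name, Args string }

// Turn is one model response: text, any tool calls, and its token cost.
type Turn struct {
	Text      string
	ToolCalls []ToolCall
	Tokens    int
	Done      bool
}

// Model is one model call; in the runtime the Model Router sits behind it.
type Model func(history []string) (Turn, error)

type Budget struct{ Used, Warn, Compact, Abort int }

// runLoop: receive task → call model → dispatch tool calls → inject
// results → repeat until done, enforcing warn/compact/abort thresholds.
func runLoop(model Model, tools map[string]func(string) string, task string, b Budget) (string, error) {
	history := []string{"task: " + task}
	for {
		turn, err := model(history)
		if err != nil {
			return "", err
		}
		b.Used += turn.Tokens
		switch {
		case b.Used >= b.Abort:
			return "", errors.New("token budget exhausted; aborting")
		case b.Used >= b.Compact:
			// Compact: keep only the task line and the latest entry (sketch).
			if len(history) > 2 {
				history = []string{history[0], history[len(history)-1]}
			}
		case b.Used >= b.Warn:
			fmt.Printf("warn: %d tokens used\n", b.Used)
		}
		if turn.Done {
			return turn.Text, nil
		}
		for _, call := range turn.ToolCalls {
			if tool, ok := tools[call.Name]; ok {
				history = append(history, "tool "+call.Name+": "+tool(call.Args))
			}
		}
	}
}

func main() {
	step := 0
	model := Model(func(history []string) (Turn, error) {
		step++
		if step == 1 {
			return Turn{ToolCalls: []ToolCall{{Name: "echo", Args: "hi"}}, Tokens: 10}, nil
		}
		return Turn{Text: history[len(history)-1], Tokens: 10, Done: true}, nil
	})
	tools := map[string]func(string) string{"echo": func(s string) string { return s }}
	out, _ := runLoop(model, tools, "demo", Budget{Warn: 1000, Compact: 2000, Abort: 3000})
	fmt.Println(out) // tool echo: hi
}
```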

Model Router

Routes to Anthropic SDK (direct, preferred — preserves stop_reason) or OpenAI-compat endpoint. Supports fallback chains: primary model → cheaper fallback → local model.

MCP Client

Connects outbound to MCP tool servers. Uses modelcontextprotocol/go-sdk v1.0.0. Manages tool server subprocesses (genai-toolbox, CypherMCP). Per-session tool filtering via middleware.
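Per-session filtering can be sketched as middleware that wraps a tool lister with an allowlist. ListTools and withSessionFilter are hypothetical names, not the go-sdk's API; they only show the shape of the middleware:

```go
package main

import "fmt"

// Tool is a stand-in for an MCP tool descriptor.
type Tool struct{ Name string }

// ListTools returns the tools visible to a session.
type ListTools func(session string) []Tool

// withSessionFilter wraps a lister with a per-session allowlist, so each
// session sees only the tools it is authorized to call.
func withSessionFilter(next ListTools, allow map[string]map[string]bool) ListTools {
	return func(session string) []Tool {
		var out []Tool
		for _, t := range next(session) {
			if allow[session][t.Name] {
				out = append(out, t)
			}
		}
		return out
	}
}

func main() {
	base := ListTools(func(string) []Tool {
		return []Tool{{Name: "agent_run"}, {Name: "graph_query"}}
	})
	filtered := withSessionFilter(base, map[string]map[string]bool{
		"session-a": {"agent_run": true},
	})
	// Unknown sessions fall through to an empty allowlist and see nothing.
	fmt.Println(len(filtered("session-a")), len(filtered("session-b"))) // 1 0
}
```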

Code Sandboxes

Tier 1: QuickJS→WASM via wazero (<5 ms cold start, all platforms). Tier 2: CPython→WASM for stdlib. Tier 3: Firecracker microVM for packages + shell (Linux/KVM only).
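The tier policy can be expressed as a small selection function. Workload and pickTier are illustrative helpers, not the runtime's API; they just encode the three-tier rules above:

```go
package main

import (
	"errors"
	"fmt"
)

// Workload describes what a piece of untrusted code needs.
type Workload struct {
	NeedsPackages bool   // third-party packages or shell → tier 3
	NeedsStdlib   bool   // full Python stdlib → tier 2
	OS            string // "linux", "darwin", ...
	HasKVM        bool
}

// pickTier chooses the cheapest sandbox tier that satisfies the workload.
func pickTier(w Workload) (string, error) {
	switch {
	case w.NeedsPackages:
		if w.OS != "linux" || !w.HasKVM {
			return "", errors.New("tier 3 (Firecracker) needs Linux with KVM")
		}
		return "firecracker-microvm", nil
	case w.NeedsStdlib:
		return "cpython-wasm", nil
	default:
		return "quickjs-wasm", nil // tier 1: <5 ms cold start, all platforms
	}
}

func main() {
	tier, _ := pickTier(Workload{})
	fmt.Println(tier) // quickjs-wasm
}
```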

Local Mode

ai run agent.toml — single binary, zero infra. Kuzu embedded graph, sidecars launched on demand.

┌──────────────────────────────────────────────────────────────┐
│ YOUR MACHINE                                                 │
│                                                              │
│  ai binary   <25 MB · <200 ms cold start · CGO_ENABLED=0     │
│    Agent Runtime  +  MCP Client  +  Model Router             │
│         │ subprocess spawn                                   │
│         ▼                                                    │
│  sidecars    CypherMCP :15001   ·  genai-toolbox :15000      │
│              Graph Build :8090  ·  GraphRAG :8091            │
│              Memory :8092       ·  Eval Bridge :8093         │
│         │ Bolt                                               │
│         ▼                                                    │
│  Kuzu        embedded · CGO · local .kuzu/ directory         │
│                                                              │
│  ↕  Anthropic / OpenAI API calls go outbound over HTTPS      │
└──────────────────────────────────────────────────────────────┘

Cloud Mode

ai deploy — packages the runtime into a Docker image (<50 MB) and deploys to Fly.io. Neo4j Aura replaces Kuzu. Python sidecars run as companion containers.

┌──────────────────────────────────────────────────────────────┐
│ CLOUDFLARE   DNS · CDN · TLS termination                     │
└──────────────────────────────┬───────────────────────────────┘
                               │ HTTPS
┌──────────────────────────────▼───────────────────────────────┐
│ FLY.IO MACHINE   per-region · shared-cpu-1x · 256 MB RAM     │
│                                                              │
│   ai runtime container   FROM scratch · <50 MB               │
│   Agent Runtime · A2A :8080 · MCP :8081 · REST /api/*        │
│                                                              │
│   Python sidecar container   (optional companion process)    │
│   GraphRAG :8091 · Memory :8092 · Eval :8093                 │
└───────────────┬──────────────────────────────┬───────────────┘
                │ Neo4j Bolt (neo4j+s)         │ HTTPS streaming
┌───────────────▼─────────────┐ ┌──────────────▼───────────────┐
│ Neo4j Aura                  │ │ Anthropic / OpenAI APIs      │
│   managed · multi-region    │ │   claude-opus-4-6            │
└─────────────────────────────┘ └──────────────────────────────┘
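A multi-stage Dockerfile matching these constraints (static build, FROM scratch, CA certificates copied in so outbound HTTPS works) might look like the following; the ./cmd/ai path and Go version are assumptions, not the project's actual build file:

```dockerfile
# Build a static binary; CGO_ENABLED=0 keeps it runnable on scratch.
FROM golang:1.23 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -ldflags="-s -w" -o /ai ./cmd/ai

# Final image: just the binary and root CAs, well under 50 MB.
FROM scratch
COPY --from=build /ai /ai
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
EXPOSE 8080 8081
ENTRYPOINT ["/ai", "serve"]
```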

CLI Command Flows

How each command routes through the system.

ai init my-agent
  ──▶ scaffold agent.toml ──▶ validate config ──▶ write .agent/ dir

ai run agent.toml "find top 5 companies by revenue"
  ──▶ load + expand config ──▶ start runtime (in-process) ──▶ spawn sidecars
  ──▶ create session ──▶ agent loop ──▶ stream output to stdout

ai serve --port 8080
  ──▶ load config ──▶ start A2A :8080 + MCP :8081 + REST + Web :8888
  ──▶ spawn MCP tool servers ──▶ health-check loop ──▶ ready

ai graph build --source ./docs
  ──▶ start Graph Construction sidecar :8090 ──▶ poll /health ──▶ POST /build
  ──▶ stream ingestion progress ──▶ Kuzu / Aura

ai eval --suite evals/
  ──▶ start Eval Bridge sidecar :8093 ──▶ run eval harness
  ──▶ POST results to Opik / Arize ──▶ print score table

ai deploy
  ──▶ make cross-compile linux/amd64 ──▶ docker build ──▶ fly deploy
  ──▶ tail logs ──▶ health-check https://<app>.fly.dev/health

Protocols

A2A — Agent-to-Agent

JSON-RPC 2.0 over HTTP + SSE streaming. Agents discover each other via /.well-known/agent.json. Task lifecycle: submit → working → done / failed. Supports multi-turn delegation between agents.
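The lifecycle can be encoded as a small transition table. The state names below follow the prose above, not necessarily the A2A specification's exact strings:

```go
package main

import "fmt"

type TaskState string

const (
	Submitted TaskState = "submitted"
	Working   TaskState = "working"
	Done      TaskState = "done"
	Failed    TaskState = "failed"
)

// transitions encodes submit → working → done / failed; Done and Failed
// are terminal, so they have no outgoing edges.
var transitions = map[TaskState][]TaskState{
	Submitted: {Working},
	Working:   {Done, Failed},
}

// canTransition reports whether moving from one state to another is legal.
func canTransition(from, to TaskState) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Submitted, Working), canTransition(Submitted, Done)) // true false
}
```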

MCP — Model Context Protocol

Bidirectional: the runtime is both an MCP client (calls tool servers) and an MCP server (exposes agent capabilities). Three transports: stdio, SSE :8081, StreamableHTTP.

REST Management API

GET /api/agents, PUT /api/agents/:id, GET /api/health, GET /openapi.json. Used by the Web UI and external orchestrators. OTel spans emitted for every request.

Neo4j Bolt

Cypher over Bolt protocol to Kuzu (local) or Neo4j Aura (neo4j+s://). Managed by CypherMCP and Python sidecar services. genai-toolbox uses HTTP transport (no native Bolt from its tool runner).

Python Sidecars

Optional heavyweight services managed by the Go CLI via subprocess lifecycle. Each sidecar exposes a local HTTP API; the Go runtime polls /health every 250 ms and sends SIGTERM on shutdown (5 s deadline, then SIGKILL).

┌──────────────────────────────────────────────────────────────┐
│ Go Runtime   spawn · health-check (250 ms) · SIGTERM         │
└──────┬──────────────┬───────────────┬──────────────┬─────────┘
       │ HTTP         │ HTTP          │ HTTP         │ HTTP
┌──────▼──────┐ ┌─────▼───────┐ ┌─────▼───────┐ ┌────▼────────┐
│ Graph Build │ │ GraphRAG    │ │ Memory      │ │ Eval Bridge │
│ :8090       │ │ :8091       │ │ :8092       │ │ :8093       │
│ llm-graph-  │ │ neo4j-      │ │ agent-      │ │ Opik /      │
│ builder     │ │ graphrag    │ │ memory      │ │ Arize       │
└──────┬──────┘ └─────┬───────┘ └─────┬───────┘ └────┬────────┘
       │              │               │              │
       └──────────────┴───────┬───────┴──────────────┘
                              │ Neo4j Bolt
                    ┌─────────▼─────────┐
                    │ Kuzu / Neo4j Aura │
                    └───────────────────┘

Key ADRs

ADR-001 — Go framework

Custom assembly chosen over google/adk-go. Rationale: adk-go is immature; custom MCP + A2A gives full control with no hidden abstractions.

ADR-003 — Anthropic SDK direct

Use anthropic-sdk-go directly, not an OpenAI-compat shim. The shim loses the stop_reason precision needed for reliable tool-use detection.

ADR-004 — Custom CypherMCP

genai-toolbox has no native Neo4j/Cypher source. Custom CypherMCP server (:15001) built with mark3labs/mcp-go to bridge Cypher queries.

ADR-005 — Split MCP libraries

modelcontextprotocol/go-sdk v1.0.0 for the MCP client (stable API). mark3labs/mcp-go v0.45 for the MCP server (more mature server API).

ADR-006 — Dual-role MCP

The runtime is simultaneously MCP client + server in one process. Per-session tool authorization via middleware (the stock ToolFilterFunc is static, with no session context).

Language strategy

Go for all runtime, CLI, and tool servers. Python for graph-heavy workloads (GraphRAG, graph construction, memory) — run as managed HTTP sidecar processes.