Two compute paths, one intelligence core.
The system splits by interaction shape, not by feature. A realtime path holds long-lived voice and streaming connections. An asynchronous path runs durable research as background work. Both converge on a single guardrailed model core. Everything sensitive stays inside the VPC.
The choices that define it
Fargate for realtime, serverless for async
Only the duplex voice and stream loop needs a connection that outlives a request, so only that runs on an always-on task. Research is durable, so it runs on a queue and functions.
Value: responsive conversation, durable work, minimal idle spend.
A queue and a job id, never an awaited call
Long research returns a job id at once and resumes the thread later. Awaiting would tie multi-minute work to the life of a socket. Offload is a durability problem, solved one layer down.
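The job-id pattern above can be sketched in a few lines. This is only an illustration under stated assumptions, not the system's implementation: the in-memory `job_queue` and `job_store`, and the names `submit_research`, `run_worker_once`, and `get_job`, are hypothetical stand-ins for whatever durable queue, job table, and functions the real async path uses.

```python
import queue
import uuid

# In-memory stand-ins (assumptions for illustration). A real deployment would
# use a durable queue and a persistent job store so work survives restarts.
job_queue: queue.Queue = queue.Queue()
job_store: dict = {}


def submit_research(thread_id: str, question: str) -> str:
    """Record the job, enqueue it, and return its id without doing the work."""
    job_id = str(uuid.uuid4())
    job_store[job_id] = {
        "thread_id": thread_id,
        "question": question,
        "status": "queued",
        "result": None,
    }
    job_queue.put(job_id)
    return job_id  # the caller gets this at once; no socket waits on the work


def run_worker_once(research) -> None:
    """Take one job off the queue and run it, independent of any connection."""
    job_id = job_queue.get()
    job = job_store[job_id]
    job["status"] = "running"
    job["result"] = research(job["question"])
    job["status"] = "done"


def get_job(job_id: str) -> dict:
    """Look up a job so the thread can be resumed with its result later."""
    return job_store[job_id]
```

The request handler calls only `submit_research` and returns; the worker runs separately, so a dropped connection affects neither the queued job nor its eventual result.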
Value: a four-minute task survives a dropped connection or a deploy.
Closed by default, one audited door
All AWS access is over VPC endpoints. The single egress is web search for research, behind an allowlist, after PII redaction. Application code stays endpoint-agnostic.
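A minimal sketch of that single egress check follows, assuming a host allowlist and a redaction pass applied before any query leaves. The host `search.example.com`, the function names, and the two regular expressions are hypothetical. Real PII redaction needs far more than two patterns, so treat the patterns as placeholders.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist entry; the real list is not given in the source.
ALLOWED_HOSTS = {"search.example.com"}

# Deliberately simple illustrative patterns, not a complete PII detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def prepare_search(url: str, query: str) -> tuple:
    """Refuse non-allowlisted hosts, then return the URL with a redacted query."""
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress to {host!r} is not allowlisted")
    return url, redact_pii(query)
```

Putting both checks in one function means every outbound search passes through the same audited point, which is the property the paragraph above describes.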
Value: sensitive data stays in the VPC; the one open path is the watched one.
MCP for research, blackboard for the rest
One MCP server exposes web search and retrieval, shared by the inline agent and the worker. Internal orchestration stays on the graph, where a protocol boundary would add cost without adding a consumer.
Value: one research implementation, two consumers, a clean seam for new tools.
An audience that includes minors. That single fact drives the closed network, the guardrails on every call, the self-hosted tracing, and the per-user isolation. Safety is the architecture, not a layer on top of it.