Inside AI Agents / for engineers and architects

The system behind the conversation

NilaGPT is a production-grade agentic system. These pages present the design choices and the reasoning behind them, as they were actually made by a senior agentic AI architect. No migration narrative, just the architecture and why it is shaped this way.

System design

Two compute paths, one intelligence core.

The system splits by interaction shape, not by feature. A realtime path holds long-lived voice and streaming connections. An asynchronous path runs durable research as background work. Both converge on a single guardrailed model core. Everything sensitive stays inside the VPC.

Proposed LLM and cloud stack, tentative
CLIENT
  Frontend: React + Vite, S3 + CloudFront
  AGUI renderer: mounts agent payloads
  Cognito: per-user identity

EDGE
  HTTP API: submit, 202 + job id
  WebSocket API: stream + voice

ASYNC PATH / durable research
  Enqueue Lambda: validate, dedup
  SQS FIFO: + DLQ
  Research worker: resume thread
  DynamoDB: job + result

REALTIME PATH / voice + stream
  ECS Fargate service: LangGraph runtime + Nova Sonic duplex
  Long-lived connection. The reason this is Fargate, not Lambda.

INTELLIGENCE CORE / inside the VPC
  Bedrock + Guardrails: model router, PII, toxicity
  research-mcp: web search, KB
  Langfuse, X-Ray, CloudWatch: self-hosted tracing

Gold: the durable async research path. Teal: the realtime voice and stream path. Both resolve to one guardrailed Bedrock core that never leaves the VPC.

The choices that define it

Compute topology

Fargate for realtime, serverless for async

Only the duplex voice and stream loop needs a connection that outlives a request, so only that runs on an always-on task. Research is durable, so it runs on a queue and functions.

Value: responsive conversation, durable work, minimal idle spend.

Durability

A queue and a job id, never an awaited call

Long research returns a job id at once and resumes the thread later. Awaiting the call would tie multi-minute work to the life of a socket. Offloading is a durability problem, so it is solved one layer down, in the queue and the job record rather than in the connection.

Value: a four-minute task survives a dropped connection or a deploy.
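
The submit-and-resume flow described above can be sketched in a few lines. This is a minimal illustration, not the system's actual code: the queue and the job table are in-memory stand-ins for SQS FIFO and DynamoDB, and every name here (`ResearchService`, `submit`, `work_one`, `status`) is hypothetical.

```python
import hashlib
import uuid
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ResearchService:
    """Hypothetical sketch: in-memory stand-ins for the queue and job table."""
    queue: deque = field(default_factory=deque)  # stand-in for the FIFO queue
    store: dict = field(default_factory=dict)    # stand-in for the job + result table
    dedup: dict = field(default_factory=dict)    # dedup key -> existing job id

    def submit(self, user_id: str, thread_id: str, question: str) -> dict:
        """Validate, deduplicate, enqueue, and return a job id without waiting."""
        question = question.strip()
        if not question:
            return {"status": 400, "error": "empty question"}
        raw_key = f"{user_id}|{thread_id}|{question}".encode()
        key = hashlib.sha256(raw_key).hexdigest()
        if key in self.dedup:
            # Same request seen before: hand back the existing job id.
            return {"status": 202, "job_id": self.dedup[key]}
        job_id = str(uuid.uuid4())
        self.dedup[key] = job_id
        self.store[job_id] = {"state": "queued", "thread_id": thread_id, "result": None}
        self.queue.append({"job_id": job_id, "question": question})
        return {"status": 202, "job_id": job_id}

    def work_one(self, research) -> bool:
        """Worker step: take one message, run the research, persist the result."""
        if not self.queue:
            return False
        message = self.queue.popleft()
        job = self.store[message["job_id"]]
        job["state"] = "running"
        job["result"] = research(message["question"])
        job["state"] = "done"
        return True

    def status(self, job_id: str) -> dict:
        """Any later connection can look the job up by id and resume the thread."""
        return self.store.get(job_id, {"state": "unknown"})


svc = ResearchService()
ticket = svc.submit("user-1", "thread-9", "Compare option A and option B")
# The client's socket can drop here; nothing about the job depends on it.
svc.work_one(lambda q: f"report for: {q}")
print(svc.status(ticket["job_id"])["state"])
```

Because the caller only ever holds a job id, the worker and the connection have independent lifetimes, which is the property the Value line names.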

Network posture

Closed by default, one audited door

All AWS access goes over VPC endpoints. The single egress path is web search for research, which sits behind an allowlist and runs only after PII redaction. Application code stays endpoint-agnostic.

Value: sensitive data stays in the VPC; the one open path is the watched one.
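
The ordering of that one open path, allowlist check first and redaction before anything leaves, can be sketched as below. This is an illustration under stated assumptions: the allowlisted host `search.example.com`, the function names, and the two regex patterns are all hypothetical, and the regexes are a deliberately crude stand-in for whatever PII detection the real system uses.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; the real search host is not named in the source.
ALLOWED_HOSTS = frozenset({"search.example.com"})

# Crude illustrative patterns, not production-grade PII detection.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class EgressDenied(Exception):
    """Raised when a request targets anything outside the allowlist."""


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def prepare_search(url: str, query: str) -> tuple[str, str]:
    """Gate one outbound search: check the allowlist, then redact the query."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or host not in ALLOWED_HOSTS:
        raise EgressDenied(f"egress to {host!r} is not allowlisted")
    return url, redact(query)


url, safe_query = prepare_search(
    "https://search.example.com/q",
    "email bob@example.com about 555-123-4567",
)
print(safe_query)
```

Keeping the check and the redaction in one function means there is a single place to audit, which matches the "one audited door" framing.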

Tool boundary

MCP for research, blackboard for the rest

One MCP server exposes web search and retrieval, shared by the inline agent and the worker. Internal orchestration stays on the graph, where a protocol boundary would add cost but serve no additional consumer.

Value: one research implementation, two consumers, a clean seam for new tools.
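
The "one implementation, two consumers" shape can be illustrated without any protocol machinery. The sketch below is an in-process stand-in and does not use the MCP SDK; the class `ResearchToolServer`, the tool names, and the placeholder results are all hypothetical.

```python
from typing import Callable


class ResearchToolServer:
    """In-process stand-in for the single research tool server (not the MCP SDK)."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., list[str]]] = {}
        self.calls: list[tuple[str, str]] = []  # (consumer, tool) audit trail

    def tool(self, name: str):
        """Decorator that registers a function under a tool name."""
        def register(fn):
            self._tools[name] = fn
            return fn
        return register

    def call(self, consumer: str, name: str, **kwargs) -> list[str]:
        """Single entry point: every consumer crosses the same boundary."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        self.calls.append((consumer, name))
        return self._tools[name](**kwargs)


server = ResearchToolServer()


@server.tool("web_search")
def web_search(query: str) -> list[str]:
    return [f"search hit for {query!r}"]  # placeholder result


@server.tool("kb_retrieve")
def kb_retrieve(query: str) -> list[str]:
    return [f"kb passage about {query!r}"]  # placeholder result


def inline_agent_turn(question: str) -> list[str]:
    """Consumer 1: the inline agent does a quick retrieval mid-conversation."""
    return server.call("inline-agent", "kb_retrieve", query=question)


def worker_job(question: str) -> list[str]:
    """Consumer 2: the background worker combines search and retrieval."""
    hits = server.call("research-worker", "web_search", query=question)
    return hits + server.call("research-worker", "kb_retrieve", query=question)


print(inline_agent_turn("tides"))
print(len(worker_job("tides")))
```

Adding a tool here means registering one more function; neither consumer changes, which is the "clean seam" the Value line refers to.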

Designed for

An audience that includes minors. That single fact drives the closed network, the guardrails on every call, the self-hosted tracing, and the per-user isolation. Safety is the architecture, not a layer on top of it.