Cheap where it can be, capable where it counts.

The most expensive habit in a multi-agent system is running routing and classification on a frontier model. One router table maps each agent to the smallest model that clears its bar. Retiering is a config edit, not a code change.

Proposed LLM and cloud stack, tentative

Agent or job	Reasoning load	Tier
Supervisor, sentiment, intent, profile extraction	Trivial	Nova Micro
Generative UI payload, onboarding	Low to mid, structured	Nova Lite
Dr. Nila consultation	Mid to high	Claude Sonnet
Deep research synthesis	Highest, gated	Claude, deep tier
Voice session	Realtime speech	Nova Sonic

Agents ask the router for a model by role and never name a model id. Escalate only when a cheaper tier demonstrably fails the bar.

AGUI, agent generated dynamic UI

Generative UI: the backend describes the screen

After a consultation turn, an agent emits a typed payload of components. The frontend holds a registry from component type to React component and mounts whatever arrives. One owner of the interface, a typed contract on both sides of the wire.

The payload is a typed schema on the backend; the frontend types are generated from it. A component the backend can emit is one the frontend knows how to render.

There is no free model

No Bedrock model is free. The cheapest is near zero per call but still draws account credit. The tiering target is near-zero cost on high-volume trivial work, not a free model, because none exists. A budget alarm guards the balance as a circuit breaker.

Right-sized models, and an interface the agents draw

Cheap where it can be, capable where it counts.

Generative UI: the backend describes the screen