Model tiering
Cheap where it can be, capable where it counts.
The most expensive habit in a multi-agent system is running routing and classification on a frontier model. One router table maps each agent to the smallest model that clears its bar. Retiering is a config edit, not a code change.
Proposed LLM and cloud stack, tentative
| Agent or job | Reasoning load | Tier |
|---|---|---|
| Supervisor, sentiment, intent, profile extraction | Trivial | Nova Micro |
| Generative UI payload, onboarding | Low to mid, structured | Nova Lite |
| Dr. Nila consultation | Mid to high | Claude Sonnet |
| Deep research synthesis | Highest, gated | Claude, deep tier |
| Voice session | Realtime speech | Nova Sonic |
Agents ask the router for a model by role and never name a model id. Escalate only when a cheaper tier demonstrably fails the bar.
Generative UI: the backend describes the screen
After a consultation turn, an agent emits a typed payload of components. The frontend holds a registry from component type to React component and mounts whatever arrives. One owner of the interface, a typed contract on both sides of the wire.
The payload is a typed schema on the backend; the frontend types are generated from it. A component the backend can emit is one the frontend knows how to render.
There is no free model
No Bedrock model is free. The cheapest is near zero per call but still draws account credit. The tiering target is near-zero cost on high-volume trivial work, not a free model, because none exists. A budget alarm guards the balance as a circuit breaker.