The fork that keeps it responsive.

Immediate questions run inline and stream back. Long questions are offloaded to a queue, and the caller gets a job id back at once, so the conversation never blocks. Voice holds a live duplex stream. The decision is deterministic where it can be, and a model is consulted only when the input is genuinely ambiguous.

Indigo: inline path. Gold: durable offload; the job id returns immediately while a worker resumes the thread later. Teal: live voice path.
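A minimal Python sketch of the three-way dispatch, assuming the route has already been decided. Route, handle_turn, JobAccepted, VoiceSession, stream_answer and enqueue are hypothetical names, not an API described here. The voice branch only returns a session handle and leaves the audio transport out.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterator, Union

# Hypothetical names throughout; an illustration of the fork, not the real system.


class Route(Enum):
    INLINE = "inline"    # immediate question: answer streamed back in-turn
    OFFLOAD = "offload"  # long question: queued, job id returned at once
    VOICE = "voice"      # live duplex audio stream


@dataclass
class JobAccepted:
    job_id: str  # returned immediately; the conversation keeps going


@dataclass
class VoiceSession:
    session_id: str  # handle for the live duplex stream


Reply = Union[Iterator[str], JobAccepted, VoiceSession]


def handle_turn(
    route: Route,
    query: str,
    stream_answer: Callable[[str], Iterator[str]],
    enqueue: Callable[[str], str],
) -> Reply:
    """Send one turn down exactly one of the three paths."""
    if route is Route.INLINE:
        # Inline: hand the caller an iterator of chunks to stream back.
        return stream_answer(query)
    if route is Route.OFFLOAD:
        # Offload: the queue owns the work; only the job id travels back.
        return JobAccepted(job_id=enqueue(query))
    # Voice: open a session handle; audio transport is out of scope here.
    return VoiceSession(session_id=str(uuid.uuid4()))
```

The offload branch never waits on the work itself: enqueue is expected to return as soon as the job is recorded.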

Why the long path is a queue, not an await

Asynchronous request handling keeps one request from blocking a worker, but it does not survive a dropped socket, a restart, a deploy or a timeout. Tying a multi-minute research chain to the lifetime of a connection is the exact fragility this design avoids. Offload is a durability concern, solved by the queue and a job id one layer down, not by awaiting harder.
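A sketch of the offload half under stated assumptions: a single SQLite table stands in for the queue, and JobQueue, submit, work_one and status are hypothetical names. What it shows is that submit persists the job and returns its id before any work starts, so nothing depends on the original connection staying open.

```python
import sqlite3
import uuid
from typing import Callable, Optional


class JobQueue:
    """Minimal durable job table (hypothetical sketch, not a production queue).

    State lives in SQLite, so a submitted job outlives the request,
    the socket and the process that accepted it.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            " id TEXT PRIMARY KEY, query TEXT NOT NULL,"
            " status TEXT NOT NULL, result TEXT)"
        )
        self.db.commit()

    def submit(self, query: str) -> str:
        # Persist first, then return: the caller gets an id, not a future.
        job_id = str(uuid.uuid4())
        self.db.execute(
            "INSERT INTO jobs (id, query, status) VALUES (?, ?, 'pending')",
            (job_id, query),
        )
        self.db.commit()
        return job_id

    def status(self, job_id: str) -> Optional[tuple]:
        # (status, result) for a known id, None for an unknown one.
        return self.db.execute(
            "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()

    def work_one(self, run: Callable[[str], str]) -> bool:
        # Single-worker sketch: a worker (possibly a new process after a
        # restart) picks up a pending job and records the result by id.
        # A real queue would atomically lease the row so two workers
        # cannot run the same job.
        row = self.db.execute(
            "SELECT id, query FROM jobs WHERE status = 'pending' LIMIT 1"
        ).fetchone()
        if row is None:
            return False
        job_id, query = row
        result = run(query)
        self.db.execute(
            "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
            (result, job_id),
        )
        self.db.commit()
        return True
```

Pointing path at a file instead of ":memory:" is what lets a job survive a process restart; the thread can later be resumed from the stored result using the job id alone.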

Determinism

Heuristic first, model only on doubt

Clear cases are settled locally by rule, with no model call. The classifier consults a model only for genuinely ambiguous input, so the fork stays cheap and fast.

Value: the cheapest possible decision on the highest-volume step.
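One way to shape the heuristic-first classifier, as a sketch: rule_route returns a route for clear cases and None when unsure, and only then is ask_model called. The marker phrases, the twelve-word limit and the string route names are assumptions for illustration; the text does not specify the actual rules.

```python
from typing import Callable, Optional

# Hypothetical signals; the real rules are not specified in the text.
LONG_MARKERS = ("research", "survey", "compare all", "write a report", "analyze")
SHORT_WORD_LIMIT = 12


def rule_route(query: str, is_voice: bool) -> Optional[str]:
    """Return a route for clear cases, or None when the rules are unsure."""
    if is_voice:
        return "voice"
    text = query.lower()
    if any(marker in text for marker in LONG_MARKERS):
        return "offload"
    if len(text.split()) <= SHORT_WORD_LIMIT and text.rstrip().endswith("?"):
        return "inline"
    return None  # genuinely ambiguous: defer to the model


def classify(query: str, is_voice: bool, ask_model: Callable[[str], str]) -> str:
    """Heuristic first; the model is consulted only when the rules abstain."""
    decided = rule_route(query, is_voice)
    if decided is not None:
        return decided
    return ask_model(query)
```

Because the rules abstain rather than guess, the model call is confined to the ambiguous remainder, which is what keeps the highest-volume step cheap.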

Safety of retries

Idempotency on the offload

The job id is paired with an idempotency key derived from the thread and the query, so a retried submission or a duplicate from at-least-once delivery collapses to the same job.

Value: the same research never runs or bills twice.
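A sketch of the idempotent submission, assuming SHA-256 over the thread id and a whitespace- and case-normalized query. idempotency_key and OffloadRegistry are hypothetical names, and the dictionary stands in for a durable table.

```python
import hashlib
import uuid
from typing import Dict


def idempotency_key(thread_id: str, query: str) -> str:
    # Normalize whitespace and case so trivially different retries match
    # (an assumption; the text only says "derived from the thread and the query").
    normalized = " ".join(query.split()).lower()
    # A separator byte keeps ("ab", "c") and ("a", "bc") from colliding.
    material = f"{thread_id}\x00{normalized}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class OffloadRegistry:
    """In-memory stand-in for a durable key -> job id table (hypothetical)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}

    def submit(self, thread_id: str, query: str) -> str:
        key = idempotency_key(thread_id, query)
        # setdefault: the first submission wins; retries get the same job id.
        return self._jobs.setdefault(key, str(uuid.uuid4()))
```

In a real store the same effect would come from a unique constraint on the key column, so two concurrent submissions resolve to one row instead of two jobs. A worker consuming an at-least-once queue can check the same key before running, which covers redelivery as well as client retries.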
