Companion to the paper "Price of Anarchy for LLM Inference Routing". It visualises how
a game-theoretic adaptive router (Saturation Detector + controller) reduces TTFT p99 at
saturation versus a static Dynamo baseline, on the paper's 340B 1P/2D deployment.
Watch the worker dots inside the Decode pool: under the static router, cache affinity herds
requests onto one worker; under the adaptive router at saturation, it flips (τ, ω) and the pool
rebalances. Separately, the Planner scales pool size via +/− on the pool planets —
the other half of paper §5's saturation decomposition.
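The herding-versus-rebalancing contrast can be sketched with a toy router. This is a minimal illustration and not the paper's controller: it assumes ω acts as a cache-affinity weight and τ as a softmax temperature, which may differ from the paper's definitions, and the names route and simulate are hypothetical. The "flip" is modelled here as moving ω from 1 to 0 with τ held fixed.

```python
import math
import random


def route(loads, cache_hits, omega, tau, rng):
    """Pick a worker index by softmax over a per-worker score.

    score_i = omega * cache_hit_i - (1 - omega) * load_i
    omega (affinity weight) and tau (temperature) are illustrative
    stand-ins, not the paper's definitions of (tau, omega).
    """
    scores = [omega * h - (1.0 - omega) * l for h, l in zip(cache_hits, loads)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - top) / tau) for s in scores]
    return rng.choices(range(len(loads)), weights=weights, k=1)[0]


def simulate(omega, tau, n_workers=2, n_requests=1000, seed=0):
    """Route n_requests and return how many landed on each decode worker."""
    rng = random.Random(seed)
    counts = [0] * n_workers
    # Assumption: worker 0 holds the hot prefix cache, the others hold none.
    cache_hits = [1.0] + [0.0] * (n_workers - 1)
    for _ in range(n_requests):
        total = sum(counts) or 1
        loads = [c / total for c in counts]  # share of traffic so far
        counts[route(loads, cache_hits, omega, tau, rng)] += 1
    return counts


static_counts = simulate(omega=1.0, tau=0.05)    # affinity only: herds onto worker 0
adaptive_counts = simulate(omega=0.0, tau=0.05)  # load only: spreads across workers
print("static:", static_counts, "adaptive:", adaptive_counts)
```

With two decode workers, as in the 1P/2D deployment, the affinity-only setting sends nearly every request to the cached worker, while the load-only setting keeps the two counts close to each other.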
How to use it
Mode tabs (top of stage) — jump between load regimes: idle → normal → busy
→ saturation.
Load slider (bottom) — drag to set concurrency manually from 0–100%.
Auto-demo — scripts an A/B contrast at saturation (static first, then adaptive kicks in, then Planner scales).
Static / Adaptive (left panel) — flip the controller to watch the
cost-of-not-adapting gap widen in the side chart.
Pool +/− controls (on the Prefill and Decode planets) — scale
the prefill/decode pool size (N_p, N_d). Dropping a count to 0 simulates a full outage; raising it
simulates the Planner adding capacity.
Side panel — live TTFT p99, P̂oA, worker dials, and mission log update per
tick.
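For readers who want to reproduce the side-panel numbers offline, the sketch below computes a nearest-rank p99 over a window of TTFT samples. It also shows one possible reading of P̂oA as the ratio of the observed p99 to a reference p99. That ratio is an assumption made for illustration; the paper's estimator may be defined differently, and the function names are hypothetical.

```python
def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # ceil(0.99 * n) using integer arithmetic to avoid float rounding
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def poa_estimate(observed_ttft, reference_ttft):
    """Assumed form: p99 under the current policy over p99 under a reference."""
    return p99(observed_ttft) / p99(reference_ttft)


reference = list(range(1, 101))          # hypothetical TTFT samples in ms
observed = [2 * x for x in reference]    # same workload, twice as slow
print(p99(reference), poa_estimate(observed, reference))
```

A value near 1 under this reading would mean the current policy is close to the reference; larger values correspond to a wider cost-of-not-adapting gap.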
IDLE
Orbits are quiet. The Router's corona barely flickers. A few sentinel requests drift
between Frontend and the Sun — just enough to keep the control loop warm.