The orbits of Anarchy

Powered by NVIDIA Dynamo
Simulation calibrated to Nemotron‑4‑340B‑Instruct

About this simulation

Companion to the paper "Price of Anarchy for LLM Inference Routing". It visualises how a game-theoretic adaptive router (Saturation Detector + controller) reduces p99 time to first token (TTFT p99) at saturation, relative to a static Dynamo baseline, on the paper's 340B 1P/2D deployment (one prefill worker, two decode workers).

Watch the worker dots inside the Decode pool: under static routing, cache affinity herds requests onto one worker; under adaptive routing at saturation, the router flips (τ, ω) and the pool rebalances. Separately, the Planner scales pool size via +/− on the pool planets, which is the other half of paper §5's saturation decomposition.
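The herding and rebalancing described above can be illustrated with a minimal sketch. It assumes the controller panel's cost reads as ω times the blocks worker j would still have to prefill, plus the blocks already active on worker j, and that τ is a softmax temperature over those costs (τ = 0 meaning "always pick the cheapest worker"). The function names and this reading of the symbols are assumptions made for illustration; the code is not taken from the paper or from Dynamo.

```python
import math
import random


def routing_cost(omega: float, prefill_blocks: float, active_blocks: float) -> float:
    """Per-worker cost: omega * (blocks still to prefill) + (blocks already active).

    The meaning of the two block counts is an assumed reading of the panel formula.
    """
    return omega * prefill_blocks + active_blocks


def pick_worker(costs: list[float], tau: float, rng: random.Random) -> int:
    """Choose a worker index from per-worker costs.

    tau <= 0 : deterministic argmin, so every request follows the cheapest
               (typically cache-warm) worker.
    tau  > 0 : sample with probability proportional to exp(-cost / tau),
               which spreads requests across workers.
    """
    if tau <= 0.0:
        return min(range(len(costs)), key=costs.__getitem__)
    lowest = min(costs)  # shift by the minimum for numerical stability
    weights = [math.exp(-(c - lowest) / tau) for c in costs]
    return rng.choices(range(len(costs)), weights=weights, k=1)[0]
```

With τ = 0 and a cache-warm worker whose cost is lower, every call returns the same index, which is the herding the static mode shows. Raising τ or lowering ω (hypothetical settings, since the source does not give the post-flip values) makes the second worker receive a share of the traffic.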

How to use it

  • Mode tabs (top of stage) — jump between load regimes: idle → normal → busy → saturation.
  • Load slider (bottom) — drag to set concurrency manually from 0–100%.
  • Auto-demo — scripts an A/B contrast at saturation (static first, then adaptive kicks in, then Planner scales).
  • Static / Adaptive (left panel) — flip the controller to watch the cost-of-not-adapting gap widen in the side chart.
  • Pool +/− controls (on the Prefill and Decode planets) — scale the prefill/decode pool size (N_p, N_d). Dropping a count to 0 simulates a full outage; raising it simulates the Planner adding capacity.
  • Side panel — live TTFT p99, P̂oA, worker dials, and mission log update per tick.
IDLE: Orbits are quiet. The Router's corona barely flickers. A few sentinel requests drift between Frontend and the Sun, just enough to keep the control loop warm.
THROUGHPUT
Controller · 340B 1P/2D
c_j = ω·b_j^p + b_j^a
τ 0.0
ω 1.0
BELOW
EWMA(TTFT P99) < θ₁
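The detector condition above, EWMA(TTFT P99) < θ₁, can be sketched as a small state machine. This is a minimal illustration under stated assumptions: θ₁ = 300 ms is taken from the chart legend, the "k=0/3" counter is read as "three consecutive ticks on the other side of θ₁ before the state flips", the smoothing factor `alpha` is invented, and the state name `ABOVE` is hypothetical (the page only shows `BELOW`). The second threshold θ₂ is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SaturationDetector:
    theta1_ms: float = 300.0          # θ₁ as labelled on the chart
    alpha: float = 0.3                # EWMA smoothing factor (assumed value)
    k_required: int = 3               # ticks needed to flip (assumed reading of "k=0/3")
    ewma_ms: Optional[float] = None   # smoothed TTFT p99, unset until the first sample
    k: int = 0                        # consecutive ticks disagreeing with the current state
    state: str = "BELOW"              # "ABOVE" is a hypothetical name for the other state

    def update(self, ttft_p99_ms: float) -> str:
        """Fold one TTFT p99 sample into the EWMA and return the current state."""
        if self.ewma_ms is None:
            self.ewma_ms = ttft_p99_ms
        else:
            self.ewma_ms = self.alpha * ttft_p99_ms + (1.0 - self.alpha) * self.ewma_ms

        target = "BELOW" if self.ewma_ms < self.theta1_ms else "ABOVE"
        if target == self.state:
            self.k = 0                # agreement resets the hysteresis counter
        else:
            self.k += 1
            if self.k >= self.k_required:
                self.state = target   # flip only after k_required consecutive ticks
                self.k = 0
        return self.state
```

Requiring several consecutive ticks before flipping keeps a single latency spike from toggling the controller, which is one plausible reason for a counter like "k=0/3" to appear next to the "stable" label.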
RPS
in-flight
queue
TTFT p50
GPUs (P/D)
 P̂oA
Stage diagram labels:

  • Smart Router
  • Planner: control · ARIMA · Kalman
  • Saturation Detector (helping the smart router): BELOW · stable · k=0/3
  • Prefill ×1: N_p = 1 · util 0% · nodes +
  • Decode ×2: N_d = 2 · util 0% · nodes +
  • KVBM: memory tiers G1–G4 · NIXL (G1 HBM, G2 DRAM, G3 SSD, G4 NET)
  • Frontend: ingress · HTTP · gRPC

Shuttle manifest

Request · prefill
FE → SUN → PF
Request · decode
SUN → DEC
Tokens home
DEC → FE
KV freighter
PF ↔ KVBM ↔ DEC
Metrics probe
POOLS → PLN
Control signal
PLN → SUN, POOLS
0% load

Adaptive vs. static

TTFT p99
static
adaptive
P̂oA
static
adaptive
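Chart: TTFT p99 vs. concurrency (y: log scale)
Reference lines: θ₁ = 300 ms, θ₂ = 2 s, C* = 128, +1 Decode
y-axis ticks: 30 ms, 300 ms, 3 s, 30 s
x-axis ticks: C = 8, 128, 256
Series: Dynamo (static), Adaptive; the gap between them is labelled "cost of not adapting"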

Worker dials

Prefill
0%
Decode
0%

Mission log