
Benchmark dashboard & observer

The dashboard (bench/dashboard, React on Bun, uPlot charts) is the north-star artifact — how we see every phase's progress. It has two views fed by one WebSocket stream from a node.

Benchmark view

A menu of scenarios and a Run button. Each tick streams: throughput (ops/sec), peak concurrent, latency p50/p95/p99, and a live throughput chart.

| Scenario | Headline | Real after |
| --- | --- | --- |
| Spawn storm | spawns/sec, memory/process | phase 1 |
| Message ping-pong | msgs/sec, round-trip latency | phase 2 |
| Fault recovery | restarts/sec, recovery latency | phase 3 |
| Connection storm | conns/sec, peak, latency | phase 5 (native), re-measured phase 6 |
| Connection scale (held-open to the fd ceiling) | peak concurrent connections | phase 5 |
| Fairness under tight loop | bystanders keep progressing | phase 6 |
| Module storm (wasip1, Lunatic head-to-head) | core-module spawns/sec | phase 6 |
| Component storm | component spawns/sec | phase 7 |
| Stream pipe | bytes/sec between processes | phase 7 |
| Distributed fan-out | cross-node latency | phase 9 |
| HTTP throughput + -ts twin | req/sec (co-resident live demo) | phase 11 |
| WS echo + -ts twin | round-trips/sec (co-resident live demo) | phase 11 |
| SSE fan-out + -ts twin | events/sec (co-resident live demo) | phase 11 |
| KV storm | durable read-modify-writes/sec (ACID commits, redb) | phase 11 |
| Pub/sub fan-out | subscriber deliveries/sec (1 publish → N) | phase 11 |
| Crypto ops (TS) | crypto.subtle SHA-256 digests/sec | phase 11 |
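
The latency figures streamed on each tick (p50/p95/p99) can be read as percentiles over the round-trip samples collected for that tick. The sketch below uses the nearest-rank method, which is an assumption made for illustration: this page does not say how the node actually aggregates latency (a histogram would be equally plausible), and the names percentile, LatencySummary, and summarize are hypothetical.

```rust
/// Nearest-rank percentile over an ascending-sorted slice.
/// `pct` is a whole-number percentage in 0..=100.
/// Hypothetical helper; not taken from rusm-bench.
pub fn percentile(sorted: &[u64], pct: usize) -> Option<u64> {
    if sorted.is_empty() {
        return None;
    }
    let n = sorted.len();
    // rank = ceil(pct / 100 * n), computed in integers to avoid float rounding.
    let rank = (pct * n + 99) / 100;
    // Ranks are 1-based; clamp so pct = 0 still yields the smallest sample.
    Some(sorted[rank.clamp(1, n) - 1])
}

/// The three latency figures a tick reports (unit is whatever the samples use).
#[derive(Debug, PartialEq, Eq)]
pub struct LatencySummary {
    pub p50: u64,
    pub p95: u64,
    pub p99: u64,
}

/// Sorts the samples in place and extracts p50/p95/p99.
/// Returns None when no samples were collected during the tick.
pub fn summarize(samples: &mut [u64]) -> Option<LatencySummary> {
    samples.sort_unstable();
    Some(LatencySummary {
        p50: percentile(samples, 50)?,
        p95: percentile(samples, 95)?,
        p99: percentile(samples, 99)?,
    })
}
```

With one hundred samples valued 1 through 100, summarize reports p50 = 50, p95 = 95, and p99 = 99, which makes the nearest-rank definition easy to check by hand.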

All nineteen scenarios above (sixteen rows, with the three -ts twins counted separately) run real engines; none are synthetic. Runner::start_synthetic keeps a runtime-free deterministic preview only for UI development.

The six serving scenarios (HTTP, WS, and SSE, each with its -ts twin) are co-resident live demos: each spins up the same real in-process WASM server and drives it through the same load path as the out-of-process rusm-loadtest (balter for HTTP request-rate, a connection-capacity harness for WS/SSE held connections), with the load generator and server sharing the node process. Because they share CPU and hide the network behind loopback, the fair, credible headline numbers for serving are still the ones measured out-of-process by rusm-loadtest against a live rusm serve port (see serving HTTP/WS/SSE).

The runtime micro-benchmarks (spawns/sec, msgs/sec, restarts/sec, scheduler fairness) stay in-process on purpose. They measure the actor core itself, where there is no network or server, so in-process is the correct way to measure raw runtime capacity.

The three platform-primitive scenarios are likewise honest in-process measurements of the capabilities a real app leans on:

  • KV storm drives durable read-modify-writes against the embedded rusm-kv (redb) store. It is the only scenario that touches disk, so its number is the ACID-commit ceiling (writers serialise behind one commit lock; readers are concurrent MVCC).
  • Pub/sub fan-out is one publisher broadcasting to N subscriber processes (the exact mechanics of rusm-rs pubsub::Topics::publish).
  • Crypto ops runs crypto.subtle (native RustCrypto) inside a sandboxed TypeScript guest, so its rate is the honest cost of offering Web Crypto from a JS component.

Live observer view

A real-time view of the node (à la Erlang observer): process count, running/waiting, scheduler load bars, total memory, and a per-instance table.

Observability must stay cheap

Counters are relaxed atomics; the node pushes a periodic aggregated snapshot (10–60 Hz), never an event per operation. The per-instance detail table is the only costly part of a snapshot, so it is toggleable — off for clean benchmark runs. We prove the overhead is negligible by running a high-rate benchmark observer-on vs observer-off (see the observer_overhead example).
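
A minimal sketch of that pattern, assuming hypothetical names (Counters, Snapshot, on_spawn, on_exit, snapshot) that are not taken from rusm-node: the hot path does a single relaxed fetch_add per event, and only the periodic snapshot reads the counters.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hot-path counters. Hypothetical layout; the real node may track more.
#[derive(Default)]
pub struct Counters {
    spawned: AtomicU64,
    exited: AtomicU64,
}

/// Aggregated view pushed to clients on a timer, never per operation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Snapshot {
    pub spawned: u64,
    pub exited: u64,
    pub live: u64,
}

impl Counters {
    /// Called on every spawn: one relaxed increment, no locks, no allocation.
    pub fn on_spawn(&self) {
        self.spawned.fetch_add(1, Ordering::Relaxed);
    }

    /// Called when a process is reaped.
    pub fn on_exit(&self) {
        self.exited.fetch_add(1, Ordering::Relaxed);
    }

    /// Called by the snapshot timer only.
    pub fn snapshot(&self) -> Snapshot {
        let exited = self.exited.load(Ordering::Relaxed);
        let spawned = self.spawned.load(Ordering::Relaxed);
        // Two relaxed loads are not a consistent cut across both counters,
        // so saturate instead of risking an underflow in the derived value.
        let live = spawned.saturating_sub(exited);
        Snapshot { spawned, exited, live }
    }
}
```

Relaxed ordering is enough here because the counters are statistics rather than synchronization points: a snapshot that is a few events stale is acceptable, and the cost on the spawn path stays at one atomic add.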

Spawn-storm: how the number is produced (read this)

The spawn-storm is the first scenario on real data, so it's worth being precise about what its ops/sec means and why it's safe.

  • It's a continuous, multi-core storm. One background spawner task per (allowed) core hammers rusm-otp runtime.spawn(...) as fast as it can. A single sequential loop would be capped by one core; a storm uses many. The tick just samples the achieved rate (Δspawned / Δt).
  • It measures create and reap. The spawned processes are trivial and finish immediately, so the rate reflects full lifecycle throughput, not just creation.
  • Backpressure is a safety net, not the operating point. Spawners pause if the live population ever reaches the in-flight cap, so the table can't grow without bound. But at every profile the population self-limits far below the cap (a few hundred live), because spawn rate and reap rate balance out — so "peak concurrent" reflects the real steady-state population, not a configured ceiling.
  • Throughput is reap-bound, so the lever is the spawner-to-reaper balance. The limit is how fast finished processes drain (~one reaper core's worth each). Too few spawners under-drive the machine; too many starve the reapers and pile processes up without going faster. The sweet spot is spawners ≈ reapers (~half the cores each), which is what Max uses for peak smooth throughput.
  • memory shows 0. Native processes have no per-instance linear memory; that figure becomes real once processes are Wasm instances (Phase 6).
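
The sampling and backpressure rules above can be sketched as follows. RateSampler and should_pause are hypothetical names, and timestamps are passed in as a Duration since start (rather than read from a clock) purely so the arithmetic is easy to check; the real tick loop may be structured differently.

```rust
use std::time::Duration;

/// Turns a monotonically increasing "total spawned" counter into ops/sec.
/// Hypothetical helper; not taken from rusm-bench.
pub struct RateSampler {
    last_count: u64,
    last_at: Duration,
}

impl RateSampler {
    /// `count` and `at` are the counter value and elapsed time at start.
    pub fn new(count: u64, at: Duration) -> Self {
        Self { last_count: count, last_at: at }
    }

    /// Returns the achieved rate since the previous sample: Δspawned / Δt.
    pub fn sample(&mut self, count: u64, at: Duration) -> f64 {
        let d_count = count.saturating_sub(self.last_count);
        let d_t = at.saturating_sub(self.last_at).as_secs_f64();
        self.last_count = count;
        self.last_at = at;
        // A zero-length interval carries no rate information.
        if d_t > 0.0 { d_count as f64 / d_t } else { 0.0 }
    }
}

/// Backpressure as a safety net: a spawner pauses only once the live
/// population reaches the in-flight cap.
pub fn should_pause(live: u64, in_flight_cap: u64) -> bool {
    live >= in_flight_cap
}
```

For example, a counter that advances by 1,000,000 over 500 ms samples as 2,000,000 ops/sec. Because the tick only reads the counter, the spawners never wait on the sampler.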

Resource profiles (the throughput dial)

A segmented control picks how hard the storm drives the machine. The spawn worker count is the dial and is relative to your CPU count; throughput rises with each tier. The in-flight cap is a uniform per-core safety net (memory can't run away), not a per-tier knob — the population self-limits well below it.

| Profile | Spawn workers | Throughput (busy 10-core box, release) | Use it when |
| --- | --- | --- | --- |
| Light | ~¼ of cores | ~2.1M/s | speed isn't the point; leave the machine alone |
| Balanced (default) | ~⅖ of cores | ~2.4M/s | good throughput with visible room above |
| Max | ~½ of cores | ~2.8M/s | most performant: peak sustained rate, still smooth |

Max deliberately stops at ~half the cores: the other half reap, which is the sustained-throughput peak. Pushing spawners higher does not go faster — it just starves the reapers and piles processes up. So Max is the fastest profile and keeps the live population to a few hundred (no pile-up). The default is Balanced — fast, with headroom, and easy on the laptop. The tier (ResourceProfile) lives in rusm-node profile.rs; the benchmark's spawn-worker tuning for each tier is in rusm-bench profile_tuning.rs.
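
As an illustration of "relative to your CPU count", the sketch below maps each tier to a spawn-worker count using the approximate fractions from the table. The ResourceProfile name and its tiers come from this page; the method names, the round-down behaviour, and the minimum of one worker are assumptions, and the actual values in profile_tuning.rs may differ.

```rust
/// The throughput dial. Tier names follow the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceProfile {
    Light,
    Balanced,
    Max,
}

impl ResourceProfile {
    /// Approximate share of cores given to spawners, as (numerator, denominator).
    fn core_fraction(self) -> (usize, usize) {
        match self {
            ResourceProfile::Light => (1, 4),    // ~1/4 of cores
            ResourceProfile::Balanced => (2, 5), // ~2/5 of cores
            ResourceProfile::Max => (1, 2),      // ~1/2 of cores, the rest reap
        }
    }

    /// Spawn-worker count for a machine with `cores` cores.
    /// Assumption: round down, but always keep at least one spawner.
    pub fn spawn_workers(self, cores: usize) -> usize {
        let (num, den) = self.core_fraction();
        (cores * num / den).max(1)
    }
}
```

On a 10-core machine this yields 2, 4, and 5 spawners for Light, Balanced, and Max. A caller could obtain the core count from std::thread::available_parallelism().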

Protocol

The node and clients speak a small JSON protocol (rusm-bench protocol.rs, mirrored in bench/dashboard/src/types.ts):

  • Server → client: hello { scenarios, profiles }, tick { frame }, error { message }.
  • Client → server: run { scenario }, stop, set_observer_detail { enabled }, set_resource_profile { profile }.

A Frame carries the scenario, running flag, throughput, latency snapshot, observer snapshot, and the active resource profile. The dashboard folds messages into state with a pure reducer (state.ts) — fully unit-tested.
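
The fold itself is simple enough to sketch. The real reducer is TypeScript (state.ts); the version below is written in Rust only to show the shape of a pure message-to-state fold. The message variants follow the server-to-client list above, while Frame is cut down to three fields and DashboardState, its fields, and reduce are hypothetical names.

```rust
/// A reduced Frame: the real one also carries latency, observer snapshot,
/// and the active resource profile.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub scenario: String,
    pub running: bool,
    pub throughput: f64,
}

/// Server -> client messages, as listed above.
#[derive(Debug, Clone, PartialEq)]
pub enum ServerMsg {
    Hello { scenarios: Vec<String>, profiles: Vec<String> },
    Tick { frame: Frame },
    Error { message: String },
}

/// Hypothetical dashboard state; the real shape lives in the dashboard source.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct DashboardState {
    pub scenarios: Vec<String>,
    pub profiles: Vec<String>,
    pub last_frame: Option<Frame>,
    pub error: Option<String>,
}

/// Pure reducer: consumes the old state and one message, returns the new
/// state, and touches nothing else (no I/O, no globals).
pub fn reduce(state: DashboardState, msg: ServerMsg) -> DashboardState {
    match msg {
        ServerMsg::Hello { scenarios, profiles } => {
            DashboardState { scenarios, profiles, ..state }
        }
        ServerMsg::Tick { frame } => {
            DashboardState { last_frame: Some(frame), ..state }
        }
        ServerMsg::Error { message } => {
            DashboardState { error: Some(message), ..state }
        }
    }
}
```

Because reduce depends only on its arguments, a test can replay a recorded message sequence and compare the final state, with no WebSocket or node involved.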

MIT licensed