Roadmap — TDD baby steps
Each phase writes the failing test first, then implements until green, and leaves cargo test passing. Every phase "graduates" a dashboard scenario from synthetic data to real measurements.
Foundation-first ordering. The Erlang model — processes, messaging, supervision/fault-tolerance, management, connectivity — is the foundation and comes first, built on native Rust process bodies so it's real and measurable early. Wasmtime is the execution backend, slotted in at Phase 6: the actor layer is designed wasm-ready, so swapping a process body from a native closure to a sandboxed Wasm instance is additive, not a rewrite. That's also when "task-level" fault isolation becomes "true memory isolation".
Crate mapping: Phases 1–5 build the Wasm-free OTP core (
rusm-otp— usable standalone); Phase 6 adds therusm-wasmbackend; therusmruntime composes them. The OTP layer is all of Phases 1–5, not just Phase 1.
| Phase | Theme | Graduates to real data |
|---|---|---|
| 0 ✅ | Observability + benchmark dashboard (synthetic) | — |
| 1 ✅ | Process & scheduler core — task + process table + abort-based lifecycle, pluggable body | spawn-storm (live) |
| 2 ✅ | Mailboxes & message passing — per-process mailbox, send/recv, selective recv_match | ping-pong (live) |
| 3 ✅ | Links, monitors, supervision, fault tolerance — exit reasons, link/monitor/trap_exit/spawn_link/exit, cascades | fault-recovery (live) |
| 4 ✅ | Process management — named registry, timers (send_after/cancel), graceful shutdown | — |
| 5 ✅ | Connectivity: TCP — listen/connect, process-per-connection (TLS folds into the Phase 9 secure cluster transport) | connection-storm (live) |
| 6 ✅ | Embed Wasmtime as the process backend — instance-per-process, host ABI, epoch preemption, pooling + CoW + InstancePre; fairness graduated to real Wasm | fairness (live) |
| 7 ✅ | Component hosting — the component model (WASI p2 + p3) via bridges/, a rusm:runtime WIT actor world (self/send/receive/list/info/kill/register), default-deny capability profiles + memory limits, process introspection, byte streams, and an app model (rusm.toml [components.<name>], rusm build/dev). ~440k component spawns/s | component-storm (live) |
| 8 ✅ | Guest ergonomics — rusm-ts (TS/Bun → the rquickjs js-runner, no jco): rusm build bundles each TS component with Bun → wasm/<name>.js; service components export functions (RUSM runs the receive→dispatch→reply loop) and a worker exports default; the concealed typed client spawn<Svc>("svc") makes a cross-process call read like await svc.method(...) — plus for await streaming of generator handlers and callback args (a function stays in the caller, its invocations routed back) — over capability-gated spawn-from-guest (each component runs under its own declared profile); async Process API, binary messages + byte streams, all typed by the importable rusm-ts npm package (import { Process, spawn } from "rusm-ts"). rusm-rs (the Rust twin): ergonomic Pid/send/receive (serde JSON, same wire as TS)/spawn/registry/Stream (wit-bindgen library/binary split) + a #[rusm_rs::service] macro → a dispatch loop + a typed Client with call/cast/streaming/callbacks. A Rust client and a TS service interoperate. Both get an in-guest Supervisor (one-for-one / one-for-all / rest-for-one over a monitor ABI), and rusm dev watches ./components and rebuilds + reloads on edit. | — |
| 9 ✅ | Distributed clusters + live attach — the Wasm-free rusm-cluster crate over rusm-otp: QUIC+TLS nodes, cross-node send, a gossiped global registry (register_global/send_global), remote spawn (named factories) and live attach (remote_pids) over one control-plane RPC. ~550k cross-node msgs/s, ~39µs p50 round-trip (loopback). | distributed-fanout (live) |
| 10 ✅ | Scale & hardening — not raw speed (throughput/latency is already at the isolation-model ceiling: ~440k component spawns/s, ~21M msgs/s). On-demand instance tier (WasmRuntime::with_overflow — when the pooled cap is full, spawn from an on-demand engine so the live Wasm-process count is bounded by available memory, not a compile-time pool size); opt-in bounded mailboxes (Runtime::with_mailbox_capacity — shed user messages past capacity, system signals never shed); cluster security hardening (ClusterCa issues per-node certs, mutual TLS, foreign-CA peers rejected — replacing Phase 9's pre-shared cert); supervisor restart-intensity (windowed {max_restarts, max_seconds} in both in-guest supervisors). All with no spawn/message regression. See the design analysis. | — |
| 11 ✅ | Standard-WASI surface & serving — serve HTTP / WS / SSE from a component (rusm serve + rusm.toml [[serve]]); host wasi:http inbound and outbound (a capability-gated, streaming fetch for guests); run stock wasi:cli/run command components unchanged (WasmRuntime::spawn_command); support wstd guests. The TS runner is hardened for real apps — crypto (getRandomValues/randomUUID over wasi:random) + a real streaming fetch/Response/AbortController. Byte streams work over a handle ABI (load-bearing for WS frame delivery + SSE bodies); a native p3-typed stream<u8> signature is the one deferred refinement — cosmetic, since the handle ABI is functionally complete. | http/ws/sse (live) |
| 12 | Edge & cluster hardening — close the security-audit exposures before RUSM faces untrusted traffic. Serve-path admission control: a bound on concurrent in-flight instances + a request-body size cap + a per-request wall-clock timeout (epoch preemption only bounds CPU; a slow-loris body or connection flood is unbounded today), degrading to a graceful 503/close rather than leaning on the pool cap. Bounded serve-path mailboxes by default so a fast peer can't OOM a slow WS handler (the opt-in with_mailbox_capacity exists; the serve path doesn't set it). Serving TLS (https/wss) — moved here from Phase 11. Cluster trust hardening: signed name→node ownership so a mutually-authenticated but malicious peer can't poison the gossiped global registry (mTLS authenticates the connection, not the gossip payload), plus poison-resistant locking in the rusm-cluster control loop (replace lock().unwrap()). The sandbox, capability model, trap isolation, supervision, and mailbox load-shedding are already sound — these are network-edge and peer-trust gaps, not sandbox breaks. | — |
What's shipped so far (Phases 0–11)
- Phase 0 — observability & harness:
rusm-metrics,rusm-observer,rusm-bench(+ WebSocket server),rusm-cli, the React dashboard, and this docs/VitePress site. - Phases 1–5 — the Wasm-free OTP core (
rusm-otp): process & scheduler core, mailboxes & message passing, links/monitors/supervision, the named registry + timers + graceful shutdown, and TCP (process-per-connection). - Phase 6 — Wasmtime backend (
rusm-wasm): instance-per-process, host ABI, epoch preemption, pooling + CoW +InstancePre. - Phase 7 — component hosting: the component model (WASI p2 + p3) via
bridges/{wasip1,wasip2,wasip3}, therusm:runtimeWIT actor world, default-deny capabilities, cross-process byte streaming, and therusm.tomlapp model. - Phase 8 — guest ergonomics:
rusm-ts(TS/Bun → js-runner) +rusm-rs(the Rust twin), service components + a concealed typed client (call/cast/streaming/ callbacks), spawn-from-guest, an in-guestSupervisor, andrusm devreload. - Phase 9 — distributed clusters (
rusm-cluster): QUIC+TLS nodes, cross-nodesend, a gossiped global registry, remote spawn, and live attach — overrusm-otp, Wasm-free. - Phase 10 — scale & hardening: on-demand instance tier (lift the pooled cap), opt-in bounded mailboxes (overload load-shed), per-node certs under a cluster CA
- mutual TLS, and windowed supervisor restart-intensity — no spawn/message regression.
- Phase 11 — serving & standard-WASI surface (functionally complete; one refinement deferred): run a component as a high-throughput HTTP / WS / SSE server. The serving engine is built and measured —
WasmRuntime::http_server(instance-per-requestwasi:http),ws_server(one sandboxed component process per WebSocket connection, replies via a Wasm-free writer process), and SSE (awasi:httpstreaming body).rusm servenow hostsrusm.toml [[serve]]listeners (protocol=http|sse|ws,listen, and — for HTTP/SSE — a[serve.routes]table naming[components.<name>]handlers) on real TCP ports, loadingwasm/<name>.{wasm,js}(HTTP/SSE via thehttp_serverpath, WS viaws_server);rusm new <name>scaffolds a ready-to-serve TS HTTP app. Serving is benchmarked two ways: the fair, credible headline numbers come out-of-process from therusm-loadtestbinary against a liverusm serveport — loopback: HTTP ~46k req/s (0% errors), WS ~146k round-trips/s (256 held), SSE ~609k events/s (256 held), and ~34k sandboxed-process-per-connection WS establishments/s (rusm-loadtest'sconnmode). The six serving dashboard tiles (http-throughput,ws-echo,sse-fanoutand their*-tstwins) are co-resident live demos — the same real in-process WASM server driven through the sharedrusm-loadtestpath (a steady closed-loop driver for HTTP — a fixed set of outstanding requests that holds the tile at the server's real ceiling, never flooding or collapsing — and a connection-capacity harness for WS/SSE held connections), with load generator and server sharing the node process. (The fair out-of-process headline still uses balter's rate sweep.) See serving HTTP/WS/SSE. Serving TLS and the edge-hardening it pairs with move to Phase 12 (below). Beyond serving, Phase 11 closed the standard-WASI surface: stockwasi:cli/runcommand components run unchanged (WasmRuntime::spawn_command), and the TS runner gained a capability-gated, streaming outboundfetch(overwasi:http, TLS viawasmtime-wasi-http) pluscrypto(getRandomValues/randomUUID overwasi:random) — enough to host real npm clients such asai-sdk. The one deferred item is a native p3-typedstream<u8>actor-world signature: cosmetic standards-polish, since the handle-ABI byte streams are functionally complete and load-bearing for WS/SSE serving.rusm-otpstays Wasm-free (hyper/tungstenite/wasi:httplive only inrusm-wasm).
Planned
Security audit (Phase 12 scope)
A security/robustness review found the core sound: the capability sandbox is true default-deny, spawn-from-guest is capability-gated (a node-registered component runs under its own manifest-declared profile; a guest can't fabricate capabilities the operator never granted), a guest trap is isolated to that one process (no scheduler/runtime crash, no cascade), TS guests are sandboxed exactly like Rust ones (rquickjs compiled to wasm32-wasip2), the rusm-otp core is panic-isolated (sharded DashMap, Drop-guard reaping), cluster transport enforces mutual TLS (foreign-CA peers rejected), remote spawn is gated to pre-registered factories, supervisor restart-intensity is windowed, and bounded mailboxes shed only user messages (system/exit signals are never dropped). No sandbox escapes or privilege escalation.
The gaps are operational (DoS hardening) and trust-model, not architectural — and become Phase 12:
- Serve-path admission control (medium).
HttpServer/WsServeraccept per connection/request with no concurrency bound, no request-body cap, and no per-request timeout (epoch only bounds CPU). It degrades gracefully at the pool cap — verified, not a panic — but with the on-demand overflow tier a flood is memory-bounded, not count-bounded. Add a semaphore + body cap + request timeout → graceful503. - Default-unbounded serve mailboxes (medium). Erlang-compatible, but a fast peer flooding a slow WS handler is an OOM vector; bound the serve path's mailboxes by default.
- Plaintext serve (known).
http:///ws://only today; serving TLS lands here. - Cluster peer-trust (medium). mTLS authenticates the connection; a malicious authenticated peer can still advertise false
name→nodeownership in the gossiped global registry (silent misrouting). Sign ownership; and replacelock().unwrap()in the control loop with poison-resistant locking.
These are the items to land before exposing rusm serve to untrusted traffic.
- Nineteen live dashboard benchmarks — every scenario now runs on real data: the ten core engines (spawn-storm, ping-pong, fault-recovery, connection-storm, connection-scale, fairness, module-storm, component-storm, stream-pipe, distributed-fanout), six co-resident serving demos (
http-throughput,ws-echo,sse-fanoutand their*-tstwins), and three platform-primitive scenarios (kv-stormdurable read-modify-writes over redb,pubsub-fanout1→N broadcast,crypto-opscrypto.subtlefrom a TS guest) + the standalonecluster_fanoutbenchmark. The fair, credible serving headline numbers are still measured byrusm-loadtest(out-of-process, vs a liverusm serveport). - TDD throughout; coverage ≥98% (mostly 100%);
cargo fmt+ Prettier clean.
See the per-phase deep dives under phases/, and the RUSM vs Lunatic comparison for the per-phase "borrow vs beat" efficiency playbook we update as each gap closes.