What Prime Video Taught Me About Rewriting a Massive App in Rust
I just wrapped an interview with Alexandru from Prime Video about their multi-year journey to rebuild a production UI stack in Rust + WebAssembly (Wasm) — and wow, it’s a treasure trove of lessons for anyone shipping high-performance apps at scale. This isn’t your typical “we switched to Rust and it got faster” story. It’s about architecture choices that compound, about panic-free engineering in Wasm, and about the surprisingly pragmatic decision to port bugs on purpose. Here’s what stuck with me.
A React-like API on Top of an ECS: The Best of Both Worlds
Prime Video’s Rust UI engine is built on an Entity Component System (ECS), much like Bevy’s model, while exposing a React-like API to product teams.
- Under the hood: UI elements are entities; behavior and data are components; logic like focus management, input handling, and layout runs in systems that query entities with specific component sets.
- On top: Developers build views with a React-ish, declarative API, not a game engine. That means fewer new mental models to adopt.
Why this matters: systems like focus management can query exactly the entities they care about (e.g., Focusable + Visible + OnScreen) and respond in a single tick, which helps them hit ~30ms input latency end-to-end. On a TV remote, that difference is the line between “snappy” and “laggy.”
ECS gives you cache-friendly data and precise queries; the React-like layer gives teams a familiar way to compose UI. You get predictable performance without sacrificing developer ergonomics.
A tiny taste of the idea (illustrative pseudocode):
    // System: update focus based on directional input
    fn focus_system(
        query: Query<(Entity, &Focusable, &Rect), With<Visible>>,
        input: Res<DirectionalInput>,
        mut active: ResMut<ActiveFocus>,
    ) {
        if let Some(dir) = input.just_pressed() {
            if let Some(next) = find_next_focus(&query, active.current, dir) {
                active.current = next;
            }
        }
    }
The React-like layer simply declares what is on screen; the ECS decides how it reacts.
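To make the "declare on top, query underneath" split concrete, here is a minimal std-only sketch. It is not Prime Video's engine or Bevy's API: `Node`, `World`, `spawn_tree`, `focus_candidates`, and this one-dimensional `focus_system` are all hypothetical names invented for illustration. A declarative tree is flattened into entities, and the focus system only ever looks at entities that are both focusable and visible.

```rust
// Illustrative sketch only: every name here is hypothetical, not Prime Video's API.

/// Declarative description of a UI node, in the spirit of a React element.
#[derive(Debug, Clone)]
pub struct Node {
    pub x: i32,
    pub focusable: bool,
    pub visible: bool,
    pub children: Vec<Node>,
}

pub type Entity = usize;

/// Component storage: one column per component, indexed by entity id.
#[derive(Debug, Default)]
pub struct World {
    x: Vec<i32>,
    focusable: Vec<bool>,
    visible: Vec<bool>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Left,
    Right,
}

impl World {
    /// Flatten a declarative tree into entities and return the root's id.
    pub fn spawn_tree(&mut self, node: &Node) -> Entity {
        let id = self.x.len();
        self.x.push(node.x);
        self.focusable.push(node.focusable);
        self.visible.push(node.visible);
        for child in &node.children {
            self.spawn_tree(child);
        }
        id
    }

    /// Query: (entity, x) for every entity that is both focusable and visible.
    pub fn focus_candidates(&self) -> impl Iterator<Item = (Entity, i32)> + '_ {
        self.x
            .iter()
            .zip(&self.focusable)
            .zip(&self.visible)
            .enumerate()
            .filter_map(|(e, ((x, f), v))| if *f && *v { Some((e, *x)) } else { None })
    }
}

/// System: pick the nearest candidate in the pressed direction, or keep focus.
pub fn focus_system(world: &World, current: Entity, dir: Direction) -> Entity {
    // Unknown entity ids leave focus untouched instead of panicking.
    let Some(&cur_x) = world.x.get(current) else {
        return current;
    };
    world
        .focus_candidates()
        .filter(|&(e, x)| {
            e != current
                && match dir {
                    Direction::Left => x < cur_x,
                    Direction::Right => x > cur_x,
                }
        })
        .min_by_key(|&(_, x)| x.abs_diff(cur_x))
        .map(|(e, _)| e)
        .unwrap_or(current)
}
```

The view author only ever writes `Node` values; hidden or non-focusable nodes never reach the focus logic because the query filters them out first.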
Wasm’s Harsh Truth: Panics Are Fatal
If you’ve ever leaned on panic!() safety rails in Rust, Wasm will humble you. In Prime Video’s environment:
- Panics in production WebAssembly are unrecoverable. There’s no stack unwinding — a panic can take the whole app down, instantly.
- Array bounds violations? Third-party dependency panics? Same story. A single unchecked assumption becomes a user-visible crash with no recovery path.
Prime Video ended up patching third-party crates that panicked on “incorrect” inputs so they’d return Result instead of aborting. That’s not a one-time audit; it’s a posture: assume every dependency might panic.
Cultural shift: “panic-free” isn’t just a code guideline; it’s a supply chain requirement. You’re not done when your code is panic-free; you’re done when your entire dependency tree is panic-free.
Practical patterns they leaned on:
- Replace panic!/unwrap/expect with total functions and Result-returning APIs.
- Wrap unsafe or third-party edges with defensive adapters that validate inputs and return errors up front, so the panicking path is never reached. With no unwinding, there is nothing to catch once a panic has fired.
- Fuzz and property-test the hot paths and parsers that handle untrusted or “creative” inputs.
For example:
    fn parse_config(bytes: &[u8]) -> Result<Config, ParseError> {
        // never unwrap, even if "impossible"
        let v = serde_json::from_slice::<Config>(bytes)
            .map_err(ParseError::InvalidJson)?;
        validate(v)
    }
This is less about fear and more about respecting the platform guarantees. In Wasm, predictability beats cleverness every time.
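The adapter and total-function patterns from the list above can be sketched with nothing but the standard library. Everything here is an assumption for illustration: `third_party_scale` is a made-up stand-in for a dependency that panics on bad input, and `ScaleError`, `scale_checked`, and `scale_total` are hypothetical names.

```rust
/// Stand-in for a third-party function that panics on bad input.
/// (Hypothetical; it exists only to give the adapter something to guard.)
fn third_party_scale(values: &[u32], index: usize, divisor: u32) -> u32 {
    // Panics on an out-of-range index or a zero divisor.
    values[index] / divisor
}

#[derive(Debug, PartialEq, Eq)]
pub enum ScaleError {
    IndexOutOfRange { index: usize, len: usize },
    ZeroDivisor,
}

/// Adapter: check every precondition first, so the panicking call is only
/// reached with inputs that are known to be safe.
pub fn scale_checked(values: &[u32], index: usize, divisor: u32) -> Result<u32, ScaleError> {
    if index >= values.len() {
        return Err(ScaleError::IndexOutOfRange { index, len: values.len() });
    }
    if divisor == 0 {
        return Err(ScaleError::ZeroDivisor);
    }
    Ok(third_party_scale(values, index, divisor))
}

/// When you own the code, total std APIs remove the panic path entirely:
/// `get` replaces indexing and `checked_div` replaces `/`.
pub fn scale_total(values: &[u32], index: usize, divisor: u32) -> Option<u32> {
    values.get(index)?.checked_div(divisor)
}
```

The adapter is the fallback when you cannot change the dependency; patching or forking the crate so it returns a Result, as the team did, removes the need to mirror its preconditions by hand.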
The Counterintuitive Move: Port the Bugs First
One of the most interesting decisions: while porting to Rust, the team copied existing behavior exactly, including known bugs, rather than “fixing as they go.”
Why on earth would you do that? Because in a large codebase, other code often evolves to depend on those bugs. Workarounds, timeouts, ordering assumptions, quirky edge cases: rip any of those out mid-rewrite and your blast radius explodes.
Prime Video explicitly separated the project into two phases:
- Rewrite: achieve behavioral parity (“bug-for-bug compatible”).
- Refactor: fix and improve once parity and stability are achieved.
That separation shrinks timeline risk and localizes change. You’re measuring the rewrite by equivalence, not by “better,” and you save “better” for a safer, post-migration window — when the org has headspace, telemetry, and confidence. Rewrites fail when they mix porting with improving. Prime Video treated “improve” as a follow-up project, not a second job in the same sprint.
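Here is one way "measure the rewrite by equivalence" could look in code. This is a toy sketch under invented assumptions: the wrap-to-index-1 quirk, the function names, and the exhaustive small-input harness are all hypothetical and not taken from Prime Video's codebase. The point is that the Phase 1 port passes the parity check while the "obviously correct" fix fails it, so the fix belongs in Phase 2.

```rust
/// Legacy behavior with a known quirk: stepping past the last item wraps to
/// index 1 instead of 0. (An invented bug, purely for illustration.)
pub fn legacy_next_index(current: usize, len: usize) -> usize {
    if len == 0 {
        return 0;
    }
    if current.saturating_add(1) >= len {
        1 % len
    } else {
        current + 1
    }
}

/// Phase 1 port: different shape, same observable behavior, quirk included.
pub fn ported_next_index(current: usize, len: usize) -> usize {
    match current.checked_add(1) {
        _ if len == 0 => 0,
        Some(next) if next < len => next,
        _ => 1 % len,
    }
}

/// Phase 2 candidate: the "obviously correct" wrap-around to index 0.
pub fn fixed_next_index(current: usize, len: usize) -> usize {
    if len == 0 {
        0
    } else {
        current.wrapping_add(1) % len
    }
}

/// Compare a candidate against the legacy behavior over every small input
/// and return the (current, len) pairs where they disagree.
pub fn parity_mismatches(
    candidate: fn(usize, usize) -> usize,
    max_len: usize,
) -> Vec<(usize, usize)> {
    let mut mismatches = Vec::new();
    for len in 0..=max_len {
        // Go one past the end so out-of-range inputs are compared too.
        for current in 0..=len {
            if candidate(current, len) != legacy_next_index(current, len) {
                mismatches.push((current, len));
            }
        }
    }
    mismatches
}
```

In a real migration the inputs would come from recorded or golden cases rather than an exhaustive loop, but the definition of done is the same: an empty mismatch list.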
Async Without the Chaos: Callbacks First, Async as a Wrapper
Rust’s async can go viral. Once you dip a toe, you’re passing async around like a cold: up the stack, across layers, into places you didn’t plan.
Prime Video started with no async at all, choosing callbacks for asynchrony to keep the core UI strictly synchronous and easy to reason about. Only later did they introduce async as a wrapper layer, for deferred components and network calls, that produces synchronous UI state when awaited tasks complete. In other words: async happens at the edges; the heart stays sync.
Why this worked:
- Predictability: The render/update loop stays deterministic.
- Isolation: Async doesn’t infect every function signature.
- Debuggability: Fewer lifetimes/futures in hot paths.
Sketch of the flow (simplified):
    // Core UI loop remains sync
    fn render(state: &UiState) -> View {
        // normal, synchronous diff/patch work
    }
    // Async wrapper at the boundary
    async fn load_and_commit(id: ContentId, tx: UpdateTx) {
        if let Ok(data) = fetch_content(id).await {
            // Push a synchronous update back into the ECS/UI world
            tx.send(UiUpdate::ContentReady { id, data });
        } else {
            tx.send(UiUpdate::ContentError { id });
        }
    }
The result: no async virality, fewer footguns, and a clean mental model for teams shipping UI.
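For a version of that flow you can actually compile, here is a std-only sketch. One loud caveat: it swaps the async task for a plain thread, because the standard library ships no executor, so it illustrates the message-passing boundary rather than async itself. `fake_fetch`, `spawn_fetch`, `tick`, and the `UiState` fields are hypothetical names, and the "even ids succeed" rule is invented so the example has both outcomes.

```rust
use std::sync::mpsc::{Receiver, Sender};
use std::thread;

#[derive(Debug, PartialEq, Eq)]
pub enum UiUpdate {
    ContentReady { id: u32, title: String },
    ContentError { id: u32 },
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct UiState {
    pub titles: Vec<(u32, String)>,
    pub failed: Vec<u32>,
}

/// Stand-in for a network call: even ids succeed, odd ids fail.
fn fake_fetch(id: u32) -> Option<String> {
    if id % 2 == 0 {
        Some(format!("title-{id}"))
    } else {
        None
    }
}

/// Edge: do the slow work off the UI loop, then report back as a message.
/// A thread stands in for the async task here.
pub fn spawn_fetch(id: u32, tx: Sender<UiUpdate>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let update = match fake_fetch(id) {
            Some(title) => UiUpdate::ContentReady { id, title },
            None => UiUpdate::ContentError { id },
        };
        // The receiver may already be gone at shutdown; dropping the update is fine.
        let _ = tx.send(update);
    })
}

/// Core: one synchronous tick drains whatever has arrived, without blocking.
pub fn tick(state: &mut UiState, rx: &Receiver<UiUpdate>) {
    while let Ok(update) = rx.try_recv() {
        match update {
            UiUpdate::ContentReady { id, title } => state.titles.push((id, title)),
            UiUpdate::ContentError { id } => state.failed.push(id),
        }
    }
}
```

Nothing in `tick` or `UiState` knows how the data was produced, which is the property the section is after: swapping the thread for a real async task changes only `spawn_fetch`.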
Design Principles That Emerged
Talking through the trenches with Alexandru, a handful of principles kept surfacing:
- Choose an internal model that matches the work. Input, focus, layout, and animation are systems problems — they thrive in ECS. Let developers author declarative views; let systems run the machine.
- Engineer for the platform’s failure mode. In Wasm, that means zero-panic tolerance. Convert panics to Results, control your dependency boundary, test like your app depends on it (because it does).
- Split rewrite from refactor. Behavioral parity first (including bugs), then fix. It’s the calmest path through a jungle of unknown unknowns.
- Keep the core synchronous. Use async at the edges to pull data in and hydrate state. Don’t let async reshape your entire architecture unless you must.
- Measure what users feel. Hitting ~30ms input latency isn’t just a perf win; it’s a product win. Users notice “instant.”
Anti-Patterns to Avoid
- The “while we’re here” trap. Rewriting, redesigning the feature set, and refactoring all at once is how schedules slip and trust erodes.
- Assuming third-party crates share your tolerance for panics. They rarely do by default; audit and adapt.
- Leaking async everywhere. It increases cognitive load, lifetime spaghetti, and scheduling unpredictability.
- Treating UI as trees only. Trees are great for declaration. The behavior that animates them often looks more like systems.
If You’re Considering a Similar Rewrite
Here’s a practical starter checklist I wish every team had:
Architecture
- Pick a data-oriented core (ECS or similar) for tight loops like input, focus, and layout.
- Put a friendly declarative layer on top to shield most teams from the engine internals.
Reliability
- Enforce a no-panic policy, including in dependencies; wrap or fork where necessary.
- Invest in fuzzing and property tests for parsers, layout math, and serialization.
Migration Strategy
- Lock on behavioral parity as the Phase 1 definition of done.
- Keep a compatibility test suite that compares old vs. new render/output for the same inputs (golden tests are your friend).
Async Strategy
- Start synchronous by default; introduce async only at boundaries.
- Use message passing (channels/queues) to deliver async results back into the sync world.
Performance
- Instrument input → response latency; optimize systems that sit on that path.
- Prefer query-driven systems over “notify everything” broadcasts.
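For the "instrument input → response latency" item, a small sketch of what the bookkeeping might look like. `LatencyLog` and its methods are hypothetical, the nearest-rank percentile is my choice rather than anything the team described, and the 30 ms budget in the usage note simply reuses the figure quoted earlier.

```rust
use std::time::Duration;

/// Collects input-to-response samples, e.g. from `Instant::elapsed()` measured
/// between receiving a key press and committing the frame that reflects it.
#[derive(Debug, Default)]
pub struct LatencyLog {
    samples: Vec<Duration>,
}

impl LatencyLog {
    pub fn record(&mut self, input_to_response: Duration) {
        self.samples.push(input_to_response);
    }

    /// Nearest-rank percentile for `p` in 1..=100.
    /// Returns `None` when there are no samples or `p` is out of range.
    pub fn percentile(&self, p: usize) -> Option<Duration> {
        if !(1..=100).contains(&p) {
            return None;
        }
        let mut sorted = self.samples.clone();
        sorted.sort();
        // 1-based rank = ceil(p * n / 100), written with integer arithmetic.
        let rank = (p * sorted.len() + 99) / 100;
        sorted.get(rank.checked_sub(1)?).copied()
    }

    /// Does the chosen percentile fit inside the latency budget?
    pub fn within_budget(&self, p: usize, budget: Duration) -> Option<bool> {
        self.percentile(p).map(|latency| latency <= budget)
    }
}
```

Checking a high percentile such as `within_budget(95, Duration::from_millis(30))` rather than the average keeps the occasional slow frame visible, and slow frames are what users actually notice.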
The Bigger Picture
What impressed me most wasn’t just the technical choices; it was the discipline behind them. Prime Video treated Rust not as a silver bullet, but as a force multiplier for clarity:
- ECS made state and behavior queryable and testable.
- A React-like surface preserved developer velocity.
- Panic-free policies honored Wasm’s constraints.
- Bug-for-bug parity respected organizational reality.
- Async at the edges protected architectural integrity.
That combination is how you get an app that feels instant, stays up, and keeps shipping.
If you’re wrestling with similar questions — UI performance, Wasm targets, or how to survive a rewrite without losing your mind — this playbook from Prime Video is worth stealing. Big thanks to Alexandru for the deep dive.
Read the full article here: https://medium.com/@trivajay259/what-prime-video-taught-me-about-rewriting-a-massive-app-in-rust-7494f86b6173