
8 WASM + Rust Techniques for Native-Speed UIs

From JOHNWICK

You click. The UI answers instantly. No jank, no “thinking” spinner. That feeling isn’t luck — it’s a set of choices. If you’re shipping Rust to the browser with WebAssembly, these are the eight techniques that repeatedly turn prototypes into snappy, production-grade UIs.


1) Zero-copy bridges: share views, not bytes

Calling Rust from JS is cheap; moving data isn’t. Pass typed views over buffers that already live in WASM memory instead of cloning them.

Pattern

  • Store large data inside the WASM linear memory.
  • In JS, create a Uint8Array view into that memory using its pointer and length.
  • In Rust, expose getters that return the pointer and length; avoid copying into a fresh array on the JS side.

// Cargo.toml: wasm-bindgen = "0.2"
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct Frame { pixels: Vec<u8> } // owns the bytes, so ptr stays valid while the Frame lives

#[wasm_bindgen]
impl Frame {
    #[wasm_bindgen(getter)]
    pub fn ptr(&self) -> *const u8 { self.pixels.as_ptr() }
    #[wasm_bindgen(getter)]
    pub fn len(&self) -> usize { self.pixels.len() }
}

// JS side (`wasm` is the exports object resolved by init())
// Re-create the view after any call that may grow memory: growth detaches the old buffer.
const view = new Uint8Array(wasm.memory.buffer, frame.ptr, frame.len); // zero-copy view

Why it works: The hot path stays on a single buffer, and JS never creates temporary copies that the garbage collector must later reclaim.
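The JS view relies on one contract: the (ptr, len) pair must keep describing live bytes. The sketch below shows that contract in plain Rust, without wasm-bindgen, so it runs natively. `Frame::new`, `update` and `view` are hypothetical helpers. The buffer is allocated once and rewritten in place each frame, so its pointer never moves. In the browser you still have to rebuild the Uint8Array if linear memory grows.

```rust
/// Owns one frame's bytes. The (ptr, len) pair is what the JS side turns
/// into a Uint8Array view; nothing is copied across the boundary.
pub struct Frame {
    pixels: Vec<u8>,
}

impl Frame {
    /// Allocates the buffer once, up front.
    pub fn new(len: usize) -> Self {
        Frame { pixels: vec![0; len] }
    }

    /// Writes the next frame in place. No reallocation happens, so a view
    /// built from ptr()/len() keeps pointing at live data.
    pub fn update(&mut self, frame_no: u8) {
        for (i, px) in self.pixels.iter_mut().enumerate() {
            *px = frame_no.wrapping_add(i as u8);
        }
    }

    pub fn ptr(&self) -> *const u8 {
        self.pixels.as_ptr()
    }

    pub fn len(&self) -> usize {
        self.pixels.len()
    }

    /// Rust equivalent of `new Uint8Array(memory.buffer, ptr, len)`:
    /// reinterpret (ptr, len) as a borrowed slice of the same bytes.
    pub fn view(&self) -> &[u8] {
        // SAFETY: ptr/len come from our own live Vec, and the slice borrows self.
        unsafe { std::slice::from_raw_parts(self.ptr(), self.len()) }
    }
}
```

Rewriting in place, instead of returning a new Vec each frame, is what keeps the pointer stable across frames.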


2) SIMD for hot loops: enable simd128

Rust’s std::arch::wasm32 lets you write vector ops directly, or you can rely on auto-vectorization (often good, not magical).

Recipe

  • Build for wasm32-unknown-unknown with RUSTFLAGS="-C target-feature=+simd128".
  • Use wide operations for pixel transforms, physics, and DSP-ish tasks.

#[cfg(target_arch = "wasm32")]
use core::arch::wasm32::{u8x16_add_sat, v128};

// Adds 16 RGBA bytes at once; the saturating add clamps at 255 instead of wrapping to 0.
#[cfg(target_arch = "wasm32")]
#[target_feature(enable = "simd128")]
pub unsafe fn add_rgba(a: v128, b: v128) -> v128 {
    u8x16_add_sat(a, b)
}

Tip: Keep arrays aligned and use SoA (structure-of-arrays) for large numeric loops to help the compiler vectorize.
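Here is a portable sketch of the auto-vectorization route and the SoA tip, in plain Rust with no intrinsics. `PixelsSoA` is a hypothetical type. Whether LLVM actually emits simd128 instructions for these loops depends on the build flags above and on the compiler version, so inspect the generated wasm rather than assuming it does.

```rust
/// Pixels stored as structure-of-arrays: one contiguous Vec per channel.
pub struct PixelsSoA {
    pub r: Vec<u8>,
    pub g: Vec<u8>,
    pub b: Vec<u8>,
    pub a: Vec<u8>,
}

impl PixelsSoA {
    /// Splits interleaved RGBA bytes (array-of-structs) into per-channel arrays.
    pub fn from_rgba(rgba: &[u8]) -> Self {
        let n = rgba.len() / 4;
        let mut p = PixelsSoA {
            r: Vec::with_capacity(n),
            g: Vec::with_capacity(n),
            b: Vec::with_capacity(n),
            a: Vec::with_capacity(n),
        };
        for px in rgba.chunks_exact(4) {
            p.r.push(px[0]);
            p.g.push(px[1]);
            p.b.push(px[2]);
            p.a.push(px[3]);
        }
        p
    }

    /// Brightens the color channels and leaves alpha alone. Each inner loop is a
    /// plain saturating add over one contiguous slice, which is the shape
    /// auto-vectorizers handle best.
    pub fn brighten(&mut self, amount: u8) {
        for channel in [&mut self.r, &mut self.g, &mut self.b] {
            for v in channel.iter_mut() {
                *v = v.saturating_add(amount);
            }
        }
    }
}
```

The AoS-to-SoA split costs one pass up front. After that, every per-channel operation runs over dense, same-typed data.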


3) Frame-paced interop: batch events per requestAnimationFrame

Crossing the WASM/JS boundary on every pointermove is death by a thousand cuts. Accumulate events in JS, then consume them once per frame in Rust.

// JS
const queue = []; // flat: [x0, y0, x1, y1, ...]
canvas.addEventListener('pointermove', e => queue.push(e.clientX, e.clientY));

function tick(ts) {
  wasm.consume_events(new Float64Array(queue)); // one boundary crossing per frame
  queue.length = 0;
  wasm.update_and_draw(ts);
  requestAnimationFrame(tick);
}
requestAnimationFrame(tick);

Why it works: You stay within the ~16.7 ms frame budget at 60 FPS and keep work in step with the browser’s rendering schedule.
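On the Rust side, a consumer for that flat batch might look like the sketch below. `InputState` is hypothetical. With wasm-bindgen, a `&[f64]` parameter accepts the Float64Array built in `tick()`, either on an exported struct or through a free function over a thread_local. The sketch itself is plain Rust.

```rust
/// Per-frame input state kept on the Rust side.
#[derive(Default)]
pub struct InputState {
    pub last_pointer: Option<(f64, f64)>,
    pub events_this_frame: usize,
}

impl InputState {
    /// Consumes a flat [x0, y0, x1, y1, ...] batch in a single call.
    pub fn consume_events(&mut self, xy: &[f64]) {
        self.events_this_frame = 0;
        for pair in xy.chunks_exact(2) {
            // Hover/drag only needs the latest position; a gesture
            // recognizer could inspect every pair instead.
            self.last_pointer = Some((pair[0], pair[1]));
            self.events_this_frame += 1;
        }
    }
}
```

An empty batch resets the per-frame counter but keeps the last known pointer position, so a frame with no input still draws correctly.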


4) OffscreenCanvas + Worker: move paint off the main thread

When UI logic must stay responsive, render in a Worker with OffscreenCanvas. Rust runs inside the worker, and the main thread stays free for input.

// main thread
const worker = new Worker('renderer.js', { type: 'module' });
const off = canvas.transferControlToOffscreen();
worker.postMessage({ canvas: off }, [off]); // transferred, not copied

// renderer.js (worker)
import init, { draw_frame } from './pkg/app.js';

self.onmessage = async ({ data }) => {
  const { canvas } = data;
  await init();
  const ctx = canvas.getContext('2d');
  function loop(t) { draw_frame(ctx, t); requestAnimationFrame(loop); }
  requestAnimationFrame(loop);
};

Note: OffscreenCanvas itself does not need cross-origin isolation, but sharing memory between the main thread and the worker through SharedArrayBuffer does. Even without shared memory, this split is excellent for decoupling.
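The same split can be sketched natively, with `std::thread` and `std::sync::mpsc` standing in for the Worker and `postMessage`. This is an analogue for reasoning about ownership, not browser code. `Msg`, `RenderState` and `spawn_renderer` are hypothetical, and `draw_frame` appears only in a comment.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Messages the "main thread" posts, like worker.postMessage in the browser.
pub enum Msg {
    Pointer(f64, f64),
    Resize(u32, u32),
}

/// State owned exclusively by the render thread; the input side never touches it.
#[derive(Debug, Default, PartialEq)]
pub struct RenderState {
    pub pointer: (f64, f64),
    pub size: (u32, u32),
}

fn apply(state: &mut RenderState, msg: Msg) {
    match msg {
        Msg::Pointer(x, y) => state.pointer = (x, y),
        Msg::Resize(w, h) => state.size = (w, h),
    }
}

/// Spawns the renderer. Each loop pass stands in for one frame: wait for
/// input, drain whatever else has queued up, then "draw" once.
pub fn spawn_renderer() -> (Sender<Msg>, JoinHandle<RenderState>) {
    let (tx, rx): (Sender<Msg>, Receiver<Msg>) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut state = RenderState::default();
        // recv() returns Err once every Sender is dropped and the queue is empty.
        while let Ok(first) = rx.recv() {
            apply(&mut state, first);
            for msg in rx.try_iter() {
                apply(&mut state, msg);
            }
            // draw_frame(&state) would go here (hypothetical).
        }
        state
    });
    (tx, handle)
}
```

The render side owns its state outright, so no locks are needed. The input side only ever sends messages.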


5) WebGPU for big visuals; fall back to WebGL2

For heavy scenes, take per-pixel work off the CPU. Use wgpu (Rust), which targets the browser’s WebGPU API. Its webgl cargo feature provides a WebGL2 backend for browsers that lack WebGPU.

Minimal sketch (Rust):

// Pseudocode-ish: the wgpu setup is verbose; write it once, reuse it everywhere.
pub async fn init_gpu(canvas: web_sys::HtmlCanvasElement) -> (wgpu::Device, wgpu::Queue) {
    let instance = wgpu::Instance::default();
    // The surface-creation call varies by wgpu version; newer releases use
    // instance.create_surface(wgpu::SurfaceTarget::Canvas(canvas)).
    let surface = instance.create_surface_from_canvas(&canvas).unwrap();
    // request adapter/device, configure surface...
    // create pipelines/buffers; draw in your RAF loop
    // return device/queue for later commands
    unimplemented!()
}

Why it works: Big charts, image filters, and particle systems move to GPU pipelines, leaving the CPU free for UI state.


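6) Allocators for UI churn: prefer arenas/bump for transient state

Frequent short-lived allocations (menus, tooltips, per-frame scratch) punish performance. Use an arena or bump allocator for per-frame scratch, then reset it.

// Cargo.toml: bumpalo = "3"
use bumpalo::Bump;
use std::cell::RefCell;

// RefCell because Bump::reset needs &mut self, while with() only lends a shared reference.
thread_local! { static SCRATCH: RefCell<Bump> = RefCell::new(Bump::new()); }

pub fn layout_frame() {
    SCRATCH.with(|cell| {
        let mut bump = cell.borrow_mut();
        {
            // allocate short-lived structures here
            let scratch = bump.alloc([0u8; 64]);
            scratch[0] = 1;
            // ...
        } // every borrow of the arena must end before reset
        bump.reset(); // free everything at once at end of frame
    });
}

Payoff: Predictable performance, fewer calls into the global allocator, and less fragmentation of the WASM heap. These are Rust allocations in linear memory, so the JS garbage collector is not involved either way.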
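To see what `reset()` buys without pulling in bumpalo, here is a hypothetical std-only `FrameArena`: a minimal bump arena over one pre-sized byte buffer. It hands out index ranges rather than references. That sidesteps the borrow checker at the cost of indexing through the arena.

```rust
use std::ops::Range;

/// A minimal bump arena: one pre-sized byte buffer plus an offset.
/// alloc() hands out index ranges; reset() frees everything at once.
pub struct FrameArena {
    buf: Vec<u8>,
    offset: usize,
}

impl FrameArena {
    pub fn with_capacity(cap: usize) -> Self {
        FrameArena { buf: vec![0; cap], offset: 0 }
    }

    /// Reserves `len` bytes, or returns None when this frame's budget is used up
    /// (a production allocator might grow or fall back to the heap instead).
    pub fn alloc(&mut self, len: usize) -> Option<Range<usize>> {
        let end = self.offset.checked_add(len)?;
        if end > self.buf.len() {
            return None;
        }
        let range = self.offset..end;
        self.offset = end;
        Some(range)
    }

    pub fn bytes_mut(&mut self, r: Range<usize>) -> &mut [u8] {
        &mut self.buf[r]
    }

    /// End of frame: a single store, no per-object frees.
    pub fn reset(&mut self) {
        self.offset = 0;
    }

    pub fn used(&self) -> usize {
        self.offset
    }
}
```

Allocating is just bumping an integer, and freeing a whole frame is one assignment. That is why bump allocators suit per-frame scratch data.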


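7) Compute-then-patch: send tiny DOM diffs, not full trees

DOM updates are expensive because every write can invalidate style and layout, and interleaving reads with writes forces the browser to recompute layout synchronously. Compute minimal patches in Rust, then apply them in one batch, either through web-sys from Rust or with a small JS applier like the one below.

// Rust. To ship patches to JS, also derive serde::Serialize with
// #[serde(tag = "kind")] so each patch arrives as { kind, id, ... }.
#[derive(Clone)]
pub enum Patch {
    SetText { id: u32, value: String },
    SetAttr { id: u32, name: String, value: String },
    // ...
}

// Ui: your UI model type
pub fn diff(old: &Ui, new: &Ui, out: &mut Vec<Patch>) {
    // push a Patch only for what changed
}

// JS apply
function apply(patches) {
  for (const p of patches) {
    switch (p.kind) {
      case 'SetText': nodes[p.id].textContent = p.value; break;
      case 'SetAttr': nodes[p.id].setAttribute(p.name, p.value); break;
      // ...
    }
  }
}

Mindset: Treat the DOM like a device driver; update in bulk, not piecemeal.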
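Here is one way `diff` could be filled in, assuming a hypothetical flat `Ui` model in which node ids are vector indices and each node carries only text. `Patch` is cut down to `SetText` for brevity. Keyed reconciliation (inserts, removals, moves) is out of scope for this sketch.

```rust
/// Hypothetical flat UI model: node id = index, value = that node's text.
pub struct Ui {
    pub texts: Vec<String>,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Patch {
    SetText { id: u32, value: String },
}

/// Emits a SetText patch only for nodes whose text changed.
/// Assumes old and new contain the same nodes in the same order.
pub fn diff(old: &Ui, new: &Ui, out: &mut Vec<Patch>) {
    for (id, (before, after)) in old.texts.iter().zip(&new.texts).enumerate() {
        if before != after {
            out.push(Patch::SetText { id: id as u32, value: after.clone() });
        }
    }
}
```

Passing `out` in lets the caller reuse one patch vector across frames instead of allocating a new one each time.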


8) Build flags & guardrails: stability is a feature

You’ll rarely “optimize” your way past bad builds. Set sane defaults.

  • panic = "abort" (no unwinding machinery; smaller binaries)
  • codegen-units = 1 and lto = "fat"; opt-level = "z" or "s" when size is critical, 3 when speed is
  • Run wasm-opt -O3 --enable-simd post-build if it’s in your toolchain
  • Use 16-bit float textures (e.g. rgba16float) with WebGPU where visuals allow; they halve bandwidth compared with 32-bit floats
  • Track max_frame_time and update_time and fail CI if frame pacing regresses

Cargo.toml (release profile):

[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1
panic = "abort"
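The `max_frame_time` guardrail could be implemented as a small hypothetical `FrameStats` helper like the one below. It reports the worst case next to the mean and fails a budget check on the worst frame. How you collect the samples (for example `performance.now()` deltas passed in from JS) is up to your test harness.

```rust
/// Collects per-frame durations in milliseconds.
#[derive(Default)]
pub struct FrameStats {
    samples: Vec<f64>,
}

impl FrameStats {
    pub fn record(&mut self, frame_ms: f64) {
        self.samples.push(frame_ms);
    }

    pub fn max_frame_time(&self) -> f64 {
        self.samples.iter().copied().fold(0.0, f64::max)
    }

    pub fn mean_frame_time(&self) -> f64 {
        if self.samples.is_empty() {
            return 0.0;
        }
        self.samples.iter().sum::<f64>() / self.samples.len() as f64
    }

    /// CI gate: fail when the worst frame blows the budget, even if the mean looks fine.
    pub fn check_budget(&self, budget_ms: f64) -> Result<(), String> {
        let worst = self.max_frame_time();
        if worst > budget_ms {
            Err(format!("worst frame {worst:.1} ms exceeds budget {budget_ms:.1} ms"))
        } else {
            Ok(())
        }
    }
}
```

With samples of 8, 8, 8 and 40 ms, the mean is 16 ms and sits under a 60 FPS budget, but the 40 ms spike is exactly the jank users feel. That is why the gate checks the maximum.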


A tiny example: high-FPS image processing tool

Goal: drag a slider and watch a 4K image sharpen in real time.

  • Pixels live in a single Vec<u8> in WASM.
  • A SharedArrayBuffer (when the page is cross-origin isolated) carries control messages; otherwise, frame-paced queues do.
  • The sharpen kernel is SIMD’d (Technique 2).
  • Rendering happens in a Worker on an OffscreenCanvas (Technique 4).
  • UI buttons live on the main thread; we send one patch batch per frame (Technique 7).
  • Build with lto, panic=abort (Technique 8).

Result: 60 FPS on a mid-range laptop, zero main-thread jank, no “fuzzy” UI after resize.
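The example doesn’t spell out its sharpen kernel. A common choice is the 3x3 kernel below, written here as a scalar sketch for a single grayscale channel; the author’s kernel may differ. For RGBA, run it per channel on SoA data. The interior loop is straight-line arithmetic plus a clamp, which leaves the compiler room to auto-vectorize it.

```rust
/// 3x3 sharpen on a single-channel image stored row-major.
/// Kernel: 5*center - up - down - left - right, clamped to 0..=255.
/// Border pixels are copied unchanged to keep the sketch short.
pub fn sharpen(src: &[u8], width: usize, height: usize) -> Vec<u8> {
    assert_eq!(src.len(), width * height);
    let mut out = src.to_vec();
    if width < 3 || height < 3 {
        return out;
    }
    for y in 1..height - 1 {
        for x in 1..width - 1 {
            let i = y * width + x;
            let c = src[i] as i32;
            let sum = 5 * c
                - src[i - width] as i32
                - src[i + width] as i32
                - src[i - 1] as i32
                - src[i + 1] as i32;
            out[i] = sum.clamp(0, 255) as u8;
        }
    }
    out
}
```

On a flat region the kernel returns the input unchanged (5c - 4c = c). A pixel brighter than its neighbors is pushed further up, which is the sharpening effect.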


Code nuggets you’ll reuse

Calling requestAnimationFrame from Rust once per frame:

use wasm_bindgen::{prelude::*, JsCast};
use web_sys::window; // requires the web-sys "Window" feature

pub fn raf(f: &Closure<dyn FnMut(f64)>) {
    window().unwrap().request_animation_frame(f.as_ref().unchecked_ref()).unwrap();
}

Storing a persistent JS callback (avoids allocating a new closure every frame):

thread_local! {
    static RAF: std::cell::RefCell<Option<Closure<dyn FnMut(f64)>>> = Default::default();
}

Safe-ish pointer export. wasm-bindgen cannot export a function that takes &Vec<u8>. A &[u8] parameter is a temporary copy of JS data, so a pointer into it dangles once the call returns. Own the bytes in a wrapper type instead so the lifetime is clear:

#[wasm_bindgen]
pub struct Buffer { bytes: Vec<u8> }

#[wasm_bindgen]
impl Buffer {
    pub fn buffer_ptr(&self) -> *const u8 { self.bytes.as_ptr() }
    pub fn buffer_len(&self) -> usize { self.bytes.len() }
}


Performance heuristics (pin this)

  • Move less: it’s almost always the copies.
  • Frame-gate work: everything funnels through requestAnimationFrame.
  • Measure: log worst-case frame time, not just averages.
  • Prefer SoA + SIMD: predictable, cache-friendly, vectorizable.
  • One bulk DOM patch: treat JS like an I/O boundary.
  • Workers for paint: main thread = input & accessibility.


A quick story

We rebuilt a dashboard heatmap that jittered on modest laptops. The fix wasn’t a single trick; it was a sequence: we moved decoding and colorizing to Rust with SIMD, rendered via WebGPU, passed the image buffer by view, and applied UI updates once per frame. Same design, same dataset, but the new build held 60 FPS while cutting CPU usage by more than half. Users noticed. They didn’t know why. They just stopped waiting.


Conclusion

Rust + WASM can absolutely feel native, but only if you treat the browser like a realtime system: control copies, align with frames, keep the GPU busy, and treat the DOM as a slow device. Start with one or two of the techniques above (zero-copy views and frame-paced interop are the easiest wins), then layer on Workers, SIMD, and WebGPU as your features demand.

Have a stuttery interaction or chart that just won’t hit 60 FPS? Share the hot path and I’ll suggest a plan.


Read the full article here: https://medium.com/@Nexumo_/8-wasm-rust-techniques-for-native-speed-uis-068780964fe5