7 Boring Rust Sidecar Wins — Cut Memory 40% Without Rewrites
A contrarian playbook for teams who prefer graphs over glory.

We had the rewrite itch. You know the one. “Throw it all away, do it ‘right’ in Rust, bask in p99 bliss.” We didn’t. We shipped a Rust sidecar instead — one tight, hot-path service next to our app — and cut memory by 40%, trimmed p95, and lowered GC churn. No heroics. No year-long migration. Just a smaller heap and quieter dashboards.

The thesis: full rewrites are strategy claims disguised as refactors. Sidecars change boundaries without changing the company.

> We cut 40% memory not by changing languages — by changing where the work happens.

## The Rewrite Fantasy vs. The Sidecar Reality

Rewrites feel clean. Prod isn’t. A sidecar gives you a surgical seam:
- Keep your app’s contracts intact.
- Peel off one ugly CPU task.
- Deploy the Rust bit next to the app.
- Measure. Roll forward or kill it in minutes.
**Bold insight:** Rewrites spend political capital. Sidecars earn it.

## Before/After (Realistic Shape, Same Traffic)

We lifted the hottest path (normalize → validate → compress ~4 KB payloads) into a Rust sidecar via localhost call. Everything else stayed put.

| Metric (Lunch Peak)      | Before (Mono) | After (Rust sidecar)   |
| ------------------------ | ------------: | ---------------------: |
| Process RSS (app pod)    |        1.9 GB | 1.1 GB                 |
| Process RSS (sidecar)    |             — | 180 MB                 |
| **Cluster mem for path** |    **1.9 GB** | **1.28 GB** (**−32%**) |
| App GC pauses (p95)      |         42 ms | 18 ms                  |
| Endpoint p95             |        212 ms | 158 ms                 |
| CPU / RPS (app)          |      baseline | −15%                   |
| Error budget burn        |  noisy bursts | boring                 |

The graph that mattered to finance: fewer replicas to survive lunch. The graph that mattered to on-call: fewer pause spikes.

## Why Sidecars Win (When You Pick the Right Jobs)

Rust sidecars shine on deterministic, per-request CPU:
- Serialization/validation (JSON/MsgPack/protobuf).
- Compression/crypto (gzip, brotli, hashing, HMAC).
- Media transforms (image resize, thumbnails, waveform math).
- Regex-heavy parsing (logs, ETL shards).
- Tokenization/scoring (classic IR, lightweight ML post-proc).
**Bold insight:** If you can explain the work in one sentence and sketch its bounded memory footprint, it probably belongs in a sidecar.

## Minimal Rust Sidecar

A tiny Axum service for “normalize → validate → gzip” on a single payload:

```rust
use axum::{routing::post, Json, Router};
use flate2::{write::GzEncoder, Compression};
use serde::{Deserialize, Serialize};
use std::io::Write;

#[derive(Deserialize)]
struct In { text: String }

#[derive(Serialize)]
struct Out { ok: bool, bytes: usize, gz: Vec<u8> }

async fn normalize(Json(input): Json<In>) -> Json<Out> {
    let clean = input.text.trim().to_lowercase(); // cheap normalize
    if clean.is_empty() {
        return Json(Out { ok: false, bytes: 0, gz: vec![] });
    }
    let mut gz = GzEncoder::new(Vec::new(), Compression::fast());
    gz.write_all(clean.as_bytes()).ok();
    let body = gz.finish().unwrap_or_default();
    Json(Out { ok: true, bytes: clean.len(), gz: body })
}

#[tokio::main]
async fn main() {
    axum::Server::bind(&"0.0.0.0:8081".parse().unwrap())
        .serve(Router::new().route("/norm", post(normalize)).into_make_service())
        .await
        .unwrap();
}
```

The app calls POST /norm on 127.0.0.1:8081. The work—and the heap—stays off your main runtime.

## The Sidecar Flow (ASCII Map)

[Client]
    ↓
[App: auth, routing, business rules]
    ↓                ↘ (JSON, localhost call)
    ↓        [Rust sidecar: normalize/validate/compress]
    ↓                ↗
[App merges]
    ↓
[Response + metrics]

Short hop, big relief. The heap that used to balloon per request now lives elsewhere.

## 7 Boring Wins You Can Bank
1. Heap isolation. The GC no longer has to trace temporary, large buffers.
2. Lower variance. Deterministic alloc/free flattens p95, especially under bursts.
3. Smaller replicas. Less memory → fewer nodes → simpler autoscaling curves.
4. Faster cold-starts. App pods start lean; the sidecar warms fast.
5. Rollback in minutes. Toggle a route or env flag; no schema change needed.
6. Better postmortems. One service = one heap = one set of graphs.
7. Political safety. You didn’t launch a new language across the fleet — just one box.
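Win 5 is concrete: the call site checks a single flag, so rollback is a config change, not a deploy. A minimal sketch, assuming a hypothetical `USE_RUST_SIDECAR` env flag (the name and fallback shape are ours, not a prescribed convention):

```rust
// Hypothetical kill switch: parse a feature-flag value. Unset or anything
// other than "1"/"true" means the sidecar path is off, the safe default.
fn sidecar_enabled(flag: Option<&str>) -> bool {
    matches!(flag, Some("1") | Some("true"))
}

// The app keeps its in-process path alive as the fallback contract.
fn normalize_text(text: &str, use_sidecar: bool) -> String {
    if use_sidecar {
        // call the Rust sidecar over localhost here; fall through on error
    }
    text.trim().to_lowercase() // in-process fallback, same contract
}

fn main() {
    // Read the flag per request; rollback = unset the env var, no deploy.
    let flag = std::env::var("USE_RUST_SIDECAR").ok();
    let on = sidecar_enabled(flag.as_deref());
    assert_eq!(normalize_text("  HeLLo  ", on), "hello");
    println!("sidecar enabled: {on}");
}
```

Because the default is off, a bad sidecar build degrades to the old path instead of taking the request down with it.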
> Sidecars change who owns the problem: platform owns deployment; product owns call sites; nobody gets a 3 a.m. surprise.

## The “Pick-The-Right-Job” Checklist

Use a sidecar if the task is:
- CPU-bound per request and stateless.
- Bounded in output (no untrusted megabyte explosions).
- Expressible as a single call (no chatty back-and-forth).
- Idempotent or easy to retry on timeouts.
- Inspectable with a few counters and a histogram.
Skip a sidecar if:
- Work is DB-bound or blocked on remote I/O.
- Task is a long stream better served by a worker.
- Payloads are unbounded or need complex sessions.
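Both checklists hinge on bounded payloads, and the caps are enforceable in a few lines at the sidecar’s front door. A sketch with illustrative limits (64 KiB in, 256 KiB out; these are not our production numbers):

```rust
// Illustrative budgets, not from the article's measurements.
const MAX_IN: usize = 64 * 1024; // reject oversized inputs before any work
const MAX_OUT: usize = 256 * 1024; // refuse to return runaway blobs

fn check_payload(input: &[u8]) -> Result<(), &'static str> {
    if input.len() > MAX_IN {
        return Err("payload too large: reject with 413 before doing any work");
    }
    Ok(())
}

fn check_output(output: &[u8]) -> Result<(), &'static str> {
    if output.len() > MAX_OUT {
        return Err("output blew past budget: fail closed, log, alert");
    }
    Ok(())
}

fn main() {
    assert!(check_payload(&[0u8; 4 * 1024]).is_ok()); // the ~4 KB hot path passes
    assert!(check_payload(&[0u8; 128 * 1024]).is_err()); // oversized input is refused
    assert!(check_output(&[0u8; 1024]).is_ok());
    println!("bounds enforced");
}
```

If you cannot write these two functions for a candidate task, that task fails the checklist.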
Integration Notes That Save Pager Time
- Transport: localhost HTTP is fine; Unix sockets shave overhead.
- Budgets: set per-call deadlines (e.g., p95 ≤ 20 ms) with circuit breakers.
- Backpressure: return 429 on overload; app can degrade gracefully.
- mTLS: if you must cross a pod boundary, treat it like the public internet.
- Schema discipline: lock input/output JSON schemas to prevent “oops” renames.
- Feature flags: one per endpoint; roll forward/back without deploys.
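The budgets-plus-circuit-breaker bullet can stay this small. A stdlib-only sketch with illustrative thresholds (20 ms budget, trip after 5 slow calls, 1 s cooldown; all three numbers are assumptions, not our SLOs):

```rust
use std::time::{Duration, Instant};

// Minimal breaker: count budget blowouts, trip, cool down, half-open.
struct Breaker {
    failures: u32,
    trip_after: u32,
    tripped_at: Option<Instant>,
    cooldown: Duration,
}

impl Breaker {
    fn new() -> Self {
        Breaker { failures: 0, trip_after: 5, tripped_at: None, cooldown: Duration::from_secs(1) }
    }

    // Is the breaker open, i.e. should the app skip the sidecar and degrade?
    fn open(&mut self) -> bool {
        match self.tripped_at {
            Some(t) if t.elapsed() < self.cooldown => true,
            Some(_) => {
                // cooldown elapsed: half-open, let traffic try again
                self.tripped_at = None;
                self.failures = 0;
                false
            }
            None => false,
        }
    }

    // Record one call's latency against the per-call deadline.
    fn record(&mut self, elapsed: Duration, budget: Duration) {
        if elapsed > budget {
            self.failures += 1;
            if self.failures >= self.trip_after {
                self.tripped_at = Some(Instant::now());
            }
        } else {
            self.failures = 0; // a fast call heals the streak
        }
    }
}

fn main() {
    let budget = Duration::from_millis(20);
    let mut b = Breaker::new();
    for _ in 0..5 {
        b.record(Duration::from_millis(50), budget); // five budget blowouts in a row
    }
    assert!(b.open()); // tripped: degrade gracefully instead of queueing
    println!("breaker open: {}", b.open());
}
```

Pair this with the 429 from the sidecar side and overload becomes a graceful shrug instead of a queue collapse.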
**Bold insight:** If rollback takes minutes, you earned the right to experiment.

## Observability: What to Graph on Day One
- App: calls/sec to sidecar, error rate, timeout rate, added latency.
- Sidecar: CPU, RSS, allocs/sec, request histogram, 95/99ths.
- Joint: end-to-end p95, queue depth, cost / 1k requests.
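The app-side counters above don’t need a metrics framework on day one; plain atomics get you far. A sketch (the `SidecarStats` shape is our invention; swap in your metrics library later):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical day-one counters for the app side of the sidecar call.
#[derive(Default)]
struct SidecarStats {
    calls: AtomicU64,
    errors: AtomicU64,
    timeouts: AtomicU64,
    total_micros: AtomicU64, // crude mean only; use a histogram for p95/p99
}

impl SidecarStats {
    fn record(&self, micros: u64, ok: bool, timed_out: bool) {
        self.calls.fetch_add(1, Ordering::Relaxed);
        self.total_micros.fetch_add(micros, Ordering::Relaxed);
        if !ok {
            self.errors.fetch_add(1, Ordering::Relaxed);
        }
        if timed_out {
            self.timeouts.fetch_add(1, Ordering::Relaxed);
        }
    }

    fn error_rate(&self) -> f64 {
        let calls = self.calls.load(Ordering::Relaxed);
        if calls == 0 {
            return 0.0;
        }
        self.errors.load(Ordering::Relaxed) as f64 / calls as f64
    }
}

fn main() {
    let stats = SidecarStats::default();
    stats.record(1_800, true, false); // one fast, healthy call
    stats.record(25_000, false, true); // one slow call that timed out
    println!("error rate: {}", stats.error_rate());
}
```

Two counters and a rate are enough to catch the common failure (sidecar up, contract broken) before it shows in end-to-end p95.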
Minimum logs: `{ trace_id, route, in_bytes, out_bytes, millis, status }`.

Minimum health: `/readyz` checks memory watermark + probe failures.

> You can’t defend a sidecar you can’t see. Counters beat confidence.

## Governance & “New Language” Anxiety

This is where rewrites die. Make the platform own the rails once, then reuse forever.
- SBOM + signing for the Rust artifact.
- SAST in CI, license policy pinned.
- Crash capture with a standard hook; core dumps gated.
- Golden image for the container with known Rust/C toolchain.
- Runbook: “How to debug at 3 a.m.” with 6 commands, not 60.
No surprise: the second sidecar is 10× easier than the first.

## Trade-offs (Say Them Out Loud)
- Serialization overhead exists; budget it (JSON/flatbuffers/protobuf).
- Two services to watch instead of one.
- Language literacy for on-call (pair with a cheatsheet and drills).
- Success breeds ambition — don’t peel off everything. Keep the big brain in your app.
## A Tiny “Should We?” Decision Tree

Is the path CPU-bound and stateless?
 ├── no ──► Keep in app/worker
 └── yes ─► Sidecar candidate: can we cap payloads?
             ├── no ──► Leave it in-app
             └── yes ─► Prototype behind a flag
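For teams who like their decision trees executable, the same branches fit in a few lines (a sketch; the two booleans and the `Decision` names are our framing of the tree, not an established API):

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    KeepInApp,           // not a sidecar job at all
    LeaveInApp,          // candidate, but payloads disqualify it
    PrototypeBehindFlag, // ship it behind a feature flag and measure
}

// Hypothetical encoding of the decision tree above.
fn should_sidecar(cpu_bound_and_stateless: bool, payloads_cappable: bool) -> Decision {
    if !cpu_bound_and_stateless {
        return Decision::KeepInApp; // DB-bound or stateful work stays put
    }
    if payloads_cappable {
        Decision::PrototypeBehindFlag
    } else {
        Decision::LeaveInApp // unbounded payloads break the heap-isolation win
    }
}

fn main() {
    assert_eq!(should_sidecar(true, true), Decision::PrototypeBehindFlag);
    assert_eq!(should_sidecar(true, false), Decision::LeaveInApp);
    assert_eq!(should_sidecar(false, true), Decision::KeepInApp);
    println!("tree encoded");
}
```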
## A Repeatable Migration Play

1. Choose a seam. One endpoint. One job.
2. Spec the contract. JSON schema + latency budget + failure mode.
3. Ship the sidecar. Same pod, localhost, p50 < 10 ms target.
4. Flip 5% traffic. Watch joint p95 and memory.
5. Hold the deletion ceremony. Remove the in-app path if SLO holds for 7 days.
6. Write the one-pager. Include graphs and the “don’t repeat” lessons.
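Step 4’s 5% flip works best as a deterministic bucket, so retries of the same request take the same path. A sketch, assuming you key on `trace_id` (one common choice, not the only one):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Route a stable `percent`% of traffic to the sidecar, keyed on trace_id
// so a given request always lands on the same side of the split.
fn use_sidecar(trace_id: &str, percent: u64) -> bool {
    let mut h = DefaultHasher::new();
    trace_id.hash(&mut h);
    h.finish() % 100 < percent
}

fn main() {
    // Roughly `percent`% of ids should bucket into the sidecar path.
    let hits = (0..10_000)
        .filter(|i| use_sidecar(&format!("trace-{i}"), 5))
        .count();
    assert!(hits > 200 && hits < 800); // near 5% of 10,000; hash spread isn't exact
    println!("{hits} of 10000 routed to sidecar");
}
```

Dial `percent` up as the joint p95 and memory graphs stay boring; dial it to zero for the minutes-long rollback the playbook promises.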
## The Debate You Should Start at Work

Stop asking “Should we rewrite in Rust?” Ask “What job, if moved to a Rust sidecar, would buy us the next six months?” My money: normalize/validate/compress on hot JSON paths; hashing/signing; thumbnailing; small vector math. Your list may differ. Your numbers shouldn’t.

Arguable conclusion: If you can’t get 20–40% memory back by peeling one CPU job into a Rust sidecar, show the benchmark that proves a full rewrite would do better — and explain how you’ll pay the governance bill. If your team can, post the graphs. I’ll read them. And I’ll steal your playbook.