How an Intern Saved TikTok $300K by Rewriting a Go Service in Rust
Every few months, the tech world rediscovers a truth that’s as old as the mainframe: computers are fast, but cloud bills are faster.
Case in point — TikTok reportedly saved $300,000 a year after an intern rewrote part of a Go microservice in Rust. Not a full rewrite. Not a heroic all-nighter by a crack performance team. Just an intern and a partial refactor. The results read like a benchmarking fairy tale:
- CPU usage dropped from 78% to 52%,
- Memory usage fell from 7.4% to 2.07%,
- p99 latency went from 19.87ms to 4.79ms,
- and the service handled twice the traffic.
That’s not “slightly better.” That’s “your SRE just high-fived your CFO” better.
The Myth of “Optimization Is Pointless”

There’s a certain tech cliché that goes: “Computers are fast, and developer time is expensive.” It’s the motto of every startup sprinting toward product–market fit, every engineering manager terrified of scope creep, and every tech lead who’s watched one too many heroic optimizations turn into spaghetti.

And to be fair — that’s true, at least for a while. At small scale, optimization often doesn’t pay back the engineering hours it consumes. You can spend a week squeezing microseconds out of your backend, or you can ship a new feature that brings in users (and revenue). One of those keeps the lights on.

But the key word here is scale. As a company grows, the math flips. You move from a world where servers are cheaper than developers to one where developers are cheaper than servers — and then the real fun begins.
The Three Stages of Scale (and Sanity)
Stage One: The Startup Illusion — “Move Fast, Forget Cache Misses.” Server bills are pocket change. The app fits on one EC2 instance, and you spend more on coffee than compute. At this point, optimizing is performative — you’re better off building features and praying someone uses them.
Stage Two: The Awkward Middle — “It’s Expensive, But So Is Thinking About It.” You’re growing, but not enough to justify a full-time performance engineer. Every optimization project competes with a new revenue feature. This is the gray zone — where your infra team cries quietly at 1 a.m. watching Grafana dashboards, but management still says “just autoscale it.”
Stage Three: The Scale Reality — “We Burn Money by the CPU Cycle.” Suddenly, your cloud bill has commas. Lots of them. Every percentage point of CPU matters. Optimization isn’t premature anymore — it’s a financial instrument.
Why Rust Won This Round

Let’s get one thing straight: Go didn’t fail here. Go was designed for developer velocity, not for shaving nanoseconds. It’s a pragmatic, opinionated language that trades raw performance for simplicity, concurrency, and fast iteration. For most workloads, it’s more than fast enough.

But when you’re CPU-bound — say, processing media, handling tight loops, or running algorithmic logic that can’t just be parallelized away — Go’s garbage collector and memory management start to show their cost. Allocations add up, GC pauses creep in, and your latency chart starts to look like a seismograph.

Rust, meanwhile, brings the kind of performance you can set your watch to. Its ownership model enforces predictability — no GC, no heavyweight runtime, no surprises. You pay for what you use, and not a byte more. It’s like writing C++, but without the constant fear that you’ve summoned a segfault demon.

And while rewriting everything in Rust is usually a bad idea (your team will mutiny), using it surgically — on the hot paths that burn your CPU budget — can yield ridiculous returns. Which is exactly what happened here.
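To make the “no GC” point concrete, here’s a minimal sketch of what deterministic memory management looks like in practice. Nothing here is from TikTok’s code; `process` is a hypothetical stand-in for real request work. The point is that the allocation is freed at a spot you can see in the source, not whenever a collector gets around to it:

```rust
// Hypothetical stand-in for real per-request work.
fn process(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b != 0).count()
}

fn main() {
    let count = {
        let buf = vec![0u8; 1024]; // heap allocation, owned by `buf`
        process(&buf)
    }; // `buf` is dropped and freed exactly here: deterministic, no GC pause later

    println!("non-zero bytes: {count}");
}
```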
When Code Meets the Cloud Bill

Yes, Fibonacci can be computed iteratively in O(n) time instead of O(2ⁿ); the recursive version below is deliberately wasteful, because a CPU-hungry hot path is exactly what this case needs to illustrate.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strconv"
)

// fib is intentionally exponential: a stand-in for any CPU-bound hot path.
func fib(n int) int {
	if n <= 1 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

func handler(w http.ResponseWriter, r *http.Request) {
	n, err := strconv.Atoi(r.URL.Query().Get("n"))
	if err != nil {
		http.Error(w, "n must be an integer", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "%d\n", fib(n))
}

func main() {
	http.HandleFunc("/fib", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
To understand why Rust made such a difference, look again at that simple CPU-bound endpoint — a /fib?n=40 API that calculates the 40th Fibonacci number recursively. It’s intentionally inefficient, the kind of function that eats CPU for breakfast. In Go, this is a dozen easy lines: you spin up a net/http handler, call fib(n-1) + fib(n-2), and return the result. It’s clean, idiomatic, and delightfully boring — until you realize that each request spawns a goroutine that keeps a tiny stack, a handful of heap allocations, and a garbage collector quietly watching over it all. Multiply that by thousands of concurrent requests, and your CPU starts wheezing like an overworked barista.

Now write the same thing in Rust (a sketch follows at the end of this section). The code looks nearly identical — just wrapped in Actix-Web and using match instead of if. But under the hood, it’s an entirely different creature. Rust compiles down to pure machine code with zero runtime baggage. There’s no scheduler juggling goroutines, no hidden garbage collector, no write barriers, no safepoints — just predictable, tight loops optimized by LLVM. When you disassemble the binaries, Go’s version calls into its runtime for stack checks and GC barriers, while Rust’s emits nothing but add, mov, and ret. The difference isn’t aesthetic; it’s mechanical. Go trades a few CPU cycles for simplicity. Rust refuses to. And at TikTok scale, those “few cycles” become entire data centers.

That’s why the TikTok intern’s rewrite worked. It wasn’t because Rust is magically faster — it’s because Rust stays out of the way. Go’s runtime gives you safety and ease of development; Rust gives you control and determinism. Both philosophies are right, but they shine at different scales. At startup speed, Go wins. At hyperscale, physics wins.
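Here’s what that Rust version might look like. This is a minimal sketch, assuming actix-web 4 and serde (with its derive feature) for query parsing; TikTok’s actual service isn’t public, so treat it purely as an illustration of the shape described above:

```rust
use actix_web::{get, web, App, HttpServer, Responder};
use serde::Deserialize;

// Same deliberately exponential hot path, written with `match` instead of `if`.
fn fib(n: u64) -> u64 {
    match n {
        0 | 1 => n,
        _ => fib(n - 1) + fib(n - 2),
    }
}

// Query parameters for GET /fib?n=40.
#[derive(Deserialize)]
struct Params {
    n: u64,
}

// Parses the query, runs the hot loop, returns plain text.
#[get("/fib")]
async fn handler(query: web::Query<Params>) -> impl Responder {
    format!("{}\n", fib(query.n))
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| App::new().service(handler))
        .bind(("0.0.0.0", 8080))?
        .run()
        .await
}
```

The recursion is still O(2ⁿ) in both languages; the difference the article describes comes from everything around it: no garbage collector tracking allocations, no write barriers, no safepoints in the hot loop.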
The Real Lesson: Profiling Beats Pontificating

It’s easy to fall into architectural dogma. “Horizontal scaling solves everything.” “Just increase the instance size.” “Microservices are easier to optimize individually.” These are comforting mantras — until you realize you’ve scaled your inefficiency.

Optimization isn’t about guessing where the bottlenecks are. It’s about profiling. The TikTok intern didn’t just rewrite Go code in Rust for fun; they measured first. They found a CPU-bound service, tested alternatives, and validated gains. That’s the difference between “engineering” and “cargo-culting performance.”

Every organization hits this inflection point. Some notice it early and evolve. Others sleepwalk into a cloud bill that looks like a national defense budget.
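In that spirit, even a crude measurement beats a confident guess. Here’s a minimal measure-first sketch; `hot_path` is a hypothetical stand-in for whatever your profiler actually flags, and real work would use a proper profiler or benchmark harness rather than a stopwatch:

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical stand-in for whatever your profiler says is hot.
fn hot_path(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x.wrapping_mul(x)))
}

fn main() {
    let start = Instant::now();
    // black_box keeps the optimizer from deleting the "work" entirely.
    let result = black_box(hot_path(black_box(10_000_000)));
    println!("hot_path took {:?} (result {result})", start.elapsed());
}
```

Run it before the rewrite and again after; if the numbers don’t move, the rewrite wasn’t the bottleneck.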
The Hidden Cost of “Good Enough”

There’s a strange cultural allergy in tech to talking about performance work — as if optimizing code is somehow less “strategic” than building new features. But inefficiency compounds silently. Every millisecond in your request path multiplies across millions of users. Every extra MB of memory eats into your node density. At scale, “good enough” becomes “financially reckless.”

And ironically, fixing it doesn’t always require a team of senior engineers. Sometimes, it’s an intern with curiosity, a profiler, and the freedom to poke around in the right place.
Closing Thoughts

The TikTok story isn’t really about Go vs. Rust. It’s about knowing when to care. In the early days, optimize for developer velocity. Later, optimize for efficiency. And when the scale tips — when servers become your biggest line item — make sure someone’s looking at the CPU charts before your CFO does.

Because one day, you’ll realize the most cost-effective optimization you ever made wasn’t a clever algorithm — it was letting an intern rewrite the right 500 lines of code.

And that’s the punchline: performance doesn’t always need a hero. Sometimes, it just needs permission.
Notes

When I say Rust emits nothing but add, mov, and ret, that’s shorthand for “this is what the CPU actually sees.” These are machine instructions — the building blocks of every program once it’s compiled. mov means move data — copy a value from memory into a CPU register or from one register to another. add means add two numbers. ret means return from a function. That’s it — just arithmetic and data movement, no extra bookkeeping. It’s the purest, lowest-level form of execution you can get: the CPU crunching numbers with zero interference.

By contrast, when Go compiles the same function, the CPU still runs similar instructions, but with extra steps inserted by Go’s runtime — like stack checks to prevent overflows and calls to its garbage collector to track memory. Those are the “hidden helpers” that make Go easy to use, but they also burn CPU cycles.
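For a concrete feel, here’s a tiny iterative Fibonacci in Rust (a sketch for illustration, not TikTok’s code). The loop body is literally nothing but additions and value shuffles, which is why an optimized build of a function like this boils down to little more than mov, add, and a final ret:

```rust
// Iterative Fibonacci: the loop body is only adds and moves,
// so an optimized build is essentially mov/add with a ret at the end.
fn fib(n: u64) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b; // add: sum two values held in registers
        a = b;            // mov: shuffle a value between registers
        b = next;         // mov
    }
    a
}

fn main() {
    println!("fib(40) = {}", fib(40));
}
```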