Our Rust Rewrite Improved Performance 12X But Killed Team Velocity by 65%

From JOHNWICK

The pull request had 47 approving comments. “This is beautiful code,” someone wrote. “Poetry in motion,” said another. Our API response times dropped from 340ms to 28ms. Memory footprint? Down 80%. I should’ve been celebrating.

Instead, I was staring at my team’s Jira board. Zero story points completed in three weeks. Our best senior engineer had just scheduled a “career conversation” with me. And our CEO kept asking why features that used to take two weeks were now taking two months. We’d won the technical battle and lost the war that actually mattered.

When “Better” Becomes the Enemy of “Good Enough”

Here’s the thing: I championed this Rust rewrite. Pushed for it in architecture reviews. Sold it to leadership with fancy charts showing performance gains.

It was June 2023. Our Node.js service was a mess — 83,000 lines of JavaScript that had grown organically (read: chaotically) over four years. Memory leaks every other week. Response times creeping up. The kind of codebase where you hold your breath during deploys. “We need a fresh start,” I announced in our quarterly planning meeting.

Everyone nodded enthusiastically. Because developers love rewrites the way kids love snow days. Clean slate! Modern patterns! No technical debt! But nobody — and I mean nobody — asked the question that would’ve saved us months of pain: “What problem are we actually solving?”

Not “Is our code ugly?” (it was). Not “Could we make it faster?” (we could). But “What’s preventing us from delivering value to users?” Spoiler alert: It wasn’t our choice of programming language.

The Numbers Looked Amazing (On Paper)

Let me show you what got me so excited:

Old Node.js service:

  • 340ms average response time
  • 2.1GB memory usage under load
  • 47% CPU utilization at peak
  • 3–4 production incidents per month

New Rust service:

  • 28ms average response time (12.1x faster!)
  • 420MB memory usage (80% reduction!)
  • 8% CPU utilization (insane!)
  • Zero memory-related crashes
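For anyone who wants to sanity-check the headline figures, here is a minimal Rust sketch of the arithmetic. The function names are illustrative only (they are not from the original service), and 2.1GB is treated as 2,100MB.

```rust
/// How many times faster the new latency is than the old one.
fn speedup(old_ms: f64, new_ms: f64) -> f64 {
    old_ms / new_ms
}

/// Percentage reduction going from `old` to `new` (both in the same unit).
fn percent_reduction(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}
```

With the numbers above, `speedup(340.0, 28.0)` comes out to roughly 12.14 and `percent_reduction(2100.0, 420.0)` to 80, which matches the 12.1x and 80% claims.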

I mean, look at those numbers. Who wouldn’t want that? But here’s what I conveniently ignored in my spreadsheet:

Team metrics that actually mattered:

  • Time to ship new features: Up 185%
  • Sprint velocity: Down 65%
  • Pull request cycle time: Up 320%
  • “I feel productive” survey score: Down from 8.2 to 4.1

And honestly? I didn’t even track those numbers until it was too late. I was too busy admiring our beautiful benchmarks. Your system’s performance matters way less than your team’s performance. Optimize for the wrong one and you’ll pay for it.


1. The Learning Curve Wasn’t a Curve: It Was a Cliff

“Rust is easy to learn!” said no one who’s actually learned Rust. Look, I get it. I drank the Kool-Aid. I read the Rust Book cover to cover. Solved 100 Exercism problems. Built a little CLI tool. Felt like a genius when the borrow checker finally shut up and let my code compile.

But here’s what shocked me: My experience learning Rust had nothing to do with my team’s experience. I had:

  • Three months of evenings and weekends to learn
  • Zero pressure to ship features
  • The luxury of getting stuck on concepts without consequences

My team had:

  • Sprint deadlines every two weeks
  • Angry product managers asking “Where’s the feature?”
  • Teammates depending on them to unblock work
  • That sinking feeling of “Am I too dumb for this?”

Week four of the rewrite, Rachel (senior engineer, 8 years’ experience) came into my office. “Can we talk?”

She showed me her screen. A function to parse JSON and validate some fields. In Node.js, she could write it in 20 minutes. In Rust, she’d spent two days fighting with Result<T, E>, lifetime annotations, and error trait implementations.

“I feel incompetent,” she said quietly. “Like I forgot how to program.” Real talk: That moment still haunts me.
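To give a sense of the ceremony involved, here is a hedged sketch of what "validate some fields" can look like in Rust once a custom error type enters the picture. This is not Rachel's actual code. The `Signup` struct, the field names, and `ValidationError` are all hypothetical, and the sketch assumes the JSON has already been decoded into a string map (the standard library has no JSON parser; a real service would more likely reach for a crate such as serde_json).

```rust
use std::collections::HashMap;
use std::fmt;
use std::num::ParseIntError;

/// Everything that can go wrong while validating the (hypothetical) payload.
#[derive(Debug, PartialEq)]
enum ValidationError {
    MissingField(&'static str),
    InvalidAge(ParseIntError),
    InvalidEmail(String),
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ValidationError::MissingField(name) => write!(f, "missing field: {}", name),
            ValidationError::InvalidAge(err) => write!(f, "invalid age: {}", err),
            ValidationError::InvalidEmail(email) => write!(f, "invalid email: {}", email),
        }
    }
}

impl std::error::Error for ValidationError {}

// Lets the `?` operator turn a failed integer parse into our error type.
impl From<ParseIntError> for ValidationError {
    fn from(err: ParseIntError) -> Self {
        ValidationError::InvalidAge(err)
    }
}

#[derive(Debug, PartialEq)]
struct Signup {
    email: String,
    age: u8,
}

/// Checks that both fields exist, the email looks plausible, and the age is a number.
fn validate(fields: &HashMap<String, String>) -> Result<Signup, ValidationError> {
    let email = fields
        .get("email")
        .ok_or(ValidationError::MissingField("email"))?;
    let age = fields
        .get("age")
        .ok_or(ValidationError::MissingField("age"))?;

    if !email.contains('@') {
        return Err(ValidationError::InvalidEmail(email.clone()));
    }

    Ok(Signup {
        email: email.clone(),
        age: age.trim().parse::<u8>()?,
    })
}
```

None of this is exotic Rust, but the error enum, the `Display` impl, and the `From` conversion are all things a Node.js developer never had to write before getting to the twenty minutes of actual validation logic.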

2. “Just Use the Documentation” Is the Worst Advice Ever

You know what’s worse than a steep learning curve? A steep learning curve with fragmented documentation.

Our Node.js service used Express, Sequelize, Axios, and about 30 other well-established libraries. Stack Overflow was full of answers. Copy-paste-modify-done. Not elegant, but effective.

Rust’s ecosystem in 2023? A minefield of half-finished crates and “This might work, YMMV” documentation. We needed to implement OAuth2. In Node.js, I’d used passport with 15 lines of code. In Rust, we had four options, each with different tradeoffs:

  • oauth2 (most popular, but async story was messy)
  • openidconnect (feature-rich, but 50-page documentation)
  • yup-oauth2 (Google-specific, didn't fit our needs)
  • Roll our own (LOL, no)

We spent three weeks evaluating options. Three weeks! For a solved problem! And that was just OAuth2. Now multiply that by every single feature: database connections, HTTP clients, JSON serialization, logging, metrics, caching, rate limiting… Each decision required deep research, PoCs, architectural debates. In Node.js, we’d just install the obvious package and move on.

I felt like an idiot when I finally calculated: We spent 40% of our time researching libraries and arguing about crate choices. Only 60% actually writing features.

3. The Two-Speed Team Problem Killed Us

By month three, our team had split into two groups:

The Rust Experts (2 people):

  • Could write idiomatic Rust
  • Understood lifetimes, traits, async runtime internals
  • Became bottlenecks for every code review
  • Worked 50–60 hour weeks trying to help everyone else

Everyone Else (6 people):

  • Could write Rust that compiled (eventually)
  • Copied patterns without understanding why
  • Waited hours or days for code review feedback
  • Felt increasingly frustrated and demoralized

Tom and Jessica (our Rust experts) were drowning. Every PR needed their review. Every design decision needed their input. Every bug needed their expertise to debug. “Can you look at my iterator chain?” became the most common Slack message.

And honestly? Tom and Jessica started resenting the rest of the team. Not maliciously — they were just exhausted and stressed. But you could feel the tension building. Meanwhile, our mid-level and junior engineers felt like second-class citizens. They’d gone from confidently shipping features to constantly feeling blocked and uncertain. The two-speed team wasn’t just a productivity problem. It was an org culture disaster waiting to happen.

4. Fast Code Doesn’t Matter When You Can’t Ship Code

Let me tell you why our 12x performance improvement didn’t matter: Our traffic was 2,500 requests per minute at peak. The old Node.js service handled that at 340ms per request without breaking a sweat. Yeah, CPU was high, but we were nowhere near limits. We weren’t Netflix or Google. We were a mid-sized SaaS company with 40,000 users.

But here’s what DID matter:

Feature #1: Add webhook retry logic

  • Node.js estimate: 3 days
  • Actual Rust time: 18 days
  • Why: Needed to implement custom retry strategies with exponential backoff using tokio, handle async state management, write comprehensive error types

Feature #2: Export data to CSV

  • Node.js estimate: 2 days
  • Actual Rust time: 12 days
  • Why: CSV serialization required implementing custom traits, handling streaming for large datasets, debugging lifetime issues with iterators

Feature #3: Integrate new payment provider

  • Node.js estimate: 5 days
  • Actual Rust time: 28 days
  • Why: Payment provider SDK was Python/Node.js only, had to reverse-engineer their API, implement OAuth2 flow from scratch, handle webhook signatures
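For context on Feature #1, here is a minimal sketch of retry with exponential backoff. It is deliberately synchronous and uses only the standard library, so it is not the tokio-based implementation the team wrote. The function names and the injected `sleep` parameter are assumptions made for illustration.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_pow / checked_mul guard against overflow for large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Calls `op` until it succeeds or `max_attempts` calls have failed.
/// `op` is always called at least once. `sleep` is injected so callers can
/// pass `std::thread::sleep` in production or a recording stub in tests.
fn retry_with_backoff<T, E, Op, Sleep>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: Op,
    mut sleep: Sleep,
) -> Result<T, E>
where
    Op: FnMut() -> Result<T, E>,
    Sleep: FnMut(Duration),
{
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Out of attempts: hand the last error back to the caller.
                if attempt + 1 >= max_attempts {
                    return Err(err);
                }
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With a 100ms base and a 2s cap, the delays run 100ms, 200ms, 400ms, 800ms, 1.6s and then stay at 2s. The async version adds a runtime, cancellation, and shared state on top of this, which is where much of the extra time tends to go.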

Our CTO pulled me aside after the third delayed sprint. “The API is lightning fast,” he said. “But we’re shipping features at a snail’s pace. What’s going on?” What’s going on was that we’d optimized for execution speed when we should’ve optimized for development speed.

Nobody cares if your API responds in 28ms instead of 340ms. But everyone cares when the feature they’ve been waiting for is eight weeks late.
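A back-of-the-envelope way to see why raw latency was never the constraint at this traffic level: Little's law says the average number of requests in flight equals the arrival rate times the time each request spends in the system. The sketch below applies it to the figures quoted above. It is an illustration, not a measurement from the service, and the function name is hypothetical.

```rust
/// Average number of requests in flight, by Little's law:
/// in_flight = arrival_rate (per second) * latency (seconds).
fn requests_in_flight(requests_per_minute: f64, latency_ms: f64) -> f64 {
    let per_second = requests_per_minute / 60.0;
    per_second * (latency_ms / 1000.0)
}
```

At 2,500 requests per minute, 340ms of latency works out to about 14 requests in flight on average, and 28ms to about 1. Both are small numbers for a single service.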

5. The Hiring Pipeline Dried Up Completely

September 2023. We needed to hire two more backend engineers. Growth was good (ironically, despite our velocity problems). I was excited to bring in fresh talent.

Our job posting:

  • “Strong Rust experience required”
  • “2+ years production Rust preferred”
  • “Experience with async Rust and tokio”

Applications received in 30 days: 12

Compare that to our Node.js posting from a year earlier: 340 applications in two weeks. And of those 12 Rust candidates? Three were qualified. One wanted $220k (50% above our range). Two accepted other offers before we could extend ours.

We were stuck. I tried pivoting: “What if we hire strong engineers and teach them Rust?”

Our Rust experts nearly revolted. “We’re already spending half our time mentoring the current team. We can’t handle more junior Rust developers. We’ll never ship anything.” So we were caught in this impossible situation:

  • Can’t hire experienced Rust developers (don’t exist in sufficient numbers)
  • Can’t hire and train new Rust developers (no bandwidth)
  • Can’t switch back to Node.js (sunk cost, ego, embarrassment)

The hiring pipeline problem exposed the fundamental flaw in our decision: We’d picked a technology that locked us into a talent pool the size of a puddle.


6. Technical Debt 2.0: Now With More Panics!

You know what’s funny? (In a dark humor way.) We rewrote our service to eliminate technical debt. And we created a whole new category of technical debt.

Old Node.js technical debt:

  • Inconsistent error handling
  • Missing validation in some endpoints
  • Tests that were more hope than assertion
  • Callback hell in a few older files

New Rust technical debt:

  • Liberal use of .unwrap() and .expect() everywhere (because proper error handling took too long)
  • Copy-pasted patterns that nobody understood
  • Arc<Mutex<>> everywhere (because we didn’t understand Send and Sync)
  • Zero documentation (because we were always behind schedule)
  • Tests that were even worse (async testing is HARD)
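To make the `.unwrap()` item concrete, here is a small sketch contrasting the shortcut with the version that takes longer to write. The config key and function names are hypothetical, and a plain `String` stands in for a proper error type to keep it short.

```rust
use std::collections::HashMap;

/// Shortcut version: panics if the key is missing or the value is not a number.
fn timeout_secs_or_panic(config: &HashMap<String, String>) -> u64 {
    config.get("timeout_secs").unwrap().parse::<u64>().unwrap()
}

/// Same lookup, but failures are returned to the caller instead of crashing.
fn timeout_secs(config: &HashMap<String, String>) -> Result<u64, String> {
    let raw = config
        .get("timeout_secs")
        .ok_or_else(|| "timeout_secs is not set".to_string())?;
    raw.parse::<u64>()
        .map_err(|err| format!("timeout_secs {:?} is invalid: {}", raw, err))
}
```

The first function is one line and works fine until the input is wrong, at which point the thread panics. The second forces every caller to decide what to do with the error, which is the part that tends to get skipped under deadline pressure.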

Turns out, when you’re under pressure to ship and you don’t fully understand the language, you write bad code. Shocking, right?

But here’s the kicker: The Node.js technical debt was fixable. We could refactor incrementally. Any developer could jump in and clean things up. The Rust technical debt? Only Tom and Jessica could fix it. And they were too busy trying to ship features.

We’d traded widely-distributed, fixable debt for highly-concentrated, expert-only debt. Technical debt isn’t about how “messy” your code looks. It’s about how easily your team can change it. We’d made our codebase immaculate and unchangeable.
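The `Arc<Mutex<>>` item from the list above deserves an illustration too. The sketch below is a made-up example, not code from the service: two ways to sum chunks of numbers across threads. The first is the reflex pattern of wrapping shared state in a lock. The second lets each thread own its data and return a result, so nothing is shared at all.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Reflex version: wrap the total in Arc<Mutex<_>> and lock it from every thread.
fn sum_with_shared_lock(chunks: Vec<Vec<u64>>) -> u64 {
    let total = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                let partial: u64 = chunk.iter().sum();
                // lock() only fails if another thread panicked while holding the lock.
                *total.lock().unwrap() += partial;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}

/// Ownership version: each thread owns its chunk and returns a partial sum,
/// so there is no shared state and no lock.
fn sum_with_owned_chunks(chunks: Vec<Vec<u64>>) -> u64 {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| thread::spawn(move || chunk.iter().sum::<u64>()))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Both return the same answer. The difference is that the second version is something any teammate can read and change, while the first invites questions about lock ordering and poisoning that only the experts could answer.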

7. The Breaking Point: When Your Best People Leave

November 2023. Rachel put in her notice. “I’m going to a company that uses Go,” she said in her exit interview. “I’m tired of feeling incompetent. I want to go somewhere I can be productive again.”

Two weeks later, Marcus (mid-level engineer, super promising) also resigned. “I’m just not learning fast enough,” he said. “Everyone here is either a Rust expert or struggling like me. I need an environment where I can grow without feeling like I’m drowning.” And then — this one hurt — Jessica, one of our Rust experts, announced she was leaving too. “I’m burned out,” she said. “I can’t keep being the bottleneck for eight other people. It’s not sustainable.”

Three engineers in one month. 30% of our backend team. Recruiting costs. Knowledge loss. Morale crater. Projects delayed even further. I’d spent nine months building the “perfect” technical system. And in the process, I’d broken the human system that actually mattered.

The final gut-punch came in Jessica’s exit interview. She said: “You know what’s sad? The Node.js system was fine. Ugly, but fine. We could’ve just… improved it. Refactored slowly. Extracted some services. But instead we threw it all away chasing perfection.” She was right. And I knew it.

What I Would Do Differently (The Painful Lessons)

I spent three months in therapy after that quarter. Not kidding. My therapist helped me process the guilt of having hurt my team while trying to help them. Here’s what I learned:

Lesson 1: Start With the Problem, Not the Solution

Before: “Our code is messy, let’s rewrite in Rust!”

After: “Why are we shipping slowly? Let’s diagnose that.”

Turns out, our real problems were:

  • Poor module boundaries (fixable in Node.js)
  • Insufficient testing (not a language problem)
  • Unclear requirements (not a language problem)
  • Too much work-in-progress (not a language problem)

We could’ve solved 80% of our issues with better practices in our existing stack.

Lesson 2: Optimize for Team Velocity, Not Execution Velocity

Your API running at 28ms instead of 340ms is invisible to users. Your features shipping 10 weeks late instead of 2 weeks late? That’s very visible. Ask yourself: “What’s the actual constraint on our business?” Nine times out of ten, it’s not CPU cycles or memory.

Lesson 3: Respect the Learning Curve

Rust isn’t “hard to learn.” It’s time-consuming to learn. There’s a difference. If I’d been honest about the 6–12 month investment required per engineer, and multiplied that by the opportunity cost of delayed features, the business case would’ve fallen apart immediately.

Lesson 4: Incremental Beats Revolutionary

Instead of a full rewrite, we could’ve:

  • Extracted the hottest 5% of code paths
  • Rewritten just those in Rust
  • Called them from Node.js via FFI or microservices
  • Measured actual business impact
  • Decided what to do next based on data
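The first step in that list, finding the hottest code paths, can be sketched from profiler output. Everything here is an assumption for illustration: the `PathSample` shape, the function names, and the idea that you have per-path CPU time available. It ranks paths by CPU time, keeps the top fraction, and reports how much of the total they account for.

```rust
use std::cmp::Ordering;

/// One row of (hypothetical) profiler output: a code path and its CPU time.
#[derive(Debug, Clone, PartialEq)]
struct PathSample {
    name: String,
    cpu_ms: f64,
}

/// Names of the hottest `fraction` of paths (by count), hottest first.
/// Returns at least one path whenever `samples` is non-empty.
fn hottest_paths(samples: &[PathSample], fraction: f64) -> Vec<String> {
    let mut sorted: Vec<&PathSample> = samples.iter().collect();
    // Sort descending by CPU time; treat incomparable values (NaN) as equal.
    sorted.sort_by(|a, b| b.cpu_ms.partial_cmp(&a.cpu_ms).unwrap_or(Ordering::Equal));
    let keep = ((samples.len() as f64) * fraction).ceil() as usize;
    let keep = keep.max(1).min(samples.len());
    sorted.into_iter().take(keep).map(|s| s.name.clone()).collect()
}

/// Share of total CPU time (0.0 to 1.0) spent in the selected paths.
fn cpu_share(samples: &[PathSample], selected: &[String]) -> f64 {
    let total: f64 = samples.iter().map(|s| s.cpu_ms).sum();
    if total == 0.0 {
        return 0.0;
    }
    let hot: f64 = samples
        .iter()
        .filter(|s| selected.contains(&s.name))
        .map(|s| s.cpu_ms)
        .sum();
    hot / total
}
```

If a handful of paths account for most of the CPU time, those are the candidates for a targeted Rust rewrite. If the time is spread evenly, that is a sign the language was never the bottleneck.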

Boring? Yes. Effective? Also yes.

Lesson 5: Technology Choices Are People Decisions

Every tech decision is actually asking:

  • Can we hire for this?
  • Can we train people on this?
  • Can our team maintain this?
  • Will people want to work on this?

I ignored all four questions. Paid the price.


The Redemption Arc (Sort Of)

January 2024. New year, time to fix this mess. I proposed something radical to leadership: rewrite the Rust service back to TypeScript (still on Node.js, but a step up from the plain JavaScript we’d started with).

The silence on the Zoom call lasted about 30 seconds. Finally, our CTO laughed. Not a mean laugh, a relieved laugh. “Thank god,” he said. “I thought I was going to have to fire you to make that happen.”

Here’s what we did:

Phase 1 (2 months):

  • Rebuilt the API layer in TypeScript with better architecture
  • Kept the Rust performance-critical components as microservices
  • Maintained the 12x performance improvement on the parts that mattered
  • Accepted the 340ms response times on parts that didn’t

Phase 2 (3 months):

  • Team velocity recovered to 85% of pre-Rust levels
  • Hired 4 new engineers easily
  • Shipped the year’s worth of backed-up features
  • Team satisfaction scores climbed back to 7.8

Phase 3 (ongoing):

  • Kept Rust for: image processing pipeline, data aggregation jobs, ML inference
  • Used TypeScript for: everything user-facing, everything that changes frequently
  • Everyone’s happy(ier)

Did we “lose” the Rust rewrite? Kind of. But we won back our team’s productivity, morale, and ability to hire. And you know what? Our users never noticed. They were just happy we were shipping features again.

The Truth Nobody Wants to Hear

The best technical decision isn’t the one with the best benchmarks. It’s not the one that looks best on your resume. It’s not the one that makes you feel smart. It’s the one that helps your team ship value to users consistently and sustainably. Sometimes that’s Rust. Sometimes that’s TypeScript. Sometimes it’s boring old PHP or Java.

I spent nine months learning this lesson the expensive way: in lost productivity, lost talent, and lost trust. I’m sharing it so maybe you don’t have to.

The question isn’t “What’s technically superior?” The question is “What will make my team most effective?” And honestly? Once I started asking that question, so many decisions became clearer.

What’s your team optimizing for: clean code or clean delivery? Because I learned you can’t always have both, and choosing wrong is brutal.