Fearless Concurrency Bit Back: 7 Rust Rules That Stopped the Pager Storm
The pager went off at 2:47 AM. Again. Production was down. Threads were deadlocked. The same threads we had rewritten in Rust specifically to avoid this nightmare.
“Fearless concurrency,” the Rust book promised. We believed it. We migrated 40,000 lines of Go to Rust because the borrow checker would save us from ourselves. Six months of work. Three senior engineers. Management bought in. Then we shipped, and the system locked up harder than it ever did in Go.

The Lie We Told Ourselves

Rust does not make concurrency easy. It makes data races impossible. Those are not the same thing. You can still deadlock. You can still starve threads. You can still write concurrent code that compiles perfectly and fails spectacularly under load.

The borrow checker caught our memory bugs. It did not catch our logic bugs. And in concurrent systems, logic bugs are what kill you at 2 AM.

Rule 1: Arc Does Not Mean Thread-Safe Logic

Our first mistake was wrapping everything in Arc<Mutex<T>> and assuming we were done.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct Cache {
    data: Arc<Mutex<HashMap<String, Vec<u8>>>>,
}

impl Cache {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let map = self.data.lock().unwrap();
        map.get(key).cloned()
    }

    fn update(&self, key: String, val: Vec<u8>) {
        let mut map = self.data.lock().unwrap();
        map.insert(key, val);
    }
}
```

Clean. Compiles. Deadlocks under load.

The problem? We had twelve of these caches. Code would lock cache A, then try to lock cache B. Meanwhile, another thread locked B and wanted A. Classic deadlock. Rust let it happen because the locks were acquired correctly from a memory safety perspective.

The fix was ordering:

```rust
// Always lock in the same order: A, then B.
fn sync_caches(&self) {
    let a = self.cache_a.lock().unwrap();
    let b = self.cache_b.lock().unwrap();
    // do work
}
```

Boring. Obvious in hindsight. Took us four hours of production downtime to figure out.

Rule 2: Channels Are Not Free

We went channel-crazy. Every operation communicated through channels because that felt like "the Rust way."

```rust
let (tx, rx) = mpsc::channel();
for item in items {
    let tx = tx.clone();
    tokio::spawn(async move {
        let result = process(item).await;
        tx.send(result).unwrap();
    });
}
```

At 1,000 requests per second, this worked great. At 50,000 requests per second, we had 200,000 tasks spawned, each with its own channel sender, all trying to send to one receiver. Memory usage spiked to 14GB. The receiver could not keep up. Tasks blocked on send. The whole system ground to a halt.

We switched to bounded channels with proper backpressure:

```rust
let (tx, rx) = mpsc::channel(1000); // Bounded
tokio::spawn(async move {
    match tx.send(result).await {
        Ok(_) => {},
        Err(_) => { /* receiver dropped, stop working */ }
    }
});
```

Response times dropped from 8 seconds to 340ms.
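The snippets above only show the sending side. Here is a toy, self-contained version of the pattern, assuming tokio::sync::mpsc (which matches the bounded, awaited send above); the item type, counts, and delay are made up. The point is where the waiting happens: once the queue is full, producers park inside send().await instead of piling up work in memory.

```rust
use tokio::sync::mpsc;
use tokio::time::{sleep, Duration};

// Toy end-to-end sketch: many fast producers, one slow consumer, capacity 1000.
#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u64>(1000);

    for item in 0..10_000u64 {
        let tx = tx.clone();
        tokio::spawn(async move {
            // Once the queue is full, this await is where producers wait.
            // That waiting is the backpressure that keeps memory flat.
            if tx.send(item).await.is_err() {
                // Receiver dropped; stop working.
            }
        });
    }
    drop(tx); // drop the original sender so the channel closes when tasks finish

    let mut processed = 0u64;
    while let Some(_item) = rx.recv().await {
        sleep(Duration::from_micros(50)).await; // simulate slow downstream work
        processed += 1;
    }
    println!("processed {processed} items");
}
```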
Rule 3: Async Does Not Mean Concurrent

This one hurt. We thought spawning tasks meant parallelism.

```rust
async fn handle_request(id: u64) -> Result {
    let user = db.get_user(id).await?;
    let prefs = db.get_prefs(id).await?;
    let history = db.get_history(id).await?;
    Ok(Data { user, prefs, history })
}
```

Sequential. Every await blocks the next one. Three database calls that could run in parallel were running one after another. The fix:

```rust
async fn handle_request(id: u64) -> Result {
    // Run all three queries concurrently on the same task.
    let (user, prefs, history) = tokio::join!(
        db.get_user(id),
        db.get_prefs(id),
        db.get_history(id)
    );
    Ok(Data { user: user?, prefs: prefs?, history: history? })
}
```

Request latency went from 180ms to 65ms. Same queries, different execution model.

Rule 4: RwLock Is Not Always Better

We replaced every Mutex with RwLock because "readers do not block each other." Performance got worse.

Turns out RwLock has overhead. For short critical sections with high contention, Mutex is faster. RwLock shines when reads are long and writes are rare. Our reads were 12 nanoseconds. The RwLock bookkeeping cost more than just using Mutex. Benchmark before you optimize.
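Here is a minimal sketch of that kind of comparison, with arbitrary thread and iteration counts: several threads hammering a tiny read-only critical section under each lock type, timed by wall clock. For numbers you would actually act on, use a proper harness such as criterion.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;
use std::time::Instant;

const THREADS: usize = 8;
const ITERS: usize = 1_000_000;

fn main() {
    // Tiny read-only critical section under a Mutex, hammered from many threads.
    let m = Arc::new(Mutex::new(0u64));
    let start = Instant::now();
    let handles: Vec<_> = (0..THREADS)
        .map(|_| {
            let m = Arc::clone(&m);
            thread::spawn(move || {
                for _ in 0..ITERS {
                    std::hint::black_box(*m.lock().unwrap());
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("Mutex:  {:?}", start.elapsed());

    // The same workload behind an RwLock read guard.
    let r = Arc::new(RwLock::new(0u64));
    let start = Instant::now();
    let handles: Vec<_> = (0..THREADS)
        .map(|_| {
            let r = Arc::clone(&r);
            thread::spawn(move || {
                for _ in 0..ITERS {
                    std::hint::black_box(*r.read().unwrap());
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("RwLock: {:?}", start.elapsed());
}
```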
Rule 5: Parking Lot Is Your Friend

Standard library Mutex is fine until it is not. Under heavy contention, we saw performance collapse. We switched to parking_lot:

```rust
use parking_lot::Mutex;

// Same API, better performance
let m = Mutex::new(data);
```

No code changes. Throughput increased by 34%. The parking_lot implementation uses better spinning and parking strategies than the standard library.

Rule 6: Thread Count Matters More Than You Think

We set Tokio to use all available cores. On our 32-core production boxes, that meant 32 worker threads. CPU usage was at 60% but throughput was terrible. Threads were fighting over locks, context switching constantly. We tuned it down:

```rust
#[tokio::main(worker_threads = 8)]
async fn main() {
    // run server
}
```

Eight threads saturated at 100% CPU, but total throughput doubled. Less contention, less context switching, better cache locality. More threads is not always better. Find your sweet spot through testing.
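The attribute macro works when one number fits every deployment. When it does not, the same tuning can go through Tokio's runtime builder instead. A minimal sketch, assuming a hypothetical WORKER_THREADS environment variable with our 8-thread sweet spot as the default:

```rust
use tokio::runtime::Builder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical knob: read the worker count from an environment variable so it
    // can be tuned per host without recompiling. Default to 8, our sweet spot.
    let workers = std::env::var("WORKER_THREADS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(8);

    let runtime = Builder::new_multi_thread()
        .worker_threads(workers)
        .enable_all()
        .build()?;

    runtime.block_on(async {
        // run server
    });

    Ok(())
}
```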
Rule 7: Measure Everything

This is the rule that actually mattered. We added instrumentation everywhere:

```rust
let start = Instant::now();
let result = critical_section();
metrics.record_duration("critical_section", start.elapsed());
```

We tracked:
- Lock contention (how often threads waited; see the sketch after this list)
- Channel queue depths
- Task spawn rates
- Memory per connection
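As a sketch of that first item, the lock-wait measurement can be as simple as wrapping the mutex. Metrics below is a stand-in for whatever metrics client you use (ours is not shown here); its record_duration mirrors the call in the snippet above.

```rust
use std::sync::{Mutex, MutexGuard};
use std::time::{Duration, Instant};

// Stand-in for a real metrics client; record_duration mirrors the call above.
struct Metrics;

impl Metrics {
    fn record_duration(&self, name: &str, d: Duration) {
        // In production this would feed the metrics pipeline; here we just print.
        println!("{name}: {d:?}");
    }
}

// Wrap a Mutex so every acquisition records how long the caller waited for it.
struct InstrumentedMutex<T> {
    name: &'static str,
    inner: Mutex<T>,
}

impl<T> InstrumentedMutex<T> {
    fn new(name: &'static str, value: T) -> Self {
        Self { name, inner: Mutex::new(value) }
    }

    fn lock(&self, metrics: &Metrics) -> MutexGuard<'_, T> {
        let start = Instant::now();
        let guard = self.inner.lock().unwrap();
        // Time spent here is time spent waiting for other threads to release the lock.
        metrics.record_duration(self.name, start.elapsed());
        guard
    }
}

fn main() {
    let metrics = Metrics;
    let hits = InstrumentedMutex::new("hits_lock_wait", 0u64);
    *hits.lock(&metrics) += 1;
}
```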
The data showed us patterns we never would have guessed. Tuesday afternoons had 10x the lock contention of other times. One endpoint spawned 400 tasks per request while others spawned 4. Without metrics, we were debugging blind.

What Actually Changed

After implementing these rules, our system went from 14 production incidents per week to 2 per month. Response times stabilized. P99 latency dropped from 4.2 seconds to 380ms. Memory usage became predictable. The pager stopped waking us up.

Rust did not magically fix our concurrency problems. It gave us tools that prevented certain classes of bugs while letting us create entirely new ones. The borrow checker is amazing. It stops you from shooting yourself in the foot with memory. But it will not stop you from shooting yourself in the foot with locks, channels, and thread pools.

Fearless concurrency is real, but it requires different fears. Not “will this segfault” but “will this deadlock under load.” Not “is this memory safe” but “is this performant at scale.”

We still use Rust. We are still glad we migrated. But we approach concurrency with a lot more respect now. The compiler can only save you from so much. The rest is on you.