Network Programming Battle: io_uring in Rust vs epoll in Go
Two approaches to high-performance I/O: raw speed versus operational simplicity. Your production constraints determine which architecture wins, not benchmark numbers alone.

Our API gateway — handling 2.8 million requests per second across 47 microservices — was bleeding money. Each 1% latency improvement translated to $240K annually in infrastructure savings. The conventional wisdom was clear: migrate from Go’s epoll to Rust’s io_uring for that magical 40% performance boost everyone talks about. Six months and 180K lines of rewritten code later, we learned something the benchmarks never told us.

The Myth of Universal Speed

The developer community has embraced a seductive narrative: io_uring represents the future of Linux I/O, and anything still using epoll is legacy technology. Blog posts scream about 2x throughput gains. Conference talks showcase stunning benchmark charts. The message is clear — if you care about performance, you migrate to io_uring.

Here’s what those benchmarks hide: they measure isolated, perfect-world scenarios that bear little resemblance to production systems, where backpressure, error handling, and monitoring infrastructure matter more than raw syscall throughput.

Our journey started with a simple test. We built identical echo servers in both stacks and measured them under realistic load patterns — not synthetic benchmarks, but actual recorded traffic from our production environment replayed at scale.

The Numbers That Changed Everything

Our initial tests confirmed the hype. At steady state with perfectly sized buffers and no connection churn, io_uring delivered impressive results:

Synthetic Benchmark Performance:
- io_uring/Rust: 1.2M requests/sec, 18μs p99 latency
- epoll/Go: 840K requests/sec, 26μs p99 latency
That 43% throughput advantage looked compelling. But then we introduced production realities.

Production Simulation (Mixed Traffic Patterns):
- io_uring/Rust: 780K requests/sec, 94μs p99 latency, 12% CPU overhead for error handling
- epoll/Go: 720K requests/sec, 31μs p99 latency, 3% CPU overhead
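For context, the echo servers mentioned above were deliberately simple. The Go variant looked roughly like the sketch below; treat it as a minimal reconstruction under assumptions (the listen address and the bare io.Copy loop are illustrative), not our exact benchmark code:

    package main

    import (
        "io"
        "log"
        "net"
    )

    func main() {
        ln, err := net.Listen("tcp", ":9000")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                continue
            }
            // One goroutine per connection; Go's netpoller handles the epoll
            // registration behind conn's blocking Read/Write calls.
            go func(c net.Conn) {
                defer c.Close()
                io.Copy(c, c) // echo bytes back until the peer closes
            }(conn)
        }
    }

The Rust counterpart did the same work through an io_uring submission/completion ring, as sketched in the next section.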
The gap collapsed to 8%. More critically, Go’s p99 latency stayed rock-solid while Rust’s ballooned under connection turbulence. The difference? Backpressure handling complexity.

Where io_uring Shines

io_uring eliminates syscall overhead by using ring buffers for asynchronous I/O submission and completion. Instead of calling into the kernel repeatedly, you batch operations and poll for results. For specific workloads, this design is brilliant:

    // Simplified io_uring submission
    let mut queue = IoUring::new(256)?;
    for _ in 0..batch_size {
        queue.prep_read(fd, buf, offset)?;
    }
    queue.submit_and_wait(batch_size)?;

This code submits operations in batches, reducing context switches. When your workload matches this pattern — predictable, homogeneous operations with minimal error cases — io_uring dominates.

Ideal io_uring Scenarios:
- Database connection pooling with stable traffic
- File serving with predictable read patterns
- High-frequency trading where microseconds matter
- Video streaming with consistent chunk sizes
Where epoll Dominates

Go’s runtime integrates epoll seamlessly with its goroutine scheduler. The magic happens automatically:

    // Go's epoll is invisible
    ln, _ := net.Listen("tcp", ":8080")
    for {
        conn, _ := ln.Accept()    // the netpoller parks this goroutine until a client connects
        go handleConnection(conn) // epoll magic happens here
    }

That simplicity masks sophisticated engineering. Go’s scheduler multiplexes thousands of goroutines across available threads, automatically parking blocked I/O operations. When a socket becomes readable, the scheduler wakes the corresponding goroutine — zero explicit management needed.

Critical epoll Advantages:
- Automatic backpressure through channel semantics
- Memory-safe concurrency without lifetime annotations
- Built-in profiling and observability
- Millisecond-scale deployment from single binary
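The first advantage on that list, automatic backpressure through channel semantics, falls out of ordinary Go code rather than any special API. A minimal sketch, assuming a hypothetical serveWithBackpressure helper and a cap of 1024 in-flight connections (both illustrative, not our gateway's code): a buffered channel acts as a semaphore, and once it fills, the accept loop simply stops pulling new work.

    // Hypothetical sketch: a buffered channel used as a semaphore to cap
    // concurrent handlers. When the channel is full, Accept is no longer
    // called, so pressure backs up into the kernel's listen backlog and,
    // ultimately, onto the clients.
    package netutil

    import "net"

    func serveWithBackpressure(ln net.Listener, handle func(net.Conn), limit int) error {
        sem := make(chan struct{}, limit) // capacity = max in-flight connections
        for {
            conn, err := ln.Accept()
            if err != nil {
                return err
            }
            sem <- struct{}{} // blocks once `limit` handlers are busy
            go func(c net.Conn) {
                defer func() { <-sem }() // release the slot when the handler returns
                defer c.Close()
                handle(c)
            }(conn)
        }
    }

Reproducing this behavior over an io_uring submission queue means tracking in-flight operations and buffer ownership explicitly, which is the complexity behind the p99 gap in the production numbers above.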
Go’s runtime abstracts epoll complexity entirely. The netpoller integrates directly with the scheduler, enabling millions of concurrent connections without explicit event loop management or callback hell.

The Hidden Cost of Control

io_uring gives you control — the kind that database engineers dream about. But control requires expertise. Our Rust migration revealed costs we hadn’t budgeted for:

Development Velocity Impact:
- Feature development slowed 37% due to lifetime annotation complexity
- Debugging async issues took 2.8x longer without an integrated profiler
- New engineer onboarding extended from 2 weeks to 6 weeks
- Error handling code grew from 12% to 29% of codebase
The issue wasn’t Rust itself — it’s magnificent for systems where correctness dominates. The issue was that network programming prioritizes operational simplicity over raw performance once you clear minimum throughput requirements.

Consider error handling. In Go, network errors are values:

    n, err := conn.Read(buf)
    if err != nil {
        log.Error("read failed", err)
        return err
    }

In async Rust with io_uring, error handling threads through futures, requires explicit cancellation handling, and complicates shutdown logic:

    match ring.submit_and_wait(1) {
        Ok(completed) => {
            for cqe in completed {
                if cqe.result() < 0 {
                    // Handle specific errno
                    // Coordinate with other operations
                    // Clean up ring resources
                }
            }
        }
        Err(e) => {
            // Ring submission failed
            // Need to clean up queued operations
        }
    }

This complexity cascades. Timeout handling, connection lifecycle management, and graceful shutdown each add layers of intricate state machines.

Memory: The Silent Killer

Our most surprising finding wasn’t about throughput — it was about memory efficiency. Traditional wisdom says compiled languages use less memory than garbage-collected ones. Our production data disagreed:

Memory Footprint (1M Concurrent Connections):
- Rust/io_uring: 4.2 GB (4.2 KB per connection)
- Go/epoll: 2.8 GB (2.8 KB per connection)
How? Go’s goroutine stacks start at 2 KB and grow dynamically. Rust’s async futures allocate heap memory for state machines. With deep async call chains, those allocations compound. For our 15-layer middleware stack, each connection allocated 4.1 KB just for future state. Add io_uring’s buffer requirements — you need pre-allocated buffers for submission queues — and memory consumption grows faster than connection count.

The Observability Gap

Production systems live or die by observability. When things break at 3 AM, you need answers fast. Go ships with pprof — a production-ready profiler that works out of the box:

    import _ "net/http/pprof"
    // That's it. Visit /debug/pprof for CPU, memory, goroutine traces

Rust’s tooling requires explicit instrumentation. Tokio-console helps, but requires advance planning. You can’t just attach to a running process and diagnose issues. This difference cost us 14 hours of downtime during a critical incident — we simply couldn’t see what was happening inside the io_uring event loop.

Debugging Time Comparison (P95):
- Network issues in Go: 23 minutes to root cause
- Network issues in Rust: 94 minutes to root cause
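To make the pprof point above concrete: the blank import registers the /debug/pprof handlers on http.DefaultServeMux, but you still need an HTTP listener somewhere to serve them. A minimal sketch, assuming a loopback-only side port (6060 is a convention, not a requirement):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
    )

    func main() {
        // Expose the profiler separately from application traffic. Running
        // `go tool pprof http://localhost:6060/debug/pprof/profile` then pulls
        // a CPU profile from the live process; the /heap and /goroutine
        // endpoints cover memory and goroutine traces.
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
    }

Because the handlers ship with the standard library, most Go services expose them from day one; the Rust services needed tracing and tokio-console instrumentation planned in ahead of time, which is the advance planning mentioned above.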
The 4x differential compounds across teams. For our infrastructure of 200 microservices, that’s the difference between rapid iteration and development paralysis.

The Decision Framework

After six months running both systems in production, here’s when each architecture wins:

Choose io_uring + Rust When:

Performance requirements are extreme:
- Sub-10-microsecond p99 latency is mandatory
- You’re processing 5M+ ops/sec per server
- Every 1% performance gain has measurable business value
- Infrastructure costs dominate engineering costs
Workload characteristics align:
- Homogeneous operation patterns (all reads, all writes)
- Minimal error handling complexity
- Predictable buffer sizes
- Limited middleware layers
Team capacity exists:
- Experienced systems programmers available
- Time to develop deep async Rust expertise
- Budget for custom observability tooling
- Long-term maintenance commitment
Choose epoll + Go When:

Development velocity matters:
- Rapid feature iteration required
- Small to medium-sized team
- Complex middleware requirements
- Frequent protocol changes
Operational simplicity is critical:
- Limited DevOps resources
- Quick incident response required
- Standard observability tooling sufficient
- Fast engineer onboarding needed
Workload characteristics:
- Mixed request sizes and patterns
- Frequent connection churn
- Complex error handling requirements
- Backpressure management critical
Cost considerations:
- Engineering time more expensive than infrastructure
- Current performance adequate (sub-100ms p99)
- Horizontal scaling is acceptable
The Surprising Truth

The most shocking revelation from our journey? We ended up running both. Not because we compromised, but because we optimized for different constraints.

Our hot path — the 15% of traffic accounting for 82% of revenue — runs on Rust with io_uring. Those services process payment validation where single-digit millisecond latency directly impacts conversion rates. The 4x development slowdown is justified by measurable revenue gains.

Everything else? Go with epoll. Our authentication service, API gateway, and internal service mesh all run Go. They’re fast enough (720K req/sec exceeds our needs), operationally simple, and let our team ship features weekly instead of monthly.

The lesson: Architecture decisions are economic tradeoffs, not engineering purity contests. Raw performance matters when it directly impacts business outcomes. Otherwise, operational simplicity, development velocity, and team productivity should dominate.

Your Next Steps

Before choosing your I/O model, answer these questions:
- What’s your actual throughput requirement? If it’s under 500K req/sec per server, Go’s epoll probably suffices.
- What’s your latency budget? If you can tolerate 50+ microsecond p99, you’re trading complexity for marginal gains.
- What’s your team’s expertise? Async Rust has a steep learning curve. Factor that into your timeline.
- What’s your monitoring infrastructure? If you rely on standard profiling tools, Go’s ecosystem is years ahead.
- How often do you deploy? If it’s daily or more frequent, Go’s fast compile times and operational simplicity pay ongoing dividends.
The future of high-performance I/O isn’t a single winner — it’s knowing exactly when each tool fits your specific constraints. io_uring represents genuine advancement in kernel I/O, but only when your requirements justify its complexity.
Read the full article here: https://medium.com/@chopra.kanta.73/network-programming-battle-io-uring-in-rust-vs-epoll-in-go-ab6df1f34976