7 Benchmarks That Finally Made Me Leave Python for Rust

From JOHNWICK

Seven independent benchmarks proved Python was the bottleneck and convinced me to rewrite hot paths in Rust.

This is practical, measured work. If a function costs seconds for every request, that function matters. If you are running production code, these benchmarks will give you the data you need to decide.

  • Test, do not guess.
  • Replace only the true hot paths.
  • Rust gave consistent, large speedups while keeping Python for orchestration.
  • The seven benchmarks below show real code, the exact change, and the measured result.

Test environment and methodology

  • Test machine: 6-core laptop, 16 GB RAM, Linux.
  • Python: recent interpreter, tests run from the command line.
  • Rust: compiled in release mode with cargo build --release.
  • Each timing is the mean of several runs.
  • Focus: CPU-bound work and real parsing tasks where Python showed limits.
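The article does not include its timing script. A minimal Python sketch of a "mean of several runs" harness consistent with the methodology above (mean_runtime is a hypothetical helper name):

```python
import timeit

def mean_runtime(fn, runs=5):
    """Return the mean wall-clock time of fn() over several one-shot runs,
    matching the 'mean of several runs' methodology above."""
    times = [timeit.timeit(fn, number=1) for _ in range(runs)]
    return sum(times) / len(times)

# Example: time a small CPU-bound closure.
elapsed = mean_runtime(lambda: sum(i * i for i in range(100_000)))
print(f"{elapsed:.4f} s")
```

Averaging several one-shot runs smooths out scheduler noise; for the Rust side, the equivalent is timing the release binary the same number of times.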

1 — Sqrt loop (tight numeric loop)

Problem: A hot loop computes millions of square roots in pure Python. The loop is CPU-bound, and Python’s per-iteration overhead is expensive.

Python

import math, time

def compute(n):
    s = 0.0
    for i in range(n):
        s += math.sqrt(i + 1.0)
    return s

t = time.time()
compute(10_000_000)
print(time.time() - t)

Rust

fn compute(n: usize) -> f64 {
    let mut s = 0.0;
    for i in 0..n {
        s += ((i as f64) + 1.0).sqrt();
    }
    s
}

fn main() {
    let t = std::time::Instant::now();
    compute(10_000_000);
    println!("{:?}", t.elapsed());
}

Change: Move the tight numeric loop to Rust compiled with optimizations.
Result: Python 11.84 s → Rust 0.92 s = 12.87x faster.

This is the classic case where per-iteration overhead dominates; Rust removes that overhead.

2 — Recursive Fibonacci (pure CPU recursion)

Problem: A naive recursive implementation is used for a small benchmark. Python recursion and function-call overhead slow this down.

Python

import time

def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

t = time.time()
fib(35)
print(time.time() - t)

Rust

fn fib(n: u32) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fib(n - 1) + fib(n - 2),
    }
}

fn main() {
    let t = std::time::Instant::now();
    fib(35);
    println!("{:?}", t.elapsed());
}

Change: Implement the same logic in Rust to avoid interpreter overhead.
Result: Python 2.80 s → Rust 0.44 s = 6.36x faster.

For heavy recursion and raw compute, Rust yields strong wins.
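Worth noting: naive recursion here is a probe of function-call overhead, not a sensible Fibonacci. In either language, memoization collapses the cost. A Python sketch using only the standard library:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; later calls are cache hits,
    # so the exponential call tree collapses to a linear one.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(35))  # 9227465
```

This does not change the benchmark's point about call overhead; it is a reminder to check for an algorithmic fix before reaching for a rewrite.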


3 — JSON parsing and light aggregation (100 MB)

Problem: Reading line-delimited JSON and doing small aggregations in Python is slower than using a native parser written in Rust.

Python

import json, time

t = time.time()
with open("data.json") as f:
    lines = [json.loads(l) for l in f]
s = sum(len(l.get("text", "")) for l in lines)
print(time.time() - t)

Rust

use std::fs::File;
use std::io::BufReader;
use serde_json::Deserializer;

fn main() {
    let f = File::open("data.json").unwrap();
    let rdr = BufReader::new(f);
    let stream = Deserializer::from_reader(rdr).into_iter::<serde_json::Value>();
    let mut s = 0usize;
    for v in stream {
        let v = v.unwrap();
        if let Some(t) = v.get("text") {
            s += t.as_str().map(|x| x.len()).unwrap_or(0);
        }
    }
    println!("{}", s);
}

Change: Use a Rust streaming parser (serde_json) and iterate without building large Python objects.
Result: Python 8.30 s → Rust 1.20 s = 6.92x faster.

Rust eliminates Python object allocation and GC pressure in this workload.
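The article does not ship its input file. A hypothetical generator for a compatible data.json (line-delimited records with a "text" field, scaled down from the 100 MB input) might look like:

```python
import json
import random
import string

def write_sample(path, rows=1000):
    # One JSON object per line, each with the "text" field the
    # aggregation reads; raise rows to approach the 100 MB input.
    with open(path, "w") as f:
        for _ in range(rows):
            text = "".join(random.choices(string.ascii_lowercase + " ", k=80))
            f.write(json.dumps({"text": text}) + "\n")

write_sample("data.json", rows=1000)
```

Generating your own input keeps the comparison reproducible on your hardware rather than trusting someone else's numbers.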


4 — Regex tokenization (50 MB)

Problem: Tokenizing large text with Python re is slower than Rust regex, which is optimized and compiled to native code.

Python

import re, time

pat = re.compile(r"\w+")
t = time.time()
with open("big.txt") as f:
    text = f.read()
tokens = pat.findall(text)
print(time.time() - t)

Rust

use std::fs;
use regex::Regex;

fn main() {
    let text = fs::read_to_string("big.txt").unwrap();
    let re = Regex::new(r"\w+").unwrap();
    let t = std::time::Instant::now();
    let count = re.find_iter(&text).count();
    println!("{} {:?}", count, t.elapsed());
}

Change: Move heavy tokenization to Rust using the regex crate.
Result: Python 4.60 s → Rust 0.65 s = 7.08x faster.

Regex engines in Rust are highly optimized and avoid Python loop overhead.
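One caveat about this pair: the Python side materializes every token into a list with findall, while the Rust side only counts matches. A lazy count with finditer is the closer Python equivalent:

```python
import re

pat = re.compile(r"\w+")
text = "one two three, four!"

# findall builds a list holding every token...
tokens = pat.findall(text)

# ...while a lazy count mirrors Rust's find_iter().count()
# without allocating a list of match objects or strings.
count = sum(1 for _ in pat.finditer(text))

assert count == len(tokens)
print(count)  # 4
```

The lazy version narrows the gap somewhat but does not close it; the point stands, just measure like for like.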


5 — Threaded CPU map (8 workers)

Problem: CPU-bound concurrency in Python with threads hits the GIL; parallelism is limited without processes. Rust threads run truly in parallel and scale with cores.

Python (threading example)

import threading, time

def work(n):
    s = 0
    for i in range(n):
        s += i * i
    return s

def worker():
    work(5_000_000)

t = time.time()
threads = [threading.Thread(target=worker) for _ in range(8)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(time.time() - t)

Rust (threads)

use std::thread;

fn work(n: usize) -> usize {
    let mut s = 0usize;
    for i in 0..n { s += i * i; }
    s
}

fn main() {
    let t = std::time::Instant::now();
    let mut handles = vec![];
    for _ in 0..8 {
        handles.push(thread::spawn(|| work(5_000_000)));
    }
    for h in handles { let _ = h.join(); }
    println!("{:?}", t.elapsed());
}

Change: Replace Python-threaded CPU work with Rust threads inside a compiled binary.
Result: Python 16.80 s → Rust 1.90 s = 8.84x faster.

For CPU-bound parallel work, Rust uses cores efficiently.
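For fairness, the usual Python answer to the GIL for CPU-bound work is processes, not threads, as the problem statement itself notes. A standard-library sketch (smaller n so it stays quick):

```python
from multiprocessing import Pool

def work(n):
    s = 0
    for i in range(n):
        s += i * i
    return s

if __name__ == "__main__":
    # Each task runs in its own process, so the GIL no longer
    # serializes the loops; the price is process startup and
    # pickling results back to the parent.
    with Pool(processes=4) as pool:
        results = pool.map(work, [100_000] * 4)
    print(results[0])
```

Processes recover parallelism but keep the per-iteration interpreter overhead, which is why Rust still wins this benchmark comfortably.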


6 — CSV parsing and aggregation (5 million rows)

Problem: Iterating CSV rows in Python and casting types per row is costly when millions of rows appear.

Python

import csv, time

t = time.time()
s = 0
with open("data.csv") as f:
    r = csv.reader(f)
    for row in r:
        s += int(row[2])
print(time.time() - t)

Rust

use csv::Reader;

fn main() {
    let mut rdr = Reader::from_path("data.csv").unwrap();
    let mut s: i64 = 0;
    for result in rdr.records() {
        let record = result.unwrap();
        s += record[2].parse::<i64>().unwrap();
    }
    println!("{}", s);
}

Change: Use the Rust csv crate and parse without Python-level object creation.
Result: Python 22.60 s → Rust 2.70 s = 8.37x faster.

I/O plus per-row parsing benefits greatly from native parsing speed.
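As with the JSON benchmark, the input file is not included. A hypothetical generator for a compatible data.csv (an integer at column index 2, which both aggregations read) could be:

```python
import csv
import random

def write_csv(path, rows=1000):
    # Three columns with an integer at index 2; scale rows toward
    # 5 million to approximate the benchmark's input size.
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        for i in range(rows):
            w.writerow([f"id{i}", "label", random.randint(0, 100)])

write_csv("data.csv", rows=1000)

# Re-run the benchmark's Python aggregation over the generated file.
with open("data.csv") as f:
    total = sum(int(row[2]) for row in csv.reader(f))
print(total)
```

One detail to watch: the Rust csv Reader treats the first row as a header by default, so for headerless data like this, build it with ReaderBuilder and has_headers(false) to keep both sides reading the same rows.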


7 — Large string building and concatenation

Problem: Repeated string concatenation in Python leads to repeated reallocations and heavy memory churn.

Python

import time

t = time.time()
s = ""
for i in range(1_000_000):
    s += str(i) + ","
print(time.time() - t)

Rust

fn main() {
    let t = std::time::Instant::now();
    let mut s = String::new();
    s.reserve(10_000_000);
    for i in 0..1_000_000 {
        s.push_str(&i.to_string());
        s.push(',');
    }
    println!("{} {:?}", s.len(), t.elapsed());
}

Change: Use String::reserve in Rust and build the string in place.
Result: Python 4.50 s → Rust 0.42 s = 10.71x faster.

Memory allocation strategy matters. Rust gives precise control.
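The allocation argument cuts both ways inside Python, too: += in a loop can recopy the string built so far, while str.join collects the parts and allocates the result once. A sketch:

```python
# Incremental concatenation: each += may recopy everything built so far.
s = ""
for i in range(1000):
    s += str(i) + ","

# Single allocation: collect the parts, then join once.
joined = ",".join(str(i) for i in range(1000)) + ","

assert joined == s
print(len(s))  # 3890
```

If the hot path is literally string building, try join before reaching for Rust; if it is still too slow, this benchmark shows what a rewrite buys.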


How to integrate Rust into your Python code

Pick one hot function. Replace it with a Rust binary or a Python extension via PyO3.

Rust (PyO3 minimal example)

use pyo3::prelude::*;

#[pyfunction]
fn compute_sqrt_py(data: Vec<f64>) -> Vec<f64> {
    data.into_iter().map(|x| x.sqrt()).collect()
}

#[pymodule]
fn rustmod(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(compute_sqrt_py, m)?)?;
    Ok(())
}

Python usage

import rustmod

data = [i * 0.5 for i in range(1_000_000)]
res = rustmod.compute_sqrt_py(data)

If building a binary is simpler, call the Rust program as a subprocess, or expose a small HTTP/gRPC service for heavier workloads. The right choice depends on latency requirements and deployment constraints.
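The subprocess route mentioned above needs no extension module at all. A hedged sketch, assuming a hypothetical compiled binary at ./target/release/hotpath that reads its input on stdin and writes its result to stdout:

```python
import subprocess

def run_rust(input_text: str, binary: str = "./target/release/hotpath") -> str:
    """Call a compiled Rust binary (hypothetical path), passing data on
    stdin and reading the result from stdout."""
    proc = subprocess.run(
        [binary],
        input=input_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the binary exits nonzero
    )
    return proc.stdout

# Demonstration with a stand-in command in place of the Rust binary:
out = run_rust("hello", binary="cat")  # 'cat' echoes stdin back unchanged
print(out)
```

Process startup and serialization add latency per call, so this fits batch-style workloads better than tight request loops, where PyO3 or a long-lived service wins.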


Architecture diagrams

Simple in-process integration

+-------------------+     call      +------------------+
|    Python App     | ------------> | Rust Hot Module  |
|   Orchestration   | <-- result -- | (compute-heavy)  |
+-------------------+               +------------------+

Microservice / autonomous service

+----------------+  HTTP/gRPC   +---------------+   Local I/O   +-----------+
| Python Layer   | -----------> | Rust Service  | ------------> |  Storage  |
| Orchestration  |              |  Hot compute  |               | (SSD/DB)  |
+----------------+              +---------------+               +-----------+

These diagrams keep the system simple: Python remains the conductor; Rust handles the heavy compute work.


When not to rewrite in Rust

  • Small functions that are not on the hot path.
  • Code that relies on Python-only ecosystems where migration cost is high.
  • When developer velocity and maintenance cost outweigh runtime gains.

If your highest-latency component runs in milliseconds and you value fast iteration, stay in Python. If a function costs seconds per request or pushes you into resource limits, evaluate migrating.

Final takeaways

  • Profile first. Use a profiler and identify the real hot paths.
  • Measure baseline. Record runtimes, then migrate a single function.
  • Keep Python for orchestration. Use Rust where the CPU matters.
  • Automate tests. Add integration tests and performance checks.
  • Iterate. Migrate the next function only after the first shows real gains.
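"Profile first" can be as simple as the standard-library cProfile; a minimal sketch for finding the real hot paths before migrating anything:

```python
import cProfile
import io
import pstats

def hot(n):
    # Stand-in for a suspected hot path in your application.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
hot(100_000)
profiler.disable()

# Sort by cumulative time; the top entries are the candidates
# actually worth considering for a Rust rewrite.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

Only functions that dominate this output justify the migration cost; everything else stays in Python.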

This work is not about swapping languages for prestige. This is about picking the right tool for real cost. If your users wait or your pipelines back up, you owe it to them to measure and act.

Read the full article here: https://medium.com/@diyasanjaysatpute147/7-benchmarks-that-finally-made-me-leave-python-for-rust-ff4d5bb2e57a