
The Hidden Cost of Monomorphization: Why Generics Make Rust Binaries Huge

From JOHNWICK

When I first started using Rust, I was told the same thing every Rustacean hears early on: “Generics are zero-cost abstractions.” And I believed it.
Until I built a CLI tool with a few generic data structures and the binary ballooned from 2 MB to 37 MB. I thought I had accidentally compiled in debug mode.
Nope — it was Release.
Welcome to Rust’s secret heavyweight: monomorphization. Let’s unpack what’s actually going on — and why “zero-cost” sometimes comes with a hidden price tag.

The Promise of Zero-Cost Abstractions

In C++ and Rust, generics (or templates) are resolved at compile time. That means the compiler doesn’t generate “generic” bytecode — it generates specialized versions of every function or struct for every type you use. That’s the essence of “zero-cost”: no runtime dispatch, no virtual tables, no overhead.

Let’s look at a simple example.

fn add<T: std::ops::Add<Output = T>>(a: T, b: T) -> T {

   a + b

}

fn main() {

   let int_sum = add(2, 3);         // i32 version
   let float_sum = add(1.2, 3.4);   // f64 version

}

What the compiler actually does:

// Compiler-generated versions
fn add_i32(a: i32, b: i32) -> i32 { a + b }
fn add_f64(a: f64, b: f64) -> f64 { a + b }

Each one is optimized for its type — that’s the magic.
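You can even watch this duplication from inside the program. A small sketch: coerce each specialization to a plain function pointer and compare addresses; two distinct functions end up in the binary (note that a linker doing identical-code folding could in principle merge copies, though these two bodies genuinely differ):

```rust
use std::ops::Add;

// The same generic add as above.
fn add<T: Add<Output = T>>(a: T, b: T) -> T {
    a + b
}

fn main() {
    // Coerce each specialization to a plain function pointer.
    let add_i32: fn(i32, i32) -> i32 = add::<i32>;
    let add_f64: fn(f64, f64) -> f64 = add::<f64>;

    // Two distinct functions live in the binary (integer add vs
    // float add), so their addresses differ.
    println!("add::<i32> at {:#x}", add_i32 as usize);
    println!("add::<f64> at {:#x}", add_f64 as usize);
    assert_ne!(add_i32 as usize, add_f64 as usize);
    assert_eq!(add_i32(2, 3), 5);
}
```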
But here’s the problem: if you call add() for 10 different types, the compiler literally copies and pastes your function 10 times (in machine-code form). That’s monomorphization — and it’s one of the most powerful, most misunderstood parts of Rust’s compilation model.

What Monomorphization Really Does

At a high level, Rust’s compilation process looks like this:

Source Code (.rs)

HIR (High-level IR)

MIR (Mid-level IR)

Monomorphization

LLVM IR

Machine Code

In the monomorphization phase, the compiler takes every generic function and replaces its type parameters (T) with the concrete types used in your codebase. It’s like cloning the function for every type combination it sees. This happens recursively: if you have generic structs that hold other generics that call generic methods — buckle up.

Example: The Binary Explosion

Let’s simulate this in real Rust.

fn print_vec<T: std::fmt::Debug>(v: Vec<T>) {

   println!("{:?}", v);

}


fn main() {

   print_vec(vec![1, 2, 3]);              // i32
   print_vec(vec!["a", "b", "c"]);        // &str
   print_vec(vec![1.0, 2.0, 3.0]);        // f64

}

Here’s what happens behind the curtain:

  • print_vec::<i32>()
  • print_vec::<&str>()
  • print_vec::<f64>()
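These separate copies can be observed from inside the program: std::any::type_name is resolved per instantiation, and nesting generics multiplies the instantiation count. A minimal sketch (Wrapper is a made-up struct for illustration):

```rust
use std::any::type_name;

// A hypothetical wrapper struct, just to demonstrate nesting.
#[derive(Debug)]
struct Wrapper<T> {
    inner: T,
}

// Every distinct T gets its own compiled copy of this function;
// type_name::<T>() is resolved to a constant string inside each copy.
fn describe<T: std::fmt::Debug>(v: &T) -> String {
    format!("{}: {:?}", type_name::<T>(), v)
}

fn main() {
    // Three separate instantiations: describe::<i32>,
    // describe::<Wrapper<i32>>, and describe::<Wrapper<Wrapper<i32>>>.
    // Nesting generics multiplies the copies.
    println!("{}", describe(&1));
    println!("{}", describe(&Wrapper { inner: 1 }));
    println!("{}", describe(&Wrapper { inner: Wrapper { inner: 1 } }));
}
```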

That’s three different compiled versions of the same function. Now imagine a generic struct used across dozens of modules, with nested generics inside async code — suddenly your binary carries hundreds of megabytes of duplicated instructions. You don’t notice this in small projects. But once you’re writing frameworks, compilers, or embedded code, it’s a big deal.

The Architecture of Monomorphization in Rustc

Monomorphization operates on rustc’s MIR (Mid-level Intermediate Representation). Here’s a simplified view of the architecture:

┌──────────────────────┐
│ HIR (Syntax Tree)    │
│ fn add<T>(a, b) {}   │
└────────┬─────────────┘
         │
         ▼
┌──────────────────────┐
│ MIR (Generic Form)   │
│ add<T> MIR Body      │
└────────┬─────────────┘
         │ Monomorphization
         ▼
┌──────────────────────┐
│ MIR (Concrete)       │
│ add<i32>, add<f64>   │
└────────┬─────────────┘
         │
         ▼
┌──────────────────────┐
│ LLVM IR Generation   │
└──────────────────────┘

Each time the compiler encounters a monomorphized instance, it registers it in a codegen unit, which LLVM later optimizes and emits into the final binary. It’s elegant, but it scales linearly — more generic instantiations mean more code for LLVM to optimize and emit, and a bigger binary.

Why Rust Does It This Way

Here’s the real reason Rust chose monomorphization: performance at runtime. Rust prefers to pay the cost at compile time (and in disk space) rather than at runtime.
That means:

  • No dynamic dispatch overhead.
  • Inline opportunities for LLVM.
  • Better CPU-level optimization (branch prediction, cache alignment).

It’s a philosophical choice — Rust trades compile-time bloat for runtime speed. But it’s not free.

When It Backfires: The Fat Binary Problem

Here’s a real-world story: I once worked on a data-processing CLI written in Rust. After a few weeks of feature additions, the release binary was 153 MB. After profiling the symbol table with cargo bloat, I found something wild: 82% of the binary was duplicated generic code. Every Iterator, every .map(), every .filter() across different type chains had become its own chunk of compiled machine code.

cargo bloat Example

$ cargo bloat --release
File size: 153.4 MB

 41.2%  my_project::data::pipeline::<impl Iterator>::map
 22.8%  my_project::analyze::process::<impl Iterator>::filter
 ...

Turns out, functional chains using generics multiply monomorphization cost — each variant of an iterator chain builds its own unique chain of compiled functions.

Code Flow Example: Dynamic Dispatch vs Monomorphization

Here’s a small but powerful demo.

Generic (Monomorphized)

fn do_work<T: Task>(task: T) {

   task.run();

}

Trait Object (Dynamic Dispatch)

fn do_work(task: &dyn Task) {

   task.run();

}

Now compare:

| Version                 | Compile Time | Binary Size | Runtime Speed   |
| ----------------------- | ------------ | ----------- | --------------- |
| Generic (Monomorphized) | High         | Large       | Fast            |
| Trait Object            | Low          | Small       | Slightly Slower |

Generics: compile-time duplication for speed.
Trait objects: a single code path for flexibility.

When building plugins, scripting layers, or dynamic systems — trait objects win.
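Both styles can be exercised side by side in one runnable program. A minimal sketch, with Task, FileTask, and NetworkTask as made-up names for illustration:

```rust
trait Task {
    fn run(&self) -> String;
}

struct FileTask;
struct NetworkTask;

impl Task for FileTask {
    fn run(&self) -> String { "file task".to_string() }
}
impl Task for NetworkTask {
    fn run(&self) -> String { "network task".to_string() }
}

// Monomorphized: one compiled copy per concrete T that reaches this.
fn do_work_static<T: Task>(task: T) -> String {
    task.run()
}

// Dynamic dispatch: one compiled copy total; calls go through a vtable.
fn do_work_dyn(task: &dyn Task) -> String {
    task.run()
}

fn main() {
    // Static: do_work_static::<FileTask> and ::<NetworkTask> both
    // end up in the binary.
    println!("{}", do_work_static(FileTask));
    println!("{}", do_work_static(NetworkTask));

    // Dynamic: heterogeneous tasks share the single do_work_dyn.
    let tasks: Vec<Box<dyn Task>> = vec![Box::new(FileTask), Box::new(NetworkTask)];
    for t in &tasks {
        println!("{}", do_work_dyn(t.as_ref()));
    }
}
```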
When you need raw throughput — monomorphization wins.

How to Fight Binary Bloat

There are a few real strategies to tame this beast:

1. Use trait objects for runtime polymorphism when types vary a lot.

   let workers: Vec<Box<dyn Task>> = vec![
       Box::new(FileTask {}),
       Box::new(NetworkTask {}),
   ];

2. Enable LTO (Link-Time Optimization).

   [profile.release]
   lto = "fat"
   codegen-units = 1

   LTO merges identical functions across crates, deduplicating monomorphized code.

3. Use cargo bloat or cargo tree -d to identify generic-heavy crates (itertools, serde_json, etc.) that multiply instantiations.

4. Prefer generic bounds only where needed. Over-generic code increases type explosion.

5. Consider specialization (unstable, but promising). It can reduce duplicated code when types share common behavior.

The Truth About “Zero-Cost”

Here’s the paradox that every seasoned Rust dev eventually realizes: “zero-cost abstractions” doesn’t mean “free.” It means “you pay once, not every time.”

Rust’s generics give you runtime perfection — but compile-time punishment.
And honestly, that’s fair. It’s part of Rust’s deal with you:
no garbage collector, no virtual table overhead, no runtime penalty.
But in return — you wait longer for builds and your binary grows heavier.

Final Thoughts

If you’ve ever stared at a target/release folder wondering why your tiny CLI tool is the size of a Linux distro — now you know.
It’s not your fault.
It’s Rust doing its job too well.

Monomorphization is both the heart and the curse of Rust’s performance model — a tradeoff between compile-time cost and runtime perfection. The beauty is: once you understand it, you can actually control it.
And that’s when Rust really starts feeling like magic.

  • Monomorphization = compiler cloning generic functions for each type.
  • Increases binary size and compile time.
  • Gives zero runtime overhead and better optimization.
  • Mitigate with trait objects, LTO, and code deduplication tools.
  • It’s not a bug — it’s the price of control.
