Inside LTO and ThinLTO: How Rust Compiles Across Crates for Speed

From JOHNWICK

If you’ve ever waited on cargo build --release and wondered, “What’s it doing for so long?” — you’re not alone. The answer isn’t just “optimizing.” It’s link-time optimizing — and that’s where the real black magic happens. Let’s go behind the curtain of LTO and ThinLTO, Rust’s secret weapons for squeezing every ounce of speed out of your binaries.

First, What Even Is LTO?

When you compile a Rust project, every crate (your code + dependencies) gets compiled separately into object files (.o), which are later combined into a final binary.

But there’s a catch — the compiler can’t “see” across crate boundaries. It doesn’t know that a function in crate_a could be inlined into crate_b, or that two copies of the same generic monomorphization could be deduplicated. That’s where Link-Time Optimization (LTO) comes in.

Instead of just slapping object files together, LTO lets the compiler re-analyze and optimize all crates as if they were one program.

Example: The Simple Case That Proves the Point

```rust
// crate-a/src/lib.rs
#[inline(always)]
pub fn add(a: u32, b: u32) -> u32 {
    a + b
}
```

```rust
// crate-b/src/main.rs
use crate_a::add;

fn main() {
    let x = add(40, 2);
    println!("{}", x);
}
```

Without LTO, Rust treats crate_a::add as an external function call — it’s a boundary.
With LTO, that boundary disappears. The compiler inlines add directly into main.
Zero overhead. Zero function call. Same speed as if you wrote let x = 40 + 2;.

Under the Hood: The Real Architecture

Let’s look at how this actually works inside Rust’s compiler pipeline:

+----------------+
|   Rust Crate   |
+--------+-------+
         |
         |  rustc → LLVM IR (per crate)
         v
+----------------+
| Object (.o)    |
| LLVM bitcode   |
+----------------+
         |
         |  LTO enabled → link-time optimizer merges IRs
         v
+----------------+
| Final Binary   |
+----------------+

During normal builds, rustc compiles each crate independently, handing the linker raw machine code.
During LTO builds, instead of code, it passes LLVM bitcode — an intermediate representation. 
Then, at the very end, LLVM re-runs optimizations across all bitcode modules together before emitting the final executable.

But ThinLTO Changes Everything

Full LTO is powerful… but painfully slow. Imagine hundreds of crates, each packed with inlined generics and traits — and the linker re-optimizing them all together. Ouch. That’s where ThinLTO steps in.

ThinLTO is like LTO, but smart and distributed:

  • Each crate is optimized individually, producing a “summary” file (a map of functions, symbols, dependencies).
  • The linker uses those summaries to make targeted cross-module optimizations, only merging the parts that matter.

Think of it like a diff: instead of merging everything, it merges just the hot paths.

Architecture Diagram

┌──────────────────────┐
│ Crate 1              │ ┐
│ (bitcode + summary)  │ │
└──────────────────────┘ │
┌──────────────────────┐ │
│ Crate 2              │ ├──→ ThinLTO summaries merge
│ (bitcode + summary)  │ │
└──────────────────────┘ │
┌──────────────────────┐ │
│ Crate 3              │ │
│ (bitcode + summary)  │ ┘
└──────────────────────┘
         │
         v
   Cross-crate inline decisions
         │
         v
  Final optimized binary

In practice, ThinLTO gives you most of LTO’s performance — with much less compile-time pain.

Real Benchmark: Full LTO vs ThinLTO

Let’s take a realistic microservice built in Rust:

| Mode     | Build Time | Binary Size | Runtime Perf |
| -------- | ---------- | ----------- | ------------ |
| No LTO   | 35s        | 6.8 MB      | baseline     |
| ThinLTO  | 52s        | 6.4 MB      | +8% faster   |
| Full LTO | 98s        | 6.2 MB      | +9% faster   |

That’s the trade-off:
ThinLTO gives you 80–90% of the gain for half the pain.
Full LTO still reigns supreme — if you can afford to wait.
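If you want to run this comparison on your own project, one convenient approach (using Cargo’s custom-profile support; the profile names below are made up for this example) is to define one profile per LTO mode:

```toml
# Hypothetical profiles for comparing LTO modes side by side.
[profile.release-thin]
inherits = "release"
lto = "thin"

[profile.release-fat]
inherits = "release"
lto = "fat"
```

Then build each with `cargo build --profile release-thin` (or `release-fat`) and compare wall-clock time and binary size yourself — the numbers vary a lot by workload.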

A Real Example: Optimizing Across Crates

```rust
// math_utils/src/lib.rs
pub fn square(x: i32) -> i32 {
    x * x
}
```

```rust
// main.rs
use math_utils::square;

fn heavy_compute(nums: &[i32]) -> i32 {
    nums.iter().map(|&n| square(n)).sum()
}

fn main() {
    let result = heavy_compute(&[1, 2, 3, 4, 5]);
    println!("{}", result);
}
```

Without LTO, every square() call is a real function call.
With LTO, Rust inlines square() into heavy_compute(), allowing LLVM to vectorize the entire loop.

The result?
What used to be 20+ instructions can compile down to a single vectorized loop.

Enabling LTO and ThinLTO in Rust

In Cargo.toml:

```toml
[profile.release]
lto = true          # enables full LTO
# or:
# lto = "thin"      # enables ThinLTO
codegen-units = 1   # fewer codegen units give LLVM more room to optimize
```

Then build:

```shell
cargo build --release
```

You can also set the LTO mode through RUSTFLAGS instead of the profile:

```shell
RUSTFLAGS="-C lto=thin" cargo build --release
```


The Real Reason Rust Uses LLVM’s LTO

Here’s the truth: Rust doesn’t have its own optimizer for cross-crate linking. It relies fully on LLVM’s battle-tested machinery.

That’s why it works so seamlessly — it’s piggybacking on decades of compiler engineering from Clang and LLVM. And that’s also why LTO is one of Rust’s most powerful hidden levers.

Why It Matters in the Real World

In production systems:

  • Binary size drops dramatically (less I/O, faster cold starts)
  • Inlining across crates boosts hot path performance
  • Deduplication trims away redundant code from generics
  • Symbol visibility is cleaner — fewer exported internals

That’s why companies building embedded Rust, WebAssembly runtimes, and high-frequency trading systems swear by LTO builds.

The Future: LTO + MIR Inlining?

Here’s where it gets exciting — the Rust compiler team has been discussing MIR-level optimizations across crates. If that happens, LTO could move upstream, allowing optimization even before LLVM IR — meaning:

  • Better control-flow decisions
  • Smarter lifetime elimination
  • Earlier constant propagation

Basically, LTO on steroids.

Final Thoughts

LTO and ThinLTO aren’t just compiler flags.
They’re Rust’s quiet rebellion against the limits of modular compilation. When you turn them on, you’re telling the compiler: “Stop thinking in crates. Think in programs.” And that’s when Rust really starts to shine — not just as a safe language, but as a systems engineering weapon.

Read the full article here: https://medium.com/@theopinionatedev/inside-lto-and-thinlto-how-rust-compiles-across-crates-for-speed-066bdfbdada9