When the Optimizer Lies: The Subtle Art of Unsafe Performance in Rust
If you’ve ever gone down the “I can make this faster” rabbit hole in Rust, you know the feeling: you sprinkle in some unsafe, tighten your loops, and suddenly — boom — a benchmark shows a 2× improvement. You lean back, satisfied. Then you rerun it on another machine… and it’s slower. Or worse — it segfaults.
Welcome to the shadowy world of Rust's optimizer, where your code and LLVM's assumptions sometimes part ways. And sometimes, the optimizer lies.

What the Optimizer Actually Does
At its core, Rust relies on LLVM to transform your high-level code into efficient machine instructions. The optimizer's job is to:
- Inline small functions
- Remove unused branches and variables
- Vectorize loops
- Reorder memory accesses for efficiency
- Eliminate “redundant” computations
Sounds good, right? Except those transformations are only guaranteed to be safe if your code is 100% well-defined under the rules of Rust's semantics. Once you go unsafe, you step outside that safety contract, and LLVM no longer owes you honesty.
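To get a feel for how aggressive these transformations are, here's a toy function (my illustration, not from the examples below). In release builds, LLVM's scalar-evolution pass typically replaces the entire loop with closed-form arithmetic, so no loop survives in the generated assembly at all:

```rust
// Illustrative only: with -C opt-level=3, LLVM usually folds this
// whole loop into the closed form n * (n - 1) / 2.
pub fn triangular(n: u64) -> u64 {
    let mut total = 0;
    for i in 0..n {
        total += i;
    }
    total
}
```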
When the Optimizer "Lies"

When we say the optimizer "lies," we don't mean LLVM is buggy; we mean it's making perfectly valid assumptions based on what it thinks your code guarantees. Here's a tiny, innocent example:
```rust
fn sum(slice: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..slice.len() {
        // SAFETY: `i` is always < slice.len(), so this access is in bounds.
        unsafe {
            total += *slice.get_unchecked(i);
        }
    }
    total
}
```
You might think get_unchecked() just skips bounds checking, and it does. But you've also told LLVM: "I guarantee all of these accesses are in bounds." That's a powerful promise. So powerful that LLVM might decide to reorder loop iterations, unroll the loop, or combine memory loads. And if even one access is actually out of bounds, the program has undefined behavior; LLVM is allowed to assume UB never happens, so it can treat the offending path as unreachable and remove entire blocks of code. That's when the optimizer "lies": it takes your promise more seriously than you do.
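For contrast, here is what a violated promise looks like. This is a deliberately broken sketch of my own, not code from the article's examples, and it must never ship: the access is out of bounds, so this is undefined behavior rather than a panic.

```rust
// Deliberately broken, for illustration only.
fn broken_promise() -> i32 {
    let data = [1, 2, 3];
    // Index 3 is out of bounds for a 3-element array: this is UB,
    // not a panic. `data[3]` would have panicked safely instead.
    unsafe { *data.get_unchecked(3) }
}
```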
The Real-World Case: The UB Chain Reaction

Let's look at a real case where performance tuning went nuclear.
```rust
fn fast_avg(data: &[f32]) -> f32 {
    assert!(!data.is_empty());
    let mut sum = 0.0;
    unsafe {
        // SAFETY: `i` stays below data.len(), so each access is in bounds.
        for i in 0..data.len() {
            sum += *data.get_unchecked(i);
        }
    }
    sum / data.len() as f32
}
```
Intended: remove bounds checks for speed. Reality: as written, the assert! actually keeps this function sound. The trouble starts when that check gets "optimized" too, say, downgraded to a debug_assert! (which is compiled out of release builds) or deleted because "callers always pass data." Combine that with a slip in the index arithmetic and UB (undefined behavior) enters. LLVM, now free from all guarantees, reasons "since UB can never happen" and may skip divisions, merge float ops, or delete code outright. In debug mode it's fine; in release, you might get a random NaN, an infinite loop, or even… optimized-out code. That's the subtle art: you can't reason about performance in unsafe code without understanding LLVM's assumptions.
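Here is a hypothetical variant of my own showing that failure mode: the debug_assert! vanishes in release builds, and an off-by-one inclusive range reads past the end of the slice.

```rust
// Hypothetical "tuned" version, for illustration only: debug_assert!
// is compiled out of release builds, so nothing enforces the
// non-empty invariant, and the `..=` upper bound overruns the slice.
fn fast_avg_broken(data: &[f32]) -> f32 {
    debug_assert!(!data.is_empty());
    let mut sum = 0.0;
    unsafe {
        // BUG: when i == data.len(), the read is out of bounds: UB.
        for i in 0..=data.len() {
            sum += *data.get_unchecked(i);
        }
    }
    sum / data.len() as f32
}
```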
Architecture Design: Safe Core + Unsafe Shell

To survive this world, experienced Rust devs often adopt a layered architecture for unsafe performance work:

```
+----------------------------------+
|         Safe Public API          |
+----------------------------------+
|  Thin Unsafe Abstraction Layer   |
+----------------------------------+
|       Verified Unsafe Core       |
+----------------------------------+
```
1. Safe Public API: all user-facing functions are 100% safe and checked. They validate inputs, sizes, and invariants.
2. Thin Unsafe Abstraction: the unsafe operations are encapsulated inside carefully reviewed functions.
3. Verified Unsafe Core: this is where you use raw pointers, unchecked indexing, or intrinsics, but only after the invariants are guaranteed.

Let's see an example:
```rust
pub fn sum_safe(slice: &[i32]) -> i32 {
    if slice.is_empty() {
        return 0;
    }
    // SAFETY: the slice is a valid, non-empty &[i32].
    unsafe { sum_unchecked(slice) }
}

/// # Safety
/// `slice` must be a valid slice; every index in `0..slice.len()`
/// is read without bounds checks.
unsafe fn sum_unchecked(slice: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..slice.len() {
        total += *slice.get_unchecked(i);
    }
    total
}
```
This way, the unsafe block never receives invalid data, and the optimizer's assumptions remain correct.
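A minimal usage sketch of the sum_safe API above, showing the contract from the caller's side:

```rust
fn main() {
    let v = vec![1, 2, 3, 4];
    assert_eq!(sum_safe(&v), 10);
    // The empty case is handled in the safe layer and
    // never reaches the unsafe core.
    assert_eq!(sum_safe(&[]), 0);
}
```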
Internal Working: LLVM's View of Your Code

When you compile Rust with --emit=llvm-ir, you can peek under the hood:
```
rustc --emit=llvm-ir src/main.rs
```

(Add -C opt-level=3 to see the optimized IR.) You'll see IR like this, simplified:

```llvm
define i32 @sum(i32* %ptr, i64 %len) {
entry:
  %cmp = icmp eq i64 %len, 0
  br i1 %cmp, label %exit, label %loop

loop:
  %val = load i32, i32* %ptr   ; LLVM might assume no aliasing here!
  ...
}
```
Notice how LLVM assumes:
- No aliasing between ptr and other memory
- All accesses are in bounds wherever unchecked operations were used
- Loop bounds are consistent
These assumptions fuel aggressive optimizations, but they break instantly if your "promises" don't hold.
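The aliasing assumption in particular is easy to break from unsafe code. A minimal sketch of my own (itself UB, so never run it in real code): Rust marks &mut references noalias for LLVM, and materializing two live &mut to the same memory through a raw pointer invalidates exactly that promise.

```rust
// Illustrative UB: two live `&mut` to the same memory.
fn broken_aliasing() {
    let mut x = 0i32;
    let p: *mut i32 = &mut x;
    unsafe {
        let a = &mut *p;
        let b = &mut *p; // second live &mut: violates the noalias promise
        *a += 1;
        *b += 1; // LLVM may reorder or merge these writes arbitrarily
    }
}
```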
Benchmarks: Safe vs Unsafe

Let's test the performance difference in practice. Benchmark code (using criterion):
```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn safe_sum(slice: &[i32]) -> i32 {
    slice.iter().sum()
}

fn unsafe_sum(slice: &[i32]) -> i32 {
    let mut total = 0;
    unsafe {
        for i in 0..slice.len() {
            total += *slice.get_unchecked(i);
        }
    }
    total
}

fn bench_sum(c: &mut Criterion) {
    let data: Vec<i32> = (0..1_000_000).collect();
    // black_box keeps the optimizer from const-folding the benchmark away.
    c.bench_function("safe_sum", |b| b.iter(|| safe_sum(black_box(&data))));
    c.bench_function("unsafe_sum", |b| b.iter(|| unsafe_sum(black_box(&data))));
}

criterion_group!(benches, bench_sum);
criterion_main!(benches);
```

Results (on Rust 1.82, -C opt-level=3):
| Function     | Time (ns) | Speedup |
| ------------ | --------- | ------- |
| `safe_sum`   | 2,150,000 | 1.0×    |
| `unsafe_sum` | 1,880,000 | 1.14×   |
Only about 14% faster, and with 100% more risk. Modern compilers already hoist or eliminate most bounds checks, so unsafe wins are often marginal unless you're in tight, SIMD-heavy inner loops.
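And even then, it's usually worth helping the auto-vectorizer from safe code before reaching for unsafe. A sketch of one common pattern (the chunk size of 8 is an arbitrary choice here): fixed-size chunks give LLVM a known trip count per iteration, which tends to vectorize well, with no unsafe anywhere.

```rust
// Safe pattern that often auto-vectorizes: sum fixed-size chunks,
// then handle the leftover elements separately.
fn sum_chunked(slice: &[i32]) -> i32 {
    let mut chunks = slice.chunks_exact(8);
    let mut total: i32 = chunks.by_ref().map(|c| c.iter().sum::<i32>()).sum();
    total += chunks.remainder().iter().sum::<i32>();
    total
}
```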
Key Takeaways
- Unsafe doesn’t mean faster — it just removes compiler checks.
- LLVM assumes your unsafe code is perfect; when it isn't, optimizations turn latent UB into real bugs.
- Design your code in layers — safe API, thin unsafe core.
- Inspect LLVM IR to understand how optimizations actually happen.
- Benchmark everything — don’t trust intuition.
Final Thoughts
The optimizer isn’t evil — it’s just brutally logical. It believes your promises more than you do. In safe Rust, that’s a gift. In unsafe Rust, it’s a curse. Writing high-performance unsafe code isn’t about trusting the compiler — it’s about earning its trust. The more you understand how LLVM thinks, the more control you have over your performance destiny.