Binary Diet: Shrinking Rust Releases Without Sorcery
Revision as of 07:52, 16 November 2025
“It’s just printing text,” he said. Fair point. A C version would’ve been what, 20KB? I muttered something about static linking and moved on. But that number kept bugging me. Four megabytes for twelve lines of code felt wrong.

The Thing Nobody Tells You

Here’s what happened: I built with cargo build --release and assumed that meant "optimized." Turns out, Rust's idea of optimized means "fast to execute," not "small to ship": the default release profile uses opt-level = 3, which favors execution speed over binary size.

Worse, the binary carried debug symbols from the standard library. libstd is distributed pre-compiled with those symbols included, so they ride along even in release mode unless something strips them. That’s where most of those 4MB were hiding. Not in my code. In metadata I didn’t ask for. I thought stripping was fully automatic in release builds. Nope. (Since Rust 1.77, Cargo does strip debuginfo from release builds by default, but the symbol table stays unless you set strip = true.)

What Actually Works

Skip the theory. Here’s the Cargo.toml I use now:

    [profile.release]
    strip = true        # Remove symbols
    opt-level = "z"     # Optimize for size
    lto = true          # Link-time optimization
    codegen-units = 1   # Better optimization, slower compile
    panic = "abort"     # Skip unwinding code

That config alone cut my binary from 4.2MB to 310KB. Same functionality. No magic flags, no nightly compiler features.

The opt-level = "z" setting tells the compiler to optimize for minimal binary size, though in some cases opt-level = "s" produces smaller results. You have to test both. I've seen "s" win on networking tools, "z" win on CLI apps. It's unpredictable.

The panic = "abort" line matters more than you'd think. By default, Rust unwinds the stack during panics, and that unwinding machinery takes up binary space. Telling the compiler to abort immediately removes the overhead. The catch: destructors don't run on panic and std::panic::catch_unwind can no longer catch anything, so skip this setting if you rely on either. If your program panics in production, you're probably logging it anyway. The unwinding code is dead weight.
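To make the "you're probably logging it anyway" point concrete, here is a minimal sketch of a panic hook that writes one log line per panic. The standard library calls the hook before the panic runtime takes over, so the line should still appear when the binary is built with panic = "abort". The names format_panic and install_panic_logger are made up for this example, and writing to stderr is an assumption standing in for whatever logging you actually use.

```rust
use std::panic;

/// Build the single log line we want for a panic.
/// (Hypothetical helper, kept separate so it can be tested without panicking.)
fn format_panic(location: Option<(&str, u32)>, message: &str) -> String {
    match location {
        Some((file, line)) => format!("fatal: panic at {file}:{line}: {message}"),
        None => format!("fatal: panic at <unknown>: {message}"),
    }
}

/// Install a hook that writes one line to stderr for every panic.
fn install_panic_logger() {
    panic::set_hook(Box::new(|info| {
        let location = info.location().map(|l| (l.file(), l.line()));
        // Panic payloads are usually a &str or a String; anything else is opaque.
        let payload = info.payload();
        let message = if let Some(s) = payload.downcast_ref::<&str>() {
            *s
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.as_str()
        } else {
            "<non-string panic payload>"
        };
        eprintln!("{}", format_panic(location, message));
    }));
}

fn main() {
    install_panic_logger();
    // Any panic from here on goes through the hook first.
    let input = std::env::args().nth(1).unwrap_or_else(|| "42".to_string());
    let n: u32 = input.parse().expect("argument must be a number");
    println!("ok: {n}");
}
```

Run it with a non-numeric argument to see the hook fire. The hook replaces the default panic message, so if you still want backtraces you would have to capture them yourself inside the hook.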
Where The Bloat Actually Lives

I spent three hours optimizing the wrong thing. I was rewriting algorithms, thinking my code was bloated. Then I ran cargo install cargo-bloat, followed by:

    cargo bloat --release --crates

Output looked like this:

     File  .text     Size Crate
     8.1%  61.2% 231.5KiB std
     2.5%  19.2%  72.4KiB my_app
     1.2%   9.4%  35.5KiB regex

The tool attributes functions to their originating crates heuristically, so it’s not 100% accurate, and generic functions are especially tricky to attribute. Still, seeing std take 60% of my code section (.text) was eye-opening. My clever code? 19%. The rest was dependencies.

Which brings me to something I wish I’d known earlier: dependency features. Many crates ship with a lot enabled by default. Regex includes Unicode tables. Reqwest pulls in a native TLS backend. Clap includes suggestions and color formatting. You probably don’t need all of that.

    [dependencies]
    # clap 4 will not build without "std"; add "help" and "usage" back if you want generated help text
    clap = { version = "4", features = ["derive", "std"], default-features = false }
    reqwest = { version = "0.12", features = ["json", "rustls-tls"], default-features = false }

Disabling default features and enabling only what you need can shrink a binary dramatically. Switching from native-tls to rustls-tls alone can be one of the biggest wins: for my HTTP client, it saved 1.8MB, and I didn’t change a single line of application code.

The Codegen Units Thing

Here’s one I got wrong for months. Rust splits each crate into several codegen units and compiles them in parallel, 16 by default in release mode. This parallelization improves compile times but blocks some optimizations across unit boundaries; setting codegen-units = 1 lets the optimizer see the whole crate at once. The tradeoff: your builds take longer. For a CLI tool I ship to users, I’ll wait an extra 30 seconds. For a microservice I deploy internally, maybe not worth it.

Test it. I’ve seen 15% size reductions with codegen-units = 1. I've also seen 3%. Depends entirely on your dependency graph.
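Since so much of this comes down to "build it both ways and compare," a small helper saves some squinting at ls output. What follows is a rough sketch, not part of any existing tool: it takes two binary paths (say, a build with codegen-units = 16 and one with codegen-units = 1, or "s" versus "z") and prints both sizes plus the percentage change. The function names percent_reduction and human_size and the usage string are invented for illustration.

```rust
use std::env;
use std::fs;
use std::process;

/// Percentage by which `after` is smaller than `before` (negative if it grew).
fn percent_reduction(before: u64, after: u64) -> f64 {
    if before == 0 {
        return 0.0; // avoid dividing by zero for an empty baseline
    }
    (before as f64 - after as f64) * 100.0 / before as f64
}

/// Human-readable size in binary units, one decimal place.
fn human_size(bytes: u64) -> String {
    const KIB: f64 = 1024.0;
    let b = bytes as f64;
    if b >= KIB * KIB {
        format!("{:.1} MiB", b / (KIB * KIB))
    } else if b >= KIB {
        format!("{:.1} KiB", b / KIB)
    } else {
        format!("{bytes} B")
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: sizediff <baseline-binary> <candidate-binary>");
        process::exit(2);
    }

    // Read the on-disk size of a file, exiting with a message if it is missing.
    let size_of = |path: &str| -> u64 {
        fs::metadata(path).map(|m| m.len()).unwrap_or_else(|e| {
            eprintln!("cannot stat {path}: {e}");
            process::exit(1)
        })
    };

    let before = size_of(&args[1]);
    let after = size_of(&args[2]);
    println!("baseline:  {}", human_size(before));
    println!("candidate: {}", human_size(after));
    println!("reduction: {:.1}%", percent_reduction(before, after));
}
```

Copy the release binary aside before changing the profile, rebuild, then point the helper at both files. It only measures file size on disk, which is what matters for downloads and images; it says nothing about runtime speed, so benchmark separately if that matters to you.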
Link-Time Optimization Is Not Optional

LTO is where the toolchain gets one final pass at your entire program at link time and says “actually, you don’t need this function” or “these three calls can inline.” It can remove dead code across crate boundaries and often reduces binary size.

The downside? Linking takes forever. My laptop fan spins up. I’ve waited five minutes for large projects. But for release builds, it’s basically free size reduction. Set lto = true and go make coffee.

There’s also lto = "thin", which is faster but less aggressive. I use thin for local testing, full LTO for releases.

When This Actually Matters

You’re deploying to embedded systems. Every kilobyte counts.

You’re shipping Docker images. Smaller images mean faster cold starts and cheaper registry storage.

You’re building CLI tools people download directly. Nobody wants to wait for a 50MB binary when the alternative is 5MB.

I don’t obsess over binary size for internal services. If it’s running on AWS with 8GB of RAM, who cares if the binary is 20MB versus 5MB? But the moment you’re bandwidth-constrained or deploying to resource-limited environments, these techniques stop being academic.

The One Thing That Bit Me

Somewhere around iteration twelve, I discovered my binaries had duplicate dependencies. Two versions of rand, three versions of syn. My direct dependencies were fine; it was transitive deps pulling in old versions. Cargo resolves dependencies by semantic versioning, and semver-incompatible versions of the same crate can coexist in one build if different dependencies require them.

cargo tree shows you the dependency graph. Pipe it through grep for a specific crate:

    cargo tree | grep regex

Or run cargo tree -d, which lists only the crates that appear in more than one version. If you see multiple versions, check whether you can update dependencies to align on one. Sometimes you can add explicit version constraints to unify them. Sometimes you’re stuck because two libraries refuse to upgrade. That’s the game.

What To Actually Do Right Now

Pick one binary.
Run cargo bloat --release --crates. Look at the top five crates by size. Can you disable default features? Check their docs. Add the release profile config from earlier. Build. Measure the difference.

That’s it. You don’t need nightly features or experimental allocators. The default toolchain with five lines in Cargo.toml will probably cut your binary in half.