
The need to migrate from C to Rust

From JOHNWICK

Today I was browsing the latest articles in Communications of the ACM, and one of them got me thinking, mainly because I was not expecting to see it there. Its title was “Automatically Translating C to Rust”. It is a fairly technical article about a very specific problem: translating legacy systems written in C into Rust. Its main argument is that, while migrating legacy C code to Rust is crucial for improving software reliability and eliminating memory bugs, existing automatic translation tools such as C2Rust are not enough. These tools produce syntactically valid Rust code that, unfortunately, remains unsafe (it relies on Rust’s unsafe features, such as raw pointers) and unidiomatic (it retains C-style patterns, such as output parameters). The authors propose addressing these shortcomings with static analysis that progressively refines the translated code. They also discuss the potential, and the current limitations, of using Large Language Models (LLMs) for this task.
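To make “unidiomatic” a bit more concrete, here is a minimal sketch of the output-parameter pattern. The function names `parse_digit_out` and `parse_digit` are hypothetical, and the first version only imitates the style of a literal translation; it is not actual C2Rust output. A C function that signals success through its return value and writes the result through a pointer keeps that shape when translated line by line, whereas idiomatic Rust would return an `Option`.

```rust
// C original (for reference):
//   int parse_digit(char c, int *out) {
//       if (c < '0' || c > '9') return 0;
//       *out = c - '0';
//       return 1;
//   }

// Literal, C-style translation (hypothetical, imitating the style of
// automatic translators): a status code plus a raw output pointer.
unsafe fn parse_digit_out(c: u8, out: *mut i32) -> i32 {
    if c < b'0' || c > b'9' {
        return 0;
    }
    // Writing through a raw pointer requires `unsafe`.
    unsafe {
        *out = (c - b'0') as i32;
    }
    1
}

// Idiomatic version: the result travels in the return value,
// and "no result" is expressed with Option instead of a status flag.
fn parse_digit(c: u8) -> Option<i32> {
    if c.is_ascii_digit() {
        Some((c - b'0') as i32)
    } else {
        None
    }
}

fn main() {
    // The C-style caller must allocate the output slot and use `unsafe`.
    let mut value: i32 = 0;
    let ok = unsafe { parse_digit_out(b'7', &mut value) };
    println!("C-style: ok={} value={}", ok, value);

    // The idiomatic caller just matches on the result.
    match parse_digit(b'7') {
        Some(v) => println!("idiomatic: {}", v),
        None => println!("idiomatic: not a digit"),
    }
}
```

Both functions compute the same thing; the difference is that the idiomatic signature carries the “may fail” information in the type, so there is no uninitialized output slot and no `unsafe` at the call site.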

Why migrate in the first place? C is widely used in systems programming but is “infamous” for its lack of language-level safety. This leads to memory bugs such as buffer overflows and use-after-free errors, which are responsible for a large share of security vulnerabilities (e.g., roughly 70% in Microsoft’s codebase). Rust, on the other hand, is the “most promising migration target”, because its ownership type system provides strong memory-safety guarantees at compile time while still offering the low-level control and high performance of C. As a result, there is a trend to port significant parts of open-source projects, such as the Linux kernel and the GNU Coreutils, to Rust.
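As a small illustration of the two bug classes mentioned above, the sketch below (standard library only; `read_at` and `consume` are illustrative names) shows how Rust handles them. An out-of-range read is checked instead of silently reading past the buffer, and ownership turns a use-after-free into a compile-time error. That last case appears only as a comment, since the offending line would not compile.

```rust
// Reads an element by index without risking a buffer overflow:
// `get` returns None for an out-of-range index instead of reading
// past the end of the buffer, as unchecked C indexing could.
fn read_at(buf: &[u8], i: usize) -> Option<u8> {
    buf.get(i).copied()
}

// Takes ownership of the heap allocation; it is freed when `b`
// goes out of scope at the end of this function.
fn consume(b: Box<u32>) -> u32 {
    *b
}

fn main() {
    let buf = [10u8, 20, 30];
    println!("{:?}", read_at(&buf, 1)); // index in bounds
    println!("{:?}", read_at(&buf, 8)); // index out of bounds, no overflow

    let boxed = Box::new(42u32);
    let v = consume(boxed); // ownership moves into `consume`
    println!("{}", v);
    // println!("{}", boxed); // rejected at compile time (use of moved value),
    //                        // the Rust counterpart of a use-after-free.
}
```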

Automatic translation. The most successful translator is C2Rust. It handles the syntactic translation, but it carries over the behavior of the original C program construct by construct, so the resulting Rust programs end up unsafe and unidiomatic.
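A rough idea of what a construct-by-construct translation looks like is sketched below. It is hand-written in that style, not real C2Rust output (which would, for example, use `libc::c_int` types; plain `i32` and `usize` keep the sketch dependency-free), and the names `sum_raw` and `sum` are hypothetical.

```rust
// C original (for reference):
//   int sum(const int *xs, size_t n) {
//       int s = 0;
//       for (size_t i = 0; i < n; i++) s += xs[i];
//       return s;
//   }

// Literal translation: it mirrors the C code, including the unchecked
// pointer arithmetic, so the function must be `unsafe` and the caller is
// responsible for passing a valid (pointer, length) pair.
unsafe fn sum_raw(xs: *const i32, n: usize) -> i32 {
    let mut s = 0;
    let mut i = 0;
    while i < n {
        s += unsafe { *xs.add(i) };
        i += 1;
    }
    s
}

// Safe, idiomatic version: a slice carries its own length, so there is
// no separate `n` that could disagree with the real buffer size.
fn sum(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    let a = unsafe { sum_raw(data.as_ptr(), data.len()) };
    let b = sum(&data);
    println!("{} {}", a, b);
}
```

The literal version compiles and runs, but every guarantee that matters (valid pointer, correct length) still rests on the programmer, exactly as in C.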

And now? The core challenge is that Rust features (such as Box owning pointers or & references) require explicit information about program behavior (such as ownership) that is only implicit in C code. The authors advocate using static analysis to discover this implicit information automatically. The analysis would run as multiple “refinement passes” after the initial C2Rust translation, progressively replacing unsafe features and unidiomatic patterns with their safe, idiomatic Rust alternatives. They already show progress on this front, addressing several key areas, such as pointers and locks, with efficient abstractions. But the work is far from complete, and the authors consider using LLMs to perform the semantic analysis of programs automatically and more efficiently (trust me, I know the hardships of static analysis and transpiling).

One final remark from me: why port these utilities instead of rewriting them? Most of them are not that complex. And one more thing: never forget the precious wisdom of Joel on Software.
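To make the core challenge concrete, the sketch below shows why ownership has to be inferred. In C-style code the same raw pointer type is used both by a function that takes over (and frees) a node and by one that merely reads it; the refined signatures spell out the difference with `Box` and `&`. The `Node` type and all function names are hypothetical, and the refined versions only show what such a refinement pass would ideally aim for; they are not output from the authors’ tools.

```rust
// Hypothetical node type shared by both versions.
struct Node {
    value: i32,
}

// C-style: both functions take `*mut Node`; nothing in the type says
// which one frees the node. (Box::into_raw stands in for malloc,
// Box::from_raw followed by drop stands in for free.)
unsafe fn node_value_raw(n: *mut Node) -> i32 {
    unsafe { (*n).value }
}

unsafe fn node_free_raw(n: *mut Node) {
    drop(unsafe { Box::from_raw(n) });
}

// Refined: reading only borrows (&Node), releasing takes ownership (Box<Node>).
fn node_value(n: &Node) -> i32 {
    n.value
}

fn node_free(n: Box<Node>) {
    drop(n); // explicit for clarity; the Box would be dropped at scope end anyway
}

fn main() {
    // C-style usage: correctness depends on calling things in the right order.
    let p = Box::into_raw(Box::new(Node { value: 5 }));
    let v = unsafe { node_value_raw(p) };
    unsafe { node_free_raw(p) };
    // Calling node_value_raw(p) here would compile but be a use-after-free.

    // Refined usage: the compiler enforces the same protocol.
    let b = Box::new(Node { value: 5 });
    let w = node_value(&b);
    node_free(b);
    // node_value(&b) here would be rejected: `b` was moved into node_free.
    println!("{} {}", v, w);
}
```

The information that separates the two signatures (who frees the node) never appears in the C types, which is exactly what the proposed static analysis has to recover.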