Rust in your disassembler
At r2con this year, I am going to present how to solve a CrackMe binary written in Rust. As the conference is online, I chose to record it mostly as a big demo with only a few slides. However, some of you might want to read a few details and some theory. What the Rust compiler does is pretty smart and interesting.

Strings are fat pointers
In C, a string is a simple pointer to its characters. Not so in Rust: code that uses a string literal refers to a small structure (a “fat pointer”) made of:
- A pointer to the characters of your string.
- A length. The length of your string in bytes. No need for your string to end with \0 like in C.
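To make the two fields concrete, here is a minimal sketch in safe Rust. The helper name `fat_pointer_parts` is made up for illustration, and the string literal is the one from the CrackMe discussed in this article. Note that Rust does not promise a particular in-memory order for the two fields; the sketch only shows that both exist and that a `&str` is two machine words wide.

```rust
use std::mem::size_of;

/// Hypothetical helper: split a string slice into the two fields of its fat pointer.
fn fat_pointer_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let banner: &'static str = "Space Station Airlock Control System\n";
    let (ptr, len) = fat_pointer_parts(banner);

    // The pointer value depends on where the binary is loaded.
    println!("ptr = {:p}", ptr);
    // The length is 37 bytes, i.e. 0x25; there is no trailing \0.
    println!("len = {:#x}", len);
    // A &str is two machine words: pointer + length.
    println!("size_of::<&str>() = {}", size_of::<&str>());
}
```

On a 64-bit target the last line reports 16 bytes, which is why a disassembler shows two 8-byte fields per string reference.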
Radare2 shows those strings as “reloc.fixup.xxxx”. For example below, the string is apparently at address 0x5e790. If we view the bytes at that address, we clearly see the pointer (6bc0 0400 -> 0x04c06b) and then the length (0x25). We confirm the characters are at 0x04c06b.
; 0x5e790 ; "k\xc0\x04"
0x00008b6a 488d051f5c.. lea rax, reloc.fixup.Space_Station_Airlock_Control_S
[0x00008b40]> px 10 @ 0x5e790
- offset -  9091 9293 9495 9697 9899 9A9B 9C9D 9E9F  0123456789ABCDEF
0x0005e790  6bc0 0400 0000 0000 2500                 k.......%.
[0x00008b40]> px 0x25 @ 0x04c06b
- offset -  6B6C 6D6E 6F70 7172 7374 7576 7778 797A  BCDEF0123456789A
0x0004c06b  5370 6163 6520 5374 6174 696f 6e20 4169  Space Station Ai
0x0004c07b  726c 6f63 6b20 436f 6e74 726f 6c20 5379  rlock Control Sy
0x0004c08b  7374 656d 0a                             stem.

Monomorphism
You have probably heard about polymorphism. That’s when a given function is able to operate over different types. Monomorphism is “the opposite”: we ensure that there is strictly 1 function per type (the Rust documentation calls this process monomorphization). The Rust compiler does this automatically for performance reasons: calls are resolved at compile time, so no vtable is needed. When it encounters a generic function, under the hood, it actually generates the assembly for one function per concrete type used. The developer does not see this; it is the job of the compiler.

// generic
fn square<T: std::ops::Mul<Output = T> + Copy>(x: T) -> T {
    x * x
}

fn main() {
    let a = square(3i32);   // generates square::<i32>
    let b = square(2.5f64); // generates square::<f64>
}

The compiler does this frequently for closures too.

Closures
A closure is a function that captures its environment. In the example below, add_x is a closure. It captures the environment where x = 5. Calling add_x(3) returns 8.

fn main() {
    let x = 5;
    let add_x = |y| x + y;
    println!("{}", add_x(3));
}

What the Rust compiler does depends very much on the optimization level. Let’s say you compile this with -C opt-level=0.

[0x00007960]> pdi @ sym.main::main::h8f1c9e3495794b54
0x00007b90 sym.main::main::h8f1c9e3495794b54:
0x00007b90 4883ec68 sub rsp, 0x68
0x00007b94 c744240405000000 mov dword [rsp + 4], 5
0x00007b9c 488d442404 lea rax, [rsp + 4]
0x00007ba1 4889442408 mov qword [rsp + 8], rax
0x00007ba6 488d7c2408 lea rdi, [rsp + 8]
0x00007bab be03000000 mov esi, 3
0x00007bb0 e85b000000 call sym.main::main::__u7b__u7b_closure_u7d__u7d_::h0ea7a13e6c14ebb2
0x00007bb5 89442464 mov dword [rsp + 0x64], eax

A specific closure for our main has been generated by the compiler. It is named: sym.main::main::__u7b__u7b_closure_u7d__u7d_::h0ea7a13e6c14ebb2. The _u7b_ and _u7d_ parts are the escaped characters { and } (0x7b and 0x7d), so the name reads main::main::{{closure}}. The arguments for the closure are:
- First argument (in rdi): rsp + 8. This is the closure environment. It contains a pointer to rsp + 4, which contains 5.
- Second argument (in esi): 3. This is the argument provided to the closure.
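The calling convention above can be mimicked by hand. The following is only a sketch of what the compiler conceptually generates, not its actual output: the names `AddXEnv` and `add_x_body` are invented for illustration. The environment is a structure holding a reference to x (the pointer stored at rsp + 8), and the closure body is an ordinary function that receives this environment as its first argument.

```rust
/// Hypothetical stand-in for the compiler-generated environment of `add_x`.
/// It holds a reference to the captured variable, like the pointer at rsp + 8.
struct AddXEnv<'a> {
    x: &'a i32,
}

/// Hypothetical stand-in for the generated closure body:
/// first argument is the environment (rdi), second is `y` (esi).
fn add_x_body(env: &AddXEnv<'_>, y: i32) -> i32 {
    *env.x + y
}

fn main() {
    let x = 5;
    let env = AddXEnv { x: &x };
    // Same result as calling add_x(3) in the original program.
    println!("{}", add_x_body(&env, 3));
}
```

This is also why the closure body starts by dereferencing rdi twice: once to reach the environment, once to reach x.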
In the assembly of the closure, we have instructions that work for integers.

0x00007c11 488b07 mov rax, qword [rdi]
0x00007c14 0330 add esi, dword [rax]
...
0x00007c1f 8b442404 mov eax, dword [rsp + 4]

If the closure had been used for a float, another closure would have been generated, with different instructions. That’s monomorphism. We can see that better by creating a closure that works both for floats and integers:

use std::ops::Add;

fn get_adder<T>(x: T) -> impl Fn(T) -> T
where
    T: Add<Output = T> + Copy,
{
    move |y| x + y
}

fn main() {
    let add_int = get_adder(5);
    let add_float = get_adder(5.0);
    println!("{}", add_int(3));
    println!("{}", add_float(4.5));
}

Now, if we inspect the assembly, notice the compiler has generated 2 get_adder functions, each with its own closure.

[0x00007960]> afl~main
0x00007c00 1 1 sym.main::get_adder::hba4b3b420a6e866b
0x00007c10 1 3 sym.main::get_adder::hbe8e2ff9c2a8357d
0x00007c20 1 22 sym.main::get_adder::__u7b__u7b_closure_u7d__u7d_::h6ff0b01bf90189e9
0x00007c40 1 17 sym.main::get_adder::__u7b__u7b_closure_u7d__u7d_::h72c4b1cd2fd53136
0x00007c60 1 251 sym.main::main::h8f1c9e3495794b54

And if we inspect the first get_adder closure, we see it works for floats (note the xmm registers and the call to the f64 implementation of Add):

0x00007c20 sym.main::get_adder::__u7b__u7b_closure_u7d__u7d_::h6ff0b01bf90189e9:
0x00007c20 50 push rax
0x00007c21 0f28c8 movaps xmm1, xmm0
0x00007c24 f20f1007 movsd xmm0, qword [rdi]
0x00007c28 488d3d19ee0400 lea rdi, [rip + 0x4ee19]
0x00007c2f e87cfeffff call sym.__f64_as_core::ops::arith::Add_::add::hf4bd57df382c73d6
0x00007c34 58 pop rax
0x00007c35 c3 ret

While the second get_adder closure works for integers:

0x00007c40 sym.main::get_adder::__u7b__u7b_closure_u7d__u7d_::h72c4b1cd2fd53136:
0x00007c40 50 push rax
0x00007c41 8b3f mov edi, dword [rdi]
0x00007c43 488d15feed0400 lea rdx, [rip + 0x4edfe]
0x00007c4a e871feffff call sym.__i32_as_core::ops::arith::Add_::add::h471fa892cd8f10a4
0x00007c4f 59 pop rcx
0x00007c50 c3 ret

De-sugaring
In Rust, developers typically call methods like obj.blah(). The Rust compiler transforms that internally to a more explicit (but less sweet) form, where the object becomes the first argument: Type::blah(&obj), here MyObject::blah(&obj).

let obj = MyObject { value: 10 };
obj.blah(); // sugared syntax

See below the assembly that was generated:
- set value = 10
0x00007bc4 c744240c0a000000 mov dword [rsp + 0xc], 0xa
- put the address of obj in RDI (1st argument of blah())
0x00007bcc 488d7c240c lea rdi, [rsp + 0xc]
- call blah() with address of obj as argument
0x00007bd1 e8baffffff call sym.sugar::MyObject::blah::h059bdb1e25fa70c4

Macros
In Rust, println!, the usual way to display things, is a macro, not a “real” function. Therefore, in the assembly, you won’t see a call to println! (the “function” does not exist), but:
- Setup format strings and arguments.
- Build the Arguments structure.
- Call to dbg._print, which actually calls dbg.write_fmt, dbg.write…
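The three steps can be approximated with public standard library APIs. This is a sketch of the idea, not the real expansion of println!: the function `print_to` is a hypothetical stand-in for the internal _print routine, while `format_args!` and `Write::write_fmt` are the real building blocks.

```rust
use std::fmt;
use std::io::{self, Write};

/// Hypothetical stand-in for `_print`: takes an already-built
/// `fmt::Arguments` and forwards it to a writer via `write_fmt`.
fn print_to<W: Write>(out: &mut W, args: fmt::Arguments<'_>) -> io::Result<()> {
    out.write_fmt(args)
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut lock = stdout.lock();
    // format_args! sets up the format string and arguments and builds
    // the Arguments structure; nothing is formatted until write_fmt runs.
    print_to(&mut lock, format_args!("Hello {}\n", "World"))?;
    Ok(())
}
```

Because the writer is generic, the same function can fill a Vec<u8> instead of stdout, which is convenient when you want to check what a format string produces.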
I recommend you read this article for more details. The assembly below is generated from a Hello World program.

0x00007b20 sym.print::main::h13b5f0e40e4e7865:
0x00007b20 4883ec38 sub rsp, 0x38
- allocate place for Arguments object on the stack
0x00007b24 488d7c2408 lea rdi, [rsp + 8]
- fat pointer to the string Hello World
0x00007b29 488d35d0a30400 lea rsi, [rip + 0x4a3d0]
- instantiate the Arguments object
0x00007b30 e83bffffff call sym.core::fmt::rt::__impl_core::fmt::Arguments_::new_const::ha519b55ee7e59acf
0x00007b35 488d7c2408 lea rdi, [rsp + 8]
- call to the dbg._print routine via GOT
0x00007b3a ff1550cf0400 call qword [rip + 0x4cf50]
0x00007b40 4883c438 add rsp, 0x38
0x00007b44 c3 ret

Notice that the call to dbg._print is indirect: the GOT (Global Offset Table) entry for dbg._print is located at rip + 0x4cf50. Radare2 shows that as a reloc.fixup:

[0x00007b20]> px 10 @ reloc.fixup.UHSHxHH_
- offset -  9091 9293 9495 9697 9899 9A9B 9C9D 9E9F  0123456789ABCDEF
0x00054a90  c043 0200 0000 0000 0000                 .C........
[0x00007b20]> pd 10 @ 0x0243c0
;-- std::io::stdio::_print::h915f3273edec6464:
; DATA XREF from reloc.fixup.UHSHxHH_ @
┌ 206: dbg._print (int64_t arg1);
│ `- args(rdi) vars(13:sp[0x18..0x80])
│ 0x000243c0 55 push rbp ; sync.rs:0:13 ; void _print();
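The addresses in this listing can be checked by hand. On x86-64, a rip-relative operand is computed from the address of the next instruction, so the sketch below adds the instruction length before the displacement. The helper name `rip_relative_target` is made up for illustration; the numbers come from the listing above.

```rust
/// Hypothetical helper: target of a rip-relative operand, i.e. the
/// address of the next instruction plus the signed displacement.
fn rip_relative_target(insn_addr: u64, insn_len: u64, disp: i32) -> u64 {
    (insn_addr + insn_len).wrapping_add(disp as i64 as u64)
}

fn main() {
    // call qword [rip + 0x4cf50] sits at 0x7b3a and is 6 bytes long
    // (ff 15 followed by a 4-byte displacement).
    let got_slot = rip_relative_target(0x7b3a, 6, 0x4cf50);
    println!("GOT slot: {:#x}", got_slot); // 0x54a90

    // First 8 bytes that px shows at the GOT slot, read as little-endian.
    let raw: [u8; 8] = [0xc0, 0x43, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00];
    println!("_print at: {:#x}", u64::from_le_bytes(raw)); // 0x243c0
}
```

Both results match what radare2 displays: the slot at 0x54a90 and the _print function at 0x243c0.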
— Cryptax