Editing Rust in your disassembler

At r2con this year, I am going to present how to solve a CrackMe binary written in Rust. As the conference is online, I chose to record it mostly as a big demo with only very few slides. However, some of you might want to read a few details and a bit of theory, because what the Rust compiler does is pretty smart and interesting.

Strings are fat pointers
In C, a string is a simple pointer to its characters. In Rust, a string literal is referenced through a small structure (a fat pointer) with:
* 		A pointer to the characters of your string.
* 		A length. Because the length of your string is stored, there is no need for your string to end with \0 like in C.

Radare2 shows those strings as “reloc.fixup.xxxx”. In the example below, the lea instruction references address 0x5e790. If we view the bytes at that address, we clearly see the pointer first (bytes 6bc0 0400, which read in little-endian order give 0x04c06b) and then the length (0x25, i.e. 37 bytes). We confirm the characters are at 0x04c06b.

<pre>
; 0x5e790 ; "k\xc0\x04"
0x00008b6a      488d051f5c..   lea rax, reloc.fixup.Space_Station_Airlock_Control_S 

[0x00008b40]> px 10 @ 0x5e790
- offset -  9091 9293 9495 9697 9899 9A9B 9C9D 9E9F  0123456789ABCDEF
0x0005e790  6bc0 0400 0000 0000 2500                 k.......%.
[0x00008b40]> px 0x25 @ 0x04c06b
- offset -  6B6C 6D6E 6F70 7172 7374 7576 7778 797A  BCDEF0123456789A
0x0004c06b  5370 6163 6520 5374 6174 696f 6e20 4169  Space Station Ai
0x0004c07b  726c 6f63 6b20 436f 6e74 726f 6c20 5379  rlock Control Sy
0x0004c08b  7374 656d 0a                             stem.

</pre>
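
The same layout can be observed from the Rust side. The sketch below is a minimal illustration and is not taken from the CrackMe: the helper names fat_pointer_parts and decode_le_pointer are hypothetical, the byte values are copied from the px dump above, and a 64-bit target is assumed.

```rust
use std::mem::size_of;

/// Hypothetical helper: split a string slice into its two components,
/// the data pointer and the length in bytes.
fn fat_pointer_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

/// Hypothetical helper: decode the first 8 bytes of the structure
/// (little-endian) into the address of the characters.
fn decode_le_pointer(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
}

fn main() {
    let banner: &'static str = "Space Station Airlock Control System\n";
    let (ptr, len) = fat_pointer_parts(banner);

    // A &str occupies two machine words: pointer + length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    println!("data pointer = {:p}, length = {:#x}", ptr, len);

    // Bytes 0..8 of the px dump at 0x5e790.
    let addr = decode_le_pointer([0x6b, 0xc0, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00]);
    println!("characters at {:#x}", addr);
}
```

The printed data pointer depends on where the binary is loaded, so only the length and the decoded address are stable across runs.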

Monomorphism
You have probably heard about polymorphism. That’s when a given function is able to operate over different types. Monomorphism is “the opposite”: we ensure that there is strictly one function per type (the Rust documentation calls this process monomorphization).
The Rust compiler does this automatically for optimization reasons: calls are resolved at compile time, so there is no need for a vtable. When it encounters a generic function, under the hood, it actually generates the assembly for one function per concrete type it is used with. The developer does not see this; it is the job of the compiler.
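
To make the vtable remark concrete, here is a minimal sketch contrasting a generic function (one copy generated per type, resolved at compile time) with a trait object (resolved at run time through a vtable). The Shape trait, the Square type and both functions are hypothetical names chosen for illustration.

```rust
use std::mem::size_of;

// Hypothetical trait and type, for illustration only.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Static dispatch: the compiler emits one copy of this function per T.
fn area_static<T: Shape>(shape: &T) -> f64 {
    shape.area()
}

// Dynamic dispatch: a single function, the call goes through a vtable.
fn area_dynamic(shape: &dyn Shape) -> f64 {
    shape.area()
}

fn main() {
    let sq = Square(3.0);
    assert_eq!(area_static(&sq), 9.0);
    assert_eq!(area_dynamic(&sq), 9.0);

    // A plain reference is one machine word. A trait object reference
    // is two: a data pointer plus a vtable pointer.
    assert_eq!(size_of::<&Square>(), size_of::<usize>());
    assert_eq!(size_of::<&dyn Shape>(), 2 * size_of::<usize>());
}
```

In a disassembler, the static version typically shows up as a direct call to a type-specific symbol, while the dynamic version shows an indirect call through a table of function pointers.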

<pre>
// generic
fn square<T: std::ops::Mul<Output = T> + Copy>(x: T) -> T {
  x * x
}
fn main() {
  let a = square(3i32); // generates square::<i32>
  let b = square(2.5f64); // generates square::<f64>
}
</pre>

The compiler does this frequently for closures too.

Closures
A closure is a function that captures its environment. In the example below, add_x is a closure. It captures the environment where x = 5. Calling add_x(3) returns 8.

<pre>
fn main() {
  let x = 5;
  let add_x = |y| x + y;
  println!("{}", add_x(3));
}
</pre>

What the Rust compiler does depends very much on the optimization level. Let’s say you compile this with -C opt-level=0.

<pre>
[0x00007960]> pdi @ sym.main::main::h8f1c9e3495794b54
0x00007b90   sym.main::main::h8f1c9e3495794b54:
0x00007b90             4883ec68  sub rsp, 0x68
0x00007b94     c744240405000000  mov dword [rsp + 4], 5
0x00007b9c           488d442404  lea rax, [rsp + 4]
0x00007ba1           4889442408  mov qword [rsp + 8], rax
0x00007ba6           488d7c2408  lea rdi, [rsp + 8]
0x00007bab           be03000000  mov esi, 3
0x00007bb0           e85b000000  call sym.main::main::__u7b__u7b_closure_u7d__u7d_::h0ea7a13e6c14ebb2
0x00007bb5             89442464  mov dword [rsp + 0x64], eax
</pre>

A specific closure for our main has been generated by the compiler. It is named: sym.main::main::__u7b__u7b_closure_u7d__u7d_::h0ea7a13e6c14ebb2. The arguments for the closure are:

* 		First argument (in rdi): rsp + 8. This is the closure environment. It contains a pointer to rsp + 4, which contains 5.
* 		Second argument (in esi): 3. This is the argument provided to the closure.

In the assembly of the closure, we have instructions that work for integers.

<pre>
0x00007c11               488b07  mov rax, qword [rdi]
0x00007c14                 0330  add esi, dword [rax]
...
0x00007c1f             8b442404  mov eax, dword [rsp + 4]

</pre>
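
One way to read this assembly is to picture the closure as a plain function that receives its environment as a first argument. The sketch below is only a mental model under that assumption: AddXEnv and add_x_call are hypothetical names, not what the compiler actually emits.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical stand-in for the closure environment: it holds a
/// single pointer to x, like the slot at rsp + 8 in the listing.
struct AddXEnv<'a> {
    x: &'a i32,
}

/// Hypothetical stand-in for the generated closure body:
/// environment as first argument (rdi), y as second argument (esi).
fn add_x_call(env: &AddXEnv<'_>, y: i32) -> i32 {
    *env.x + y
}

fn main() {
    let x = 5;
    let add_x = |y: i32| x + y;

    let env = AddXEnv { x: &x };
    assert_eq!(add_x(3), add_x_call(&env, 3));

    // The real closure captures x by reference, so it is pointer-sized.
    assert_eq!(size_of_val(&add_x), size_of::<&i32>());

    println!("{}", add_x_call(&env, 3));
}
```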

If the closure had been used for a float, another closure would have been generated, with different instructions. That’s monomorphism. We can see that better by creating a closure that works both for floats and integers:
<pre>
use std::ops::Add;

fn get_adder<T>(x: T) -> impl Fn(T) -> T
where
    T: Add<Output = T> + Copy,
{
    move |y| x + y
}

fn main() {
    let add_int = get_adder(5);
    let add_float = get_adder(5.0);

    println!("{}", add_int(3));
    println!("{}", add_float(4.5));
}
</pre>

Now, if we inspect the assembly, we notice the compiler has generated two get_adder functions and two closures, one of each per type.

<pre>
[0x00007960]> afl~main
0x00007c00    1      1 sym.main::get_adder::hba4b3b420a6e866b
0x00007c10    1      3 sym.main::get_adder::hbe8e2ff9c2a8357d
0x00007c20    1     22 sym.main::get_adder::__u7b__u7b_closure_u7d__u7d_::h6ff0b01bf90189e9
0x00007c40    1     17 sym.main::get_adder::__u7b__u7b_closure_u7d__u7d_::h72c4b1cd2fd53136
0x00007c60    1    251 sym.main::main::h8f1c9e3495794b54
</pre>

If we inspect the assembly of the first get_adder closure, we see it works for floats: it uses the xmm registers and calls the f64 implementation of Add.

<pre>
0x00007c20   sym.main::get_adder::__u7b__u7b_closure_u7d__u7d_::h6ff0b01bf90189e9:
0x00007c20                   50  push rax
0x00007c21               0f28c8  movaps xmm1, xmm0
0x00007c24             f20f1007  movsd xmm0, qword [rdi]
0x00007c28       488d3d19ee0400  lea rdi, [rip + 0x4ee19]
0x00007c2f           e87cfeffff  call sym.__f64_as_core::ops::arith::Add_::add::hf4bd57df382c73d6
0x00007c34                   58  pop rax
0x00007c35                   c3  ret
</pre>

The second get_adder closure works for integers: it uses general-purpose registers and calls the i32 implementation of Add.

<pre>
0x00007c40   sym.main::get_adder::__u7b__u7b_closure_u7d__u7d_::h72c4b1cd2fd53136:
0x00007c40                   50  push rax
0x00007c41                 8b3f  mov edi, dword [rdi]
0x00007c43       488d15feed0400  lea rdx, [rip + 0x4edfe]
0x00007c4a           e871feffff  call sym.__i32_as_core::ops::arith::Add_::add::h471fa892cd8f10a4
0x00007c4f                   59  pop rcx
0x00007c50                   c3  ret
</pre>
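
The size of each returned closure gives another view of the two instantiations. The sketch below reuses get_adder from the article; the explicit i32 and f64 suffixes and the size checks are additions for illustration.

```rust
use std::mem::size_of_val;
use std::ops::Add;

fn get_adder<T>(x: T) -> impl Fn(T) -> T
where
    T: Add<Output = T> + Copy,
{
    // move: the closure owns its own copy of x.
    move |y| x + y
}

fn main() {
    let add_int = get_adder(5i32);
    let add_float = get_adder(5.0f64);

    // Each closure stores the captured x by value, so its size is
    // the size of T: 4 bytes for i32, 8 bytes for f64.
    assert_eq!(size_of_val(&add_int), 4);
    assert_eq!(size_of_val(&add_float), 8);

    println!("{}", add_int(3));
    println!("{}", add_float(4.5));
}
```

The two closures are distinct types with distinct code, which matches the two closure symbols listed by afl.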

De-sugaring
In Rust, developers typically call methods as obj.blah(). The Rust compiler transforms that internally into a more explicit (but less sweet) form where the object becomes the first argument: Type::blah(&obj), here MyObject::blah(&obj).

<pre>
let obj = MyObject { value: 10 };
obj.blah(); // sugared syntax
</pre>

See below the assembly that was generated:
<pre>
; set value = 10
0x00007bc4     c744240c0a000000  mov dword [rsp + 0xc], 0xa
; put obj in RDI (1st argument of blah())
0x00007bcc           488d7c240c  lea rdi, [rsp + 0xc]
; call blah() with address of obj as argument
0x00007bd1           e8baffffff  call sym.sugar::MyObject::blah::h059bdb1e25fa70c4
</pre>
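
Both spellings can be written by hand and checked against each other. In the sketch below, the definition of MyObject and the body of blah are assumptions made for illustration, since only the call site is shown above.

```rust
// Assumed definition: a single integer field, as suggested by
// `MyObject { value: 10 }` at the call site.
struct MyObject {
    value: i32,
}

impl MyObject {
    // Assumed body: simply return the stored value.
    fn blah(&self) -> i32 {
        self.value
    }
}

fn main() {
    let obj = MyObject { value: 10 };

    let sugared = obj.blah(); // method-call syntax
    let explicit = MyObject::blah(&obj); // explicit form, &obj is the first argument

    assert_eq!(sugared, explicit);
    println!("{} {}", sugared, explicit);
}
```

Both calls resolve to the same function, which is why the disassembly shows one call with the address of obj in rdi.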

Macros
In Rust, println!, the usual way to display things, is a macro, not a “real” function. Therefore, in the assembly, you won’t see a call to println! (the “function” does not exist). Instead, you will see code that:
* 		Sets up the format strings and arguments.
* 		Builds the Arguments structure.
* 		Calls dbg._print, which in turn calls dbg.write_fmt, dbg.write…
I recommend you read this article for more details.
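
As a rough illustration of the three steps listed above, the sketch below builds an Arguments value by hand with format_args! and passes it to write_fmt. It is not the real expansion of println!: the standard library goes through its internal _print routine, whereas print_like is a hypothetical helper that writes to any std::io::Write sink.

```rust
use std::fmt::Arguments;
use std::io::{self, Write};

/// Hypothetical helper standing in for the internal print routine:
/// it receives an already-built Arguments value and writes it out.
fn print_like<W: Write>(out: &mut W, args: Arguments<'_>) -> io::Result<()> {
    out.write_fmt(args)
}

fn main() {
    // Capture the output in memory to check what was formatted.
    let mut captured: Vec<u8> = Vec::new();
    print_like(&mut captured, format_args!("Hello {}\n", "World"))
        .expect("writing to a Vec should not fail");
    assert_eq!(captured, b"Hello World\n".to_vec());

    // Same call, this time to standard output.
    print_like(&mut io::stdout(), format_args!("Hello World\n"))
        .expect("writing to stdout failed");
}
```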
The assembly below is generated from a Hello World program.

<pre>
0x00007b20   sym.print::main::h13b5f0e40e4e7865:
0x00007b20             4883ec38  sub rsp, 0x38
; allocate place for Arguments object on the stack
0x00007b24           488d7c2408  lea rdi, [rsp + 8]
; fat pointer to the string Hello World
0x00007b29       488d35d0a30400  lea rsi, [rip + 0x4a3d0]
; instantiate the Arguments object
0x00007b30           e83bffffff  call sym.core::fmt::rt::__impl_core::fmt::Arguments_::new_const::ha519b55ee7e59acf
0x00007b35           488d7c2408  lea rdi, [rsp + 8]
; call to the dbg._print routine via GOT
0x00007b3a         ff1550cf0400  call qword [rip + 0x4cf50]
0x00007b40             4883c438  add rsp, 0x38
0x00007b44                   c3  ret
</pre>

Notice that the call to dbg._print is indirect: the GOT (Global Offset Table) entry for dbg._print is located at rip + 0x4cf50. Radare2 shows that as a reloc.fixup:
<pre>
[0x00007b20]> px 10 @ reloc.fixup.UHSHxHH_
- offset -  9091 9293 9495 9697 9899 9A9B 9C9D 9E9F  0123456789ABCDEF
0x00054a90  c043 0200 0000 0000 0000                 .C........
[0x00007b20]> pd 10 @ 0x0243c0
            ;-- std::io::stdio::_print::h915f3273edec6464:
            ; DATA XREF from reloc.fixup.UHSHxHH_ @ 
┌ 206: dbg._print (int64_t arg1);
│ `- args(rdi) vars(13:sp[0x18..0x80])
│           0x000243c0      55             push rbp                    ; sync.rs:0:13 ; void _print();
</pre>
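
The address arithmetic behind this lookup can be checked by hand. The sketch below is a small calculator, with rip_relative_target as a hypothetical helper name: a RIP-relative operand is resolved from the address of the next instruction, so the target is the instruction address plus its length plus the displacement. The values are copied from the listings above.

```rust
/// Hypothetical helper: resolve a RIP-relative displacement on x86-64.
/// rip points at the next instruction, hence insn_addr + insn_len.
fn rip_relative_target(insn_addr: u64, insn_len: u64, disp: i32) -> u64 {
    insn_addr
        .wrapping_add(insn_len)
        .wrapping_add(disp as i64 as u64) // sign-extend the displacement
}

fn main() {
    // call qword [rip + 0x4cf50] at 0x7b3a, encoded on 6 bytes (ff15 50cf0400).
    let got_entry = rip_relative_target(0x7b3a, 6, 0x4cf50);
    assert_eq!(got_entry, 0x54a90);

    // The 8 bytes stored at that entry, read as little-endian,
    // give the address of the print routine.
    let print_addr = u64::from_le_bytes([0xc0, 0x43, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00]);
    assert_eq!(print_addr, 0x243c0);

    println!("GOT entry at {:#x}, routine at {:#x}", got_entry, print_addr);
}
```

The same arithmetic applies to relative calls with a negative displacement, such as the call to blah in the de-sugaring listing (e8baffffff at 0x7bd1), which lands on 0x7b90.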


— Cryptax

Read the full article here: https://cryptax.medium.com/rust-in-your-disassembler-1aa700c3b041