
Learning Rust Part 2 — Data Layout and Enums in Practice

From JOHNWICK

The Learning Journey Continues!

Let’s keep building our intuition for Rust’s memory model, and level up our tiny CLI app while we’re at it.

Recap

Last time, we learned ownership: every value in Rust has exactly one owner, and when the owner goes out of scope, the value is dropped. We saw moves, copies, borrows, and how those ideas make memory safety the language’s responsibility rather than yours. Today we’ll go one level deeper, into how data lives in memory and how Rust’s enums and structs let you represent real-world domains efficiently and safely. We’ll also refactor our task-echo CLI into a reusable task-core module that models tasks more richly.


Stack vs Heap: Deeper Intuition

Rust doesn’t force you to know the difference between stack and heap, but understanding it helps explain why ownership and lifetimes behave as they do.

  • Stack: Fast, per-thread memory for fixed-size values known at compile time.
Think: integers, small structs, references (&T), and stack frames for function calls.
  • Heap: Dynamically allocated memory that can grow or shrink at runtime.
Used by String, Vec<T>, Box<T>, and other growable containers.
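To make the split concrete, here is a small sketch (assuming a 64-bit target, where a pointer is 8 bytes) that compares sizes with std::mem::size_of. A 1 KiB array stored inline occupies all 1024 bytes wherever it lives, while a Box that owns the same array on the heap is only one pointer wide. The function name inline_vs_boxed is just for illustration.

```rust
use std::mem::size_of;

/// Returns (size of a 1 KiB array stored inline, size of a Box that owns one on the heap).
fn inline_vs_boxed() -> (usize, usize) {
    // The array's size is known at compile time, so a local like this lives on the stack.
    let on_stack = [0u8; 1024];
    // Box::new moves the array into a heap allocation; `on_heap` itself is just a pointer.
    let on_heap: Box<[u8; 1024]> = Box::new([0u8; 1024]);
    // Same contents either way; only where the bytes live differs.
    assert_eq!(on_stack.len(), on_heap.len());
    (size_of::<[u8; 1024]>(), size_of::<Box<[u8; 1024]>>())
}

fn main() {
    let (inline, boxed) = inline_vs_boxed();
    // On a 64-bit target this reports 1024 bytes inline vs. 8 bytes for the Box.
    println!("inline: {inline} bytes, boxed: {boxed} bytes");
}
```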

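The ownership system decides who frees the heap memory once it is no longer needed. A simple example helps visualize the split between stack and heap:

let s = String::from("rust");

Here s is an owned, growable String. (The binding s is immutable only because it was declared without mut; String itself is a resizable type, and the borrowed, read-only string type is &str.) In memory, s is split into two parts: a fixed-size header and a heap buffer.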


A small string header is created on the stack. It holds a pointer to heap memory, plus length and capacity fields. The stack holds fixed-size values like this header: on a 64-bit target it is 24 bytes, with 8 bytes for the pointer, 8 bytes for the length, and 8 bytes for the capacity. So why think of the header as three blocks instead of 24 individual bytes? Because ptr, len, and cap are each one machine word, not loose bytes: ptr is the address of the buffer on the heap, len is a usize counting the bytes currently in use, and cap is a usize giving the size of the allocated buffer. In memory they are indeed 24 contiguous bytes, but we group them conceptually into three fields because:

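  • Each field has its own type and meaning.
  • The compiler lays them out contiguously with word alignment (the exact field order inside String is an unspecified implementation detail; your code only sees len(), capacity(), and as_ptr()).
  • We normally reason in words or fields, not bytes, because the CPU and compiler both work on word-aligned chunks for efficiency. Hiding byte-level detail behind named fields is exactly what high-level languages are for.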

On a 64-bit machine one word is eight bytes, so a three-word header is 24 bytes. When Rust compiles code that uses the string, it emits instructions that load or store these fields as full 64-bit values. Why exactly 24 bytes, with no packing? Every type in Rust has an alignment requirement, and for a 64-bit pointer or usize that alignment is 8 bytes, so each field must begin at an address divisible by 8. Three 8-byte fields placed back to back already satisfy that rule with no padding, which makes 24 bytes both the natural and the minimal size.
As for the heap, this is where the actual string data, the sequence of UTF-8 bytes, is stored. When you call String::from("rust") like we did above, Rust requests a block of memory from the global allocator (by default, the system allocator) large enough to hold the four UTF-8 bytes. The allocator returns the starting address of that region, and that address becomes the value of the ptr field in the 24-byte stack header. Those bytes on the heap are laid out contiguously just like a C array, beginning at ptr and extending for cap bytes, of which the first len hold valid data. In this case, the heap block contains the bytes [114, 117, 115, 116], the ASCII (and UTF-8) codes for 'r', 'u', 's', and 't'. The heap region is part of your process’s dynamic memory area, and the allocator hands out blocks that satisfy each request’s size and alignment.
When you append to a mutable string (let mut s) and exceed its current capacity, Rust reallocates: it obtains a larger block (growing in place if the allocator can, otherwise copying the old bytes to a new location), updates ptr and cap, and releases the old block. Finally, when the string goes out of scope, its destructor (Drop) runs: it tells the allocator to free the heap block pointed to by ptr, reclaiming that memory deterministically. In short, the stack header is the control structure and the heap buffer is the storage it governs; together they form one logical value that Rust manages safely and efficiently without a garbage collector.
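The header/buffer split can be observed from safe code. The sketch below uses only public String methods (len, capacity, as_ptr, as_bytes) plus std::mem::size_of and align_of; because String does not expose the internal order of ptr, len, and cap, it only reports totals. The helper name inspect is our own, and the capacity after growth depends on the allocator and Rust version, so the code prints it rather than predicting it.

```rust
use std::mem::{align_of, size_of};

/// Gathers header size/alignment plus len, capacity, and a copy of the heap bytes.
fn inspect(s: &String) -> (usize, usize, usize, usize, Vec<u8>) {
    (
        size_of::<String>(),   // ptr + len + cap header
        align_of::<String>(),  // word alignment
        s.len(),               // bytes in use
        s.capacity(),          // bytes allocated in the heap buffer
        s.as_bytes().to_vec(), // copy of the buffer's valid bytes
    )
}

fn main() {
    let mut s = String::from("rust");
    let (size, align, len, cap, bytes) = inspect(&s);
    println!("header {size} bytes (align {align}), len {len}, cap {cap}, bytes {bytes:?}");

    // as_ptr() exposes the buffer address held in the `ptr` field.
    let before = s.as_ptr();
    // Appending past the capacity forces a reallocation; cap grows, and the
    // buffer may or may not move, depending on the allocator.
    s.push_str(" is fun to learn");
    println!("len {}, cap {}, buffer moved: {}", s.len(), s.capacity(), before != s.as_ptr());
}
```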


Structs & Tuples: Named vs Positional Data

A struct gives semantic meaning to memory layout by naming fields:

struct Task {
   id: u64,
   title: String,
   status: Status,
}

A tuple, meanwhile, groups fields by position:

let t: (u64, String, Status) = (1, "Write article".into(), Status::Todo);

Both are stored similarly in memory (with the default representation, the compiler is free to reorder fields in either), but they serve different purposes. Named fields make code self-documenting and make refactors and pattern matching safer. Tuples work well for ad-hoc return values or temporary, local grouping. In short, use structs for domain models and tuples for local, short-lived grouping.
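A small sketch of that rule of thumb: a named struct for the domain value, and a tuple for a throwaway return value that is destructured immediately. TaskSummary and min_max are hypothetical names used only for illustration.

```rust
/// Domain model: named fields document what each value means.
struct TaskSummary {
    id: u64,
    title: String,
}

/// Ad-hoc grouping: a tuple is fine for a short-lived (min, max) pair.
/// Returns None for an empty slice.
fn min_max(ids: &[u64]) -> Option<(u64, u64)> {
    let min = *ids.iter().min()?;
    let max = *ids.iter().max()?;
    Some((min, max))
}

fn main() {
    let summary = TaskSummary { id: 1, title: "Write article".into() };
    // Field access by name reads clearly at the call site.
    println!("#{} {}", summary.id, summary.title);

    // The tuple is destructured right away, so position-based access stays local.
    if let Some((lo, hi)) = min_max(&[7, 3, 9]) {
        println!("ids range from {lo} to {hi}");
    }
}
```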


Enums: Tagged Unions with Superpowers

C-style enums are just named integer tags. Rust’s enums can also carry data, which makes them algebraic data types (ADTs), specifically sum types. They encode both the current state and the shape of its data safely and compactly:

enum Status {
   Todo,
   InProgress { assignee: String },
   Done { completed_at: std::time::SystemTime },
}

Each variant can hold a different kind of data:

  • No data: a unit-like variant such as Todo
  • Named fields: a struct-like variant such as InProgress { assignee }
  • Positional fields: a tuple-like variant, for example Done(SystemTime) if we had written Done that way instead of with a named field
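The three variant shapes can sit side by side in one enum. The Example enum and label function below are hypothetical, written only to show how each shape is constructed and matched.

```rust
use std::time::SystemTime;

/// Hypothetical enum with one variant of each shape.
enum Example {
    Unit,                       // no data, like Todo
    Named { assignee: String }, // struct-like, like InProgress
    Tuple(SystemTime),          // tuple-like, payload accessed by position
}

fn label(e: &Example) -> String {
    match e {
        Example::Unit => "unit".to_string(),
        Example::Named { assignee } => format!("named: {assignee}"),
        // Tuple variants are destructured by position.
        Example::Tuple(t) => format!("tuple: {t:?}"),
    }
}

fn main() {
    let all = [
        Example::Unit,
        Example::Named { assignee: "Alice".into() },
        Example::Tuple(SystemTime::now()),
    ];
    for e in &all {
        println!("{}", label(e));
    }
}
```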

Rust stores this as a tagged union: one discriminant (the tag) plus the data of the active variant. Where C makes you set the tag yourself and remember which payload goes with which tag (often via a union or void*), Rust bakes that invariant into the type itself. The discriminant and payload move together as a single value, and the compiler forces you to pattern-match before you can read the data, eliminating mismatches and “invalid states.” There’s no manual tag bookkeeping, no unchecked casts, and no undefined behavior from reading the wrong field. You still get tight, predictable layouts (often no larger than a hand-rolled C equivalent), plus layout optimizations such as the niche (null-pointer) optimization for Option<&T> and Option<Box<T>>. In short: C makes correctness a convention; Rust makes it a compile-time contract.


Why Rust Enums are “Safe”

In many systems languages (like C or C++), you might combine a manually maintained enum with a union or a void* to represent "one of several possible states." For instance:

enum StatusTag {
   TODO,
   IN_PROGRESS,
   DONE
};

union StatusPayload {
   char* assignee;
   time_t completed_at;
};

struct Status {
   enum StatusTag tag;
   union StatusPayload data;
};

This works, but it’s entirely your responsibility to ensure you only read data.assignee when the tag is IN_PROGRESS, and only read data.completed_at when the tag is DONE.
If those ever drift out of sync, you read garbage at best and hit undefined behavior at worst (for example, by dereferencing a time_t reinterpreted as a char*). Rust’s enum fuses tag and payload into a single type that guarantees at compile time that you only ever access the active variant’s data. The compiler enforces exhaustiveness in pattern matching and moves or drops the correct payload automatically. Let’s restate our Rust version:

enum Status {
   Todo,
   InProgress { assignee: String },
   Done { completed_at: std::time::SystemTime },
}

A Safer Model: Compile-Time Tag Integrity

The key idea: the compiler itself carries the tag information along with the payload. You can’t construct an invalid Status, nor can you access the wrong field for a given variant. For example, this won’t compile:

let s = Status::Todo;
println!("Completed at: {:?}", s.completed_at);

The compiler rejects it with an error along the lines of “no field `completed_at` on type `Status`” (E0609). You must pattern-match first, proving to the compiler that you’ve considered all variants:

match s {
   Status::Todo => println!("Still pending"),
   Status::InProgress { assignee } => println!("Working by {assignee}"),
   Status::Done { completed_at } => println!("Done at {:?}", completed_at),
}

That’s exhaustive, type-safe, and checked at compile time. This is what makes Rust’s enums “safe”: invalid states are unrepresentable.
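An exhaustive match isn’t the only safe way in. When you care about a single variant, if let matches that one pattern and falls through for everything else, so the payload is still reachable only when that variant is active. A minimal sketch, using a trimmed copy of the Status enum and a hypothetical helper named completed_at:

```rust
use std::time::SystemTime;

#[derive(Debug)]
enum Status {
    Todo,
    InProgress { assignee: String },
    Done { completed_at: SystemTime },
}

/// Returns the timestamp only when the value really is `Done`.
/// Every other variant falls into the `else` branch, so there is still
/// no way to read `completed_at` from the wrong variant.
fn completed_at(s: &Status) -> Option<SystemTime> {
    if let Status::Done { completed_at } = s {
        Some(*completed_at) // SystemTime is Copy, so we can copy it out of the borrow
    } else {
        None
    }
}

fn main() {
    let statuses = [
        Status::Todo,
        Status::InProgress { assignee: "Alice".into() },
        Status::Done { completed_at: SystemTime::now() },
    ];
    for s in &statuses {
        println!("{:?} -> {:?}", s, completed_at(s));
    }
}
```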


Niche Optimization

Rust also performs clever tricks called niche optimizations, especially for Option<T>. For example, an Option<&T> is exactly the same size as a bare &T: a reference can never be null, so Rust uses the null bit pattern to represent None. Similarly, Option<Box<T>> and Option<NonZeroUsize> are the same size as their inner types. This is one of the key space optimizations that keeps Rust enums compact: instead of adding a separate discriminant (which, once padded for alignment, can cost a whole extra word, as in Option<u64>), Rust reuses otherwise invalid bit patterns whenever the payload type has them.
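These size claims are easy to check with std::mem::size_of. The sketch below compares several types with and without an Option wrapper. The first three equalities are documented guarantees of std::option; the exact size of Option<u64> (16 bytes on typical 64-bit targets) is simply what current compilers produce, so the code only asserts that it is larger than a bare u64. The function name niche_sizes is hypothetical.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// (type name, size wrapped in Option, size of the bare type)
fn niche_sizes() -> [(&'static str, usize, usize); 4] {
    [
        // A reference is never null, so None can use the null pattern.
        ("&u64", size_of::<Option<&u64>>(), size_of::<&u64>()),
        // The same holds for Box<T>.
        ("Box<u64>", size_of::<Option<Box<u64>>>(), size_of::<Box<u64>>()),
        // A NonZeroUsize is never 0, so None can use 0.
        ("NonZeroUsize", size_of::<Option<NonZeroUsize>>(), size_of::<NonZeroUsize>()),
        // Every u64 bit pattern is valid: no niche, so a separate tag is needed.
        ("u64", size_of::<Option<u64>>(), size_of::<u64>()),
    ]
}

fn main() {
    for (name, with_option, bare) in niche_sizes() {
        println!("Option<{name}>: {with_option} bytes vs {name}: {bare} bytes");
    }
}
```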


Modeling Tasks with Enums and Structs

Now that we understand how Rust lays out data, how enums guarantee state correctness, and why structs give shape to that data, we can combine these tools into a clean, expressive domain model for our tiny CLI app. Let’s start with a simple Task struct and a Status enum:

use std::time::SystemTime;

#[derive(Debug, Clone)]
pub enum Status {
   Todo,
   InProgress { assignee: Option<String> },
   Done { completed_at: SystemTime },
   Canceled { reason: String },
}

#[derive(Debug, Clone)]
pub struct Task {
   pub id: u64,
   pub title: String,
   pub status: Status,
}

This is already far more expressive than a raw tuple or a set of integers. Just by reading the type definitions, you can see:

  • A task always has an ID
  • It always has a title
  • It must be in exactly one status
  • Each status may or may not carry additional data
  • Invalid states (e.g., “Done” without a timestamp) are impossible

There’s no way to “forget” to set a field or accidentally read the wrong one — Rust won’t let you.


Creating and Transitioning Tasks

Let’s give this struct a few convenience methods to act on that state (Task::new uses the rand crate for IDs; we add it to Cargo.toml further down):

impl Task {
   pub fn new(title: impl Into<String>) -> Self {
       Self {
           id: rand::random(),
           title: title.into(),
           status: Status::Todo,
       }
   }
   pub fn start(&mut self, assignee: Option<String>) {
       self.status = Status::InProgress { assignee };
   }
   pub fn mark_done(&mut self) {
       self.status = Status::Done {
           completed_at: SystemTime::now(),
       };
   }
   pub fn cancel(&mut self, reason: impl Into<String>) {
       self.status = Status::Canceled { reason: reason.into() };
   }
}

Notice what we didn’t need:

  • manual tag management
  • “is_done” booleans scattered around
  • mismatched states
  • unsafe unions or void pointers

Every state transition is represented by a specific, typed variant, and the compiler ensures that each state carries exactly the data it requires. In other words, creating and transitioning tasks in this design means modeling real-world workflow changes as precise, type-safe state transitions.
When a task is created, it is constructed in a fully valid state: it always has an ID, a title, and a known initial status (Todo). Because the constructor takes ownership of the title (converting a &str or String into an owned String), the Task owns all its data outright, and no external lifetimes need to be tracked.
Transitioning a task (starting it, marking it done, or canceling it) means replacing the entire status value with a new variant, not partially mutating fields or toggling booleans; the old variant and its payload are dropped as part of the assignment. Each state therefore always carries exactly the data it needs: a task in progress may or may not have an assignee, a completed task always has a completion timestamp, and a canceled task always includes a reason. Because the tag and its data travel together, a task is never in a contradictory or half-valid state (like “Done but with no timestamp”). Mutating a task requires exclusive access (&mut self), so no other part of the program can observe the task while a transition is happening. Pattern matching later ensures that any code reading a task’s status must explicitly handle every possible variant, making it impossible to forget a case.
One thing these methods do not enforce is which transitions are allowed: as written, any status can replace any other (a Done task can be canceled, for instance). In short, Rust makes every individual state valid by construction, while the rules for moving between states are still ours to write.
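One way to encode such a workflow rule is sketched below. It adds a hypothetical try_mark_done method (not part of the article’s task.rs) that only lets an in-progress task finish and returns an error otherwise. It repeats the Status and Task definitions so it compiles on its own, without the rand dependency.

```rust
use std::time::SystemTime;

#[derive(Debug, Clone)]
pub enum Status {
    Todo,
    InProgress { assignee: Option<String> },
    Done { completed_at: SystemTime },
    Canceled { reason: String },
}

#[derive(Debug, Clone)]
pub struct Task {
    pub id: u64,
    pub title: String,
    pub status: Status,
}

impl Task {
    /// Hypothetical stricter counterpart to `mark_done`: only an in-progress
    /// task may be finished. Every current status is matched explicitly, so
    /// adding a new variant later forces us to decide its rule here too.
    pub fn try_mark_done(&mut self) -> Result<(), String> {
        // Decide first (only reading `self.status`), then mutate.
        let verdict = match &self.status {
            Status::InProgress { .. } => Ok(()),
            Status::Todo => Err("cannot finish a task that was never started".to_string()),
            Status::Done { .. } => Err("task is already done".to_string()),
            Status::Canceled { reason } => Err(format!("task was canceled: {reason}")),
        };
        if verdict.is_ok() {
            // Replace the whole variant, exactly as mark_done does.
            self.status = Status::Done { completed_at: SystemTime::now() };
        }
        verdict
    }
}

fn main() {
    let mut task = Task { id: 1, title: "Write article".into(), status: Status::Todo };
    println!("{:?}", task.try_mark_done()); // rejected: never started
    task.status = Status::InProgress { assignee: Some("Alice".into()) };
    println!("{:?}", task.try_mark_done()); // accepted
    println!("{} -> {:?}", task.title, task.status);
}
```

Returning a Result keeps the rule visible to callers: main.rs could print the error instead of silently overwriting a canceled task.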


Consuming Tasks with Pattern Matching

Pattern matching lets us fully and safely inspect values:

fn describe(task: &Task) {

   match &task.status {
       Status::Todo => println!("{} is still TODO.", task.title),
       Status::InProgress { assignee } => {
           match assignee {
               Some(name) => println!("{} is in progress by {name}.", task.title),
               None => println!("{} is in progress (unassigned).", task.title),
           }
       }
       Status::Done { completed_at } =>
           println!("{} completed at {:?}", task.title, completed_at),
       Status::Canceled { reason } =>
           println!("{} was canceled: {reason}", task.title),
   }

}
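If you want the same exhaustive match without printing (for tests, logs, or a future web API), describe can return a String instead. summary below is a hypothetical helper, not part of the app; it also shows that a nested pattern such as InProgress { assignee: Some(name) } can replace the inner match on the Option.

```rust
use std::time::SystemTime;

enum Status {
    Todo,
    InProgress { assignee: Option<String> },
    Done { completed_at: SystemTime },
    Canceled { reason: String },
}

struct Task {
    id: u64,
    title: String,
    status: Status,
}

/// Hypothetical string-returning counterpart to `describe`.
fn summary(task: &Task) -> String {
    let title = &task.title;
    match &task.status {
        Status::Todo => format!("{title} is still TODO."),
        // Nested patterns split the Option case without a second match.
        Status::InProgress { assignee: Some(name) } => format!("{title} is in progress by {name}."),
        Status::InProgress { assignee: None } => format!("{title} is in progress (unassigned)."),
        Status::Done { completed_at } => format!("{title} completed at {completed_at:?}"),
        Status::Canceled { reason } => format!("{title} was canceled: {reason}"),
    }
}

fn main() {
    let task = Task { id: 7, title: "Write article".into(), status: Status::Todo };
    println!("#{}: {}", task.id, summary(&task));
}
```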


Integrating the Model into Our CLI (Command Line Interface)

Let’s incorporate this into the CLI we started in Part 1. Reading arguments and applying transitions now looks much cleaner.

Cargo.toml

[package]
name = "tasks_cli"
version = "0.1.0"
edition = "2021"

[dependencies]
# Used only to generate random task IDs.
rand = "0.8"

Cargo.toml is the project’s manifest file: it tells Cargo (Rust’s build tool and package manager) what this application is and what it depends on. It declares the package name, version, and edition; the edition selects which language edition, and therefore which features and defaults, the compiler should use. More importantly, it’s where we specify external “crates” like rand, which we use to generate random task IDs. A requirement such as rand = "0.8" is a caret requirement: Cargo may pick any compatible 0.8.x release, but not 0.9. Conceptually, Cargo.toml is the “wiring diagram” for your build: it defines how the source code is compiled into a binary, which versions of libraries are allowed, and how the project fits into the broader Rust ecosystem. Without touching any of your logic, it controls compilation behavior, dependency resolution, and how others can build or integrate your CLI app.

src/main.rs

// Bring in the task module (defined in task.rs).
mod task;

use std::env;
use task::Task; // Import the Task type for convenience.

fn main() {

   // Collect all CLI arguments into a Vec<String>.
   // Each String owns its text on the heap. The Vec's own ptr/len/cap header
   // lives on the stack, while the String headers it holds live in the Vec's heap buffer.
   let args: Vec<String> = env::args().collect();
   // The program expects at least 1 required argument: the task title.
   if args.len() < 2 {
       eprintln!(
           "Usage: tasks <title> [--start <name>] [--done] [--cancel <reason>]

Examples:

   tasks \"Write article\"
   tasks \"Write article\" --start \"Alice\"
   tasks \"Write article\" --done
   tasks \"Write article\" --cancel \"Out of scope\"

"

       );
       return;
   }
   // The first user-provided argument becomes the task title.
   // We're borrowing &args[1] here; Task::new will clone the string into owned memory.
   let title = &args[1];
   // Create a new Task instance in the Todo state.
   // The ID is randomly generated, illustrating real-world object creation.
   let mut task = Task::new(title);
   // ---- State Transitions based on CLI arguments ----
   // If the user supplies `--start`, optionally followed by an assignee name.
   if let Some(i) = args.iter().position(|a| a == "--start") {
       // If a name follows `--start`, use it; otherwise, it's unassigned work.
       let assignee = args.get(i + 1).cloned();
       task.start(assignee);
   }
   // Mark the task done if `--done` appears anywhere.
   if args.iter().any(|a| a == "--done") {
       task.mark_done();
   }
   // Cancel the task if `--cancel <reason>` is provided.
   if let Some(i) = args.iter().position(|a| a == "--cancel") {
       // Borrow -> clone: we convert to owned String for internal storage.
       if let Some(reason) = args.get(i + 1) {
           task.cancel(reason.clone());
       }
   }
   // Print a human-readable description of the task,
   // showing how pattern matching works with enums.
   describe(&task);

}

/// Pretty-print a user-friendly summary about a task's state.
/// Takes &Task (borrow only), avoiding any moves or clones.
fn describe(task: &Task) {

   use task::Status;
   // Pattern match the enum — Rust enforces that ALL variants are handled.
   match &task.status {
       Status::Todo => {
           println!("Task #{}: \"{}\" is TODO.", task.id, task.title);
       }
       Status::InProgress { assignee } => {
           match assignee {
               Some(name) => println!(
                   "Task #{}: \"{}\" is IN PROGRESS by {}.",
                   task.id, task.title, name
               ),
               None => println!(
                   "Task #{}: \"{}\" is IN PROGRESS (unassigned).",
                   task.id, task.title
               ),
           }
       }
       Status::Done { completed_at } => {
           println!(
               "Task #{}: \"{}\" is DONE at {:?}.",
               task.id, task.title, completed_at
           );
       }
       Status::Canceled { reason } => {
           println!(
               "Task #{}: \"{}\" is CANCELED: {}",
               task.id, task.title, reason
           );
       }
   }
   // Also show the raw Debug view — helpful for learning how Rust formats enums & structs.
   println!("\nDebug view:\n{:#?}", task);

}

main.rs is the entry point and the glue between the outside world (the shell, user input) and your strongly typed domain model in task.rs. Its job is to parse command-line arguments, decide what the user is asking for, and then call into the Task methods to create or transition tasks accordingly. It never builds or edits a state by hand: it changes state only through the public API exposed by task.rs (Task::new, start, mark_done, cancel) and reads the status only to print it. This file is where you handle messy, stringly typed input (raw Strings from the environment) and convert it into clean, typed operations on your domain model. It also formats output for humans, using pattern matching to describe each task’s current state in a friendly way. Conceptually, main.rs is your app’s “front door and conductor”: it coordinates user intent, delegates real work to the domain layer, and prints results, while keeping all business logic and state modeling safely separated in task.rs.

src/task.rs

use std::time::SystemTime;

/// The Status enum represents *exactly one* possible state of a task at any time.
/// This is Rust’s “tagged union” (algebraic data type) in action:
/// - A discriminant tells the compiler which variant is active.
/// - Each variant may store different data shapes.
/// - The compiler guarantees you only ever access the data that actually exists.
/// - Invalid combinations (e.g. Done without a timestamp) are impossible.
///
/// In other words: state and state-specific data travel together as a single value.
#[derive(Debug, Clone)]
pub enum Status {
   /// The task has not been started. This variant carries no associated data.
   Todo,
   /// The task is in progress. It may have an assigned user or be unassigned.
   /// `Option<String>` expresses this explicitly—no empty strings, no flags.
   InProgress {
       assignee: Option<String>,
   },
   /// The task is completed. This variant must always include a timestamp,
   /// which guarantees that “completed without time” simply cannot be expressed.
   Done {
       completed_at: SystemTime,
   },
   /// The task has been canceled and must include a human-readable reason.
   /// Again, the type system enforces that this data always exists.
   Canceled {
       reason: String,
   },

}

/// The Task struct models a real-world domain object with:
/// - A stable identity (id)
/// - A human-meaningful title
/// - A safe, compiler-enforced state machine (status)
///
/// Memory model notes:
/// - `id` (u64) is stored inline in the Task, with no heap allocation.
/// - `title` is a heap-allocated String (inline ptr/len/cap header + heap buffer).
/// - `status` stores an enum sized to fit its largest variant plus its tag.
/// Together, these form a compact, predictable in-memory representation.
#[derive(Debug, Clone)]
pub struct Task {
   /// Unique identifier. In a full app, this might come from a DB;
   /// for now it demonstrates fixed-size data stored inline in the Task.
   pub id: u64,
   /// Title is owned by the Task. The String header is stored inline
   /// (on the stack when the Task is a local variable), but the text lives on the heap.
   pub title: String,
   /// Strongly typed, enforced-at-compile-time workflow state.
   pub status: Status,

}

impl Task {

   /// Construct a fully valid Task in the initial Todo state.
   ///
   /// `title.into()` accepts either &str or String and ensures that
   /// the Task owns its title—no external lifetimes required.
   pub fn new(title: impl Into<String>) -> Self {
       Self {
           id: rand::random(),     // Generates a u64; simple stand-in for a DB id.
           title: title.into(),    // Moves or allocates a String onto the heap.
           status: Status::Todo,   // A safe, explicit starting state.
       }
   }
   /// Transition the task into the InProgress state.
   /// Instead of mutating parts of the enum, we *replace* the entire variant at once.
   ///
   /// This ensures that:
   /// - The old state and its data are cleanly dropped.
   /// - The new state always has exactly the fields it needs.
   /// - No partial or contradictory states are ever observable.
   pub fn start(&mut self, assignee: Option<String>) {
       self.status = Status::InProgress { assignee };
   }
   /// Mark the task as completed, capturing the precise completion time.
   /// This enforces the invariant: a Done task always has a timestamp.
   pub fn mark_done(&mut self) {
       self.status = Status::Done {
           completed_at: SystemTime::now(),
       };
   }
   /// Cancel the task with a reason. `reason.into()` ensures that
   /// we always store an owned String payload with the variant.
   pub fn cancel(&mut self, reason: impl Into<String>) {
       self.status = Status::Canceled {
           reason: reason.into(),
       };
   }
   /// Convenience helper: check whether a task is completed.
   /// `matches!` inspects the enum discriminant without exposing the payload.
   pub fn is_done(&self) -> bool {
       matches!(self.status, Status::Done { .. })
   }

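}

task.rs is the heart of the application: it defines the domain model (Task) and the state machine (Status) that represent real-world work. This is where you describe what a task is and what states it can be in (Todo, InProgress, Done, Canceled), along with the extra data each state needs (like an assignee, a completion time, or a cancel reason). By putting this logic in its own module, we separate “what a task means” from “how the CLI is wired,” making it easier to reuse the same model later. It also concentrates your invariants in one place: new, start, mark_done, and cancel are the methods that create and transition tasks, so they’re effectively your business rules. (Because the fields are pub, other code could still assign status directly; making the fields private would make these methods the only way in.)

OUTPUT

Each cargo run invocation builds a brand-new Task with a fresh random ID; nothing is saved between runs, so every command below starts from Todo and applies only the flags given on that command line. Each run prints a one-line summary such as Task #<random id>: "Write article" is TODO., followed by the pretty-printed Debug view of the Task.

First, run the application with just a title:

cargo run -- "Write article"

Next, start the task and assign it to one of our workers, say, Alice:

cargo run -- "Write article" --start Alice

To record that the article is finished, pass --done:

cargo run -- "Write article" --done

Finally, create a task and cancel it with a reason:

cargo run -- "Write article" --cancel "Out of scope"

Flags can also be combined in a single run. main.rs checks --start, then --done, then --cancel, in that fixed order regardless of where they appear on the command line, so the last one checked wins. For example, cargo run -- "Write article" --start Alice --done ends in the Done state.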
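One rough edge in main.rs: args.get(i + 1) takes whatever token follows --start, so cargo run -- "Write article" --start --done would briefly record "--done" as the assignee. Today the mistake is masked, because --done and --cancel replace the InProgress status anyway, but it would surface as soon as a flag that does not change status (say, a hypothetical --verbose) followed --start. A minimal sketch of a fix, using a hypothetical helper named flag_value that refuses to treat another --flag as a value; main.rs would still detect --start with position and use this helper only for the optional name.

```rust
/// Hypothetical helper: the value after `flag`, unless it is missing or is
/// itself another `--flag`.
fn flag_value(args: &[String], flag: &str) -> Option<String> {
    let i = args.iter().position(|a| a == flag)?;
    args.get(i + 1)
        .filter(|next| !next.starts_with("--"))
        .cloned()
}

fn main() {
    // "--verbose" is a hypothetical flag, used only to show the problem.
    let args: Vec<String> = ["tasks", "Write article", "--start", "--verbose"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    // The lookup main.rs performs today: takes the next token blindly.
    let naive = args
        .iter()
        .position(|a| a == "--start")
        .and_then(|i| args.get(i + 1).cloned());
    println!("naive assignee: {naive:?}"); // naive assignee: Some("--verbose")
    println!("checked: {:?}", flag_value(&args, "--start")); // checked: None
}
```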



Reflection

Reflecting on what we’ve really done in this article: we connected how Rust lays out bits in memory with how you design real software. Stack vs heap, structs vs tuples, and enums as safe tagged unions are the foundation that lets you model a task’s lifecycle so that invalid states simply can’t exist. By moving from a simple echo CLI to a small, straightforward domain model, you’ve seen how ownership, alignment, and enum layouts work together to give you predictable performance and strong correctness guarantees without sacrificing ergonomics. From here on, every trait, generic, async function, or database layer you add will sit on top of this foundation: well-shaped types with clear invariants. In the next lesson, when we introduce traits and generics, we’ll build behavior around these same Task and Status types, turning this tidy in-memory model into a flexible, testable core you can drive from a CLI, a web API, or any other interface.