5 Rust Concurrency Models (With Real Code Examples)


This article walks through five practical Rust concurrency models with working code, sample measurements, and hand-drawn-style diagrams, so that one careful read changes how you structure concurrent code from now on.

Read this if performance matters, if correctness matters, and if the next bug must be prevented rather than debugged at midnight. The examples are small, real, and reproducible. Follow the code, run it with --release, and compare results on your machine.

Quick overview — what will follow

  • Problem: compute sum of squares for numbers 0..N using multiple workers.
  • Goal: compare models by clarity, safety, and performance.
  • How to run: copy each snippet into src/main.rs in a small cargo project and run cargo run --release.
  • Environment note: results vary with CPU, cores, OS and compiler versions. Numbers below are sample results from a 6-core CPU.


Introduction

These five small programs reveal which concurrency model is clearest, which hides bugs, and which outperforms the rest.
One line of code can crash production. One pattern can prevent that. Learn both.
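
Before the models, here is the plain sequential version of the computation every example below parallelizes. It is a minimal reference sketch (the sample timings in this article do not include it); run it the same way to get a baseline on your machine.

use std::time::Instant;

fn main() {
    let n: u64 = 50_000_000;
    let start = Instant::now();
    // u128 accumulator: the sum of squares for 0..50M overflows u64.
    let mut total = 0u128;
    for x in 0..n {
        let v = x as u128;
        total += v * v;
    }
    println!("sequential total={} time={:?}", total, start.elapsed());
}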


Model 1 — Threads + mpsc channel

Simple, explicit, and close to the metal. Each worker computes a chunk and sends a partial sum to the main thread.

Problem

Split 0..N into W chunks; each worker sums its chunk and reports back.

Code (std threads + channel)

use std::sync::mpsc;
use std::thread;
use std::time::Instant;

fn main() {
    let n: u64 = 50_000_000;
    let workers = 8;
    let chunk = n / workers;
    let (tx, rx) = mpsc::channel();
    let start = Instant::now();
    for i in 0..workers {
        let tx = tx.clone();
        let start_i = i * chunk;
        // The last worker also takes the remainder when n is not divisible by workers.
        let end_i = if i == workers - 1 { n } else { (i + 1) * chunk };
        thread::spawn(move || {
            let mut sum = 0u128;
            for x in start_i..end_i {
                let v = x as u128;
                sum += v * v;
            }
            tx.send(sum).unwrap();
        });
    }
    // Drop the original sender so the receive loop below ends once all workers finish.
    drop(tx);
    let mut total = 0u128;
    for part in rx {
        total += part;
    }
    let dur = start.elapsed();
    println!("threads+mpsc total={} time={:?}", total, dur);
}

Result (sample)

  • Time: 1.8 s
  • Characteristics: explicit synchronization, easy to reason about, potential overhead at channel delivery.

Diagram (hand-drawn style)

main
  |
  | spawn worker x8
  v
[worker0] -> \
[worker1] ->  } mpsc channel -> main collects
[worker2] -> /
[worker3]
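
A channel-free variant of the same model, for comparison: with std::thread::scope (stable since Rust 1.63), each worker returns its partial sum through its join handle, so no channel is needed. A minimal sketch, not part of the benchmarked examples:

use std::thread;

fn main() {
    let n: u64 = 50_000_000;
    let workers = 8;
    let chunk = n / workers;
    let total: u128 = thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|i| {
                let start_i = i * chunk;
                let end_i = if i == workers - 1 { n } else { (i + 1) * chunk };
                // Scoped threads may borrow from the enclosing stack and are
                // guaranteed to finish before the scope returns.
                s.spawn(move || {
                    let mut sum = 0u128;
                    for x in start_i..end_i {
                        let v = x as u128;
                        sum += v * v;
                    }
                    sum
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    });
    println!("scoped threads total={}", total);
}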


Model 2 — Shared state with Arc<Mutex<T>>

Classic shared-state approach. Workers update a shared accumulator protected by a mutex.

Problem

Same sum of squares, but workers lock and update a shared total.

Code (Arc + Mutex)

use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

fn main() {
    let n: u64 = 50_000_000;
    let workers = 8;
    let chunk = n / workers;
    let total = Arc::new(Mutex::new(0u128));
    let start = Instant::now();
    let mut handles = Vec::new();
    for i in 0..workers {
        let total = Arc::clone(&total);
        let start_i = i * chunk;
        let end_i = if i == workers - 1 { n } else { (i + 1) * chunk };
        handles.push(thread::spawn(move || {
            let mut local = 0u128;
            for x in start_i..end_i {
                let v = x as u128;
                local += v * v;
            }
            // Lock once per worker, after local accumulation, to keep contention low.
            let mut t = total.lock().unwrap();
            *t += local;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    let dur = start.elapsed();
    println!("arc+mutex total={} time={:?}", *total.lock().unwrap(), dur);
}

Result (sample)

  • Time: 2.4 s
  • Characteristics: mutex contention can reduce throughput, but the implementation is simple.

Diagram

shared total (Mutex)
           /   |    |    \
worker0 --     |    |     -- worker3
worker1 -------+----+---- worker4
worker2 ---------------- worker7
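
Note that the code above locks once per worker, after the local accumulation. For contrast, here is a sketch of the contended anti-pattern, locking inside the hot loop; it is a hypothetical variant (n is kept small on purpose) and should be expected to run far slower:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let n: u64 = 1_000_000; // deliberately small: this version is slow
    let workers = 8;
    let chunk = n / workers;
    let total = Arc::new(Mutex::new(0u128));
    let mut handles = Vec::new();
    for i in 0..workers {
        let total = Arc::clone(&total);
        let (s, e) = (i * chunk, if i == workers - 1 { n } else { (i + 1) * chunk });
        handles.push(thread::spawn(move || {
            for x in s..e {
                let v = x as u128;
                // Anti-pattern: every iteration takes and releases the lock,
                // so the threads spend most of their time waiting on each other.
                *total.lock().unwrap() += v * v;
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    println!("contended total={}", *total.lock().unwrap());
}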


Model 3 — Rayon (data parallelism)

Rayon focuses on data parallelism with work-stealing and minimal boilerplate. For CPU-bound numeric work, this often wins.

Problem

Use rayon::iter to parallelize sum-of-squares with minimal synchronization.

Code (rayon)

Add to Cargo.toml:

[dependencies]
rayon = "1.6"

Then in src/main.rs:

use rayon::prelude::*;
use std::time::Instant;

fn main() {
    let n: u64 = 50_000_000;
    let start = Instant::now();
    let total: u128 = (0..n)
        .into_par_iter()
        .map(|x| {
            let v = x as u128;
            v * v
        })
        .sum();
    let dur = start.elapsed();
    println!("rayon total={} time={:?}", total, dur);
}

Result (sample)

  • Time: 0.6 s
  • Characteristics: minimal code, high performance, automatic work stealing; often the best choice for divide-and-conquer numeric tasks.

Diagram

main -> rayon threadpool
       /   |    |   \
part0   part1 part2 part3  (automatic work-stealing)
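
By default Rayon sizes its global pool to the number of logical cores. For an apples-to-apples comparison with the eight workers used elsewhere in this article, the pool size can be pinned with ThreadPoolBuilder; a minimal sketch:

use rayon::prelude::*;

fn main() {
    // Build the global pool once, before any parallel iterator runs.
    rayon::ThreadPoolBuilder::new()
        .num_threads(8)
        .build_global()
        .expect("global pool already initialized");

    let total: u128 = (0..1_000_000u64)
        .into_par_iter()
        .map(|x| (x as u128) * (x as u128))
        .sum();
    println!("total={}", total);
}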


Model 4 — Tokio async tasks

Async is not the same as threads. For CPU-bound tasks, Tokio alone will not magically parallelize; tasks must be spawned on a multi-threaded runtime or handed off to blocking threads. This example demonstrates the correct pattern: spawn blocking computations.

Problem

Spawn blocking tasks on Tokio runtime; gather results via channel.

Code (tokio)

Add to Cargo.toml:

[dependencies]
tokio = { version = "1.40", features = ["rt-multi-thread", "macros"] }

Then in src/main.rs:

use tokio::sync::mpsc;
use tokio::task;
use std::time::Instant;

#[tokio::main(flavor = "multi_thread", worker_threads = 8)]
async fn main() {
    let n: u64 = 50_000_000;
    let workers = 8;
    let chunk = n / workers;
    let (tx, mut rx) = mpsc::channel::<u128>(workers);
    let start = Instant::now();
    for i in 0..workers {
        let tx = tx.clone();
        let start_i = i * chunk;
        let end_i = if i == workers - 1 { n } else { (i + 1) * chunk };
        task::spawn_blocking(move || {
            let mut s = 0u128;
            for x in start_i..end_i {
                let v = x as u128;
                s += v * v;
            }
            // blocking_send is the synchronous counterpart of send().await,
            // so no extra executor (or futures dependency) is needed here.
            tx.blocking_send(s).unwrap();
        });
    }
    // Drop the original sender so rx.recv() returns None once all tasks finish.
    drop(tx);
    let mut total = 0u128;
    while let Some(part) = rx.recv().await {
        total += part;
    }
    let dur = start.elapsed();
    println!("tokio total={} time={:?}", total, dur);
}

Result (sample)

  • Time: 1.1 s
  • Characteristics: good when combining IO and CPU; spawn_blocking gives real parallelism. More ceremony than Rayon for pure CPU work.

Diagram

tokio runtime (worker threads)
   | spawn_blocking tasks
   v
[blocking task] -> send -> receiver -> main
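
A variant that skips the channel entirely: spawn_blocking returns a JoinHandle that resolves to the closure's return value, so the partial sums can be awaited directly. A minimal sketch of the same computation:

use tokio::task;

#[tokio::main(flavor = "multi_thread", worker_threads = 8)]
async fn main() {
    let n: u64 = 50_000_000;
    let workers = 8;
    let chunk = n / workers;
    let mut handles = Vec::new();
    for i in 0..workers {
        let s = i * chunk;
        let e = if i == workers - 1 { n } else { (i + 1) * chunk };
        // The handle resolves to the closure's return value.
        handles.push(task::spawn_blocking(move || {
            let mut sum = 0u128;
            for x in s..e {
                let v = x as u128;
                sum += v * v;
            }
            sum
        }));
    }
    let mut total = 0u128;
    for h in handles {
        total += h.await.unwrap();
    }
    println!("tokio join-handles total={}", total);
}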


Model 5 — Actor-style (single-owner message loop)

Actors communicate by message passing and keep their state encapsulated in a single thread. This model works well for mutable shared resources such as caches or sockets, and it avoids locks by design.

Problem

An actor receives work messages with ranges to compute. Workers send ranges; actor accumulates results sequentially.

Code (crossbeam-channel actor)

Add to Cargo.toml:

[dependencies]
crossbeam-channel = "0.5"

Then in src/main.rs:

use crossbeam_channel::{unbounded, Sender};
use std::thread;
use std::time::Instant;

fn worker(tx: Sender<(u64, u64)>, start: u64, end: u64) {
    tx.send((start, end)).unwrap();
}
fn main() {
    let n: u64 = 50_000_000;
    let workers = 8;
    let chunk = n / workers;
    let (tx_job, rx_job) = unbounded();
    let (tx_res, rx_res) = unbounded();
    // The actor owns all mutable state; its loop ends when every job sender is dropped.
    let actor = thread::spawn(move || {
        let mut total = 0u128;
        for (s, e) in rx_job.iter() {
            let mut local = 0u128;
            for x in s..e {
                let v = x as u128;
                local += v * v;
            }
            total += local;
            tx_res.send(total).unwrap();
        }
    });
    let start = Instant::now();
    for i in 0..workers {
        let tx = tx_job.clone();
        let s = i * chunk;
        let e = if i == workers - 1 { n } else { (i + 1) * chunk };
        thread::spawn(move || worker(tx, s, e));
    }
    drop(tx_job);
    let mut final_total = 0u128;
    // The actor reports its running total after each job; the last message is the answer.
    for partial in rx_res.iter() {
        final_total = partial;
    }
    actor.join().unwrap();
    let dur = start.elapsed();
    println!("actor total={} time={:?}", final_total, dur);
}

Result (sample)

  • Time: 1.9 s
  • Characteristics: eliminates locks on shared state, but the single actor thread can become a bottleneck for CPU-heavy aggregation. Best for complex mutable state that must remain single-owned.

Diagram

workers -> send ranges -> [actor loop] -> accumulates -> optional responses
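
In a real actor the messages are usually a typed enum of commands rather than bare tuples, and replies travel over a channel supplied by the caller. A minimal sketch of that shape, with hypothetical message names:

use crossbeam_channel::{unbounded, Sender};
use std::thread;

// Hypothetical command type for the actor.
enum Msg {
    Job { start: u64, end: u64 },
    // The caller supplies the channel on which the reply should arrive.
    Get(Sender<u128>),
}

fn main() {
    let (tx, rx) = unbounded::<Msg>();
    let actor = thread::spawn(move || {
        let mut total = 0u128;
        for msg in rx.iter() {
            match msg {
                Msg::Job { start, end } => {
                    for x in start..end {
                        let v = x as u128;
                        total += v * v;
                    }
                }
                Msg::Get(reply) => reply.send(total).unwrap(),
            }
        }
    });
    tx.send(Msg::Job { start: 0, end: 1_000 }).unwrap();
    let (reply_tx, reply_rx) = unbounded();
    tx.send(Msg::Get(reply_tx)).unwrap();
    println!("running total = {}", reply_rx.recv().unwrap());
    drop(tx); // closes the mailbox; the actor loop ends
    actor.join().unwrap();
}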


Short Summary

  • Rayon is fastest for pure data parallelism because it minimizes overhead and uses work stealing.
  • Tokio is strong when the app mixes IO and CPU and when spawn_blocking is used correctly.
  • Channels and actors prioritize clarity and message-based correctness. They are easier to reason about for many patterns but can be slower.
  • Mutex-based shared state is simplest but can suffer from contention.


How to choose — practical guide

Use the following decision map when designing concurrency in Rust.

  • If the task is pure CPU-bound numeric work: use Rayon. Minimal code and best throughput.
  • If the system is IO-heavy with some CPU: use Tokio and spawn_blocking for heavy CPU tasks.
  • If shared mutable state must be isolated and mutated by a single owner: use actor model. Safety by ownership.
  • If message passing fits the problem domain (work queues, pipelines): channels are simple and expressive.
  • If state is small and updates are infrequent: Arc<Mutex<T>> is acceptable and explicit.


Practical tips and warnings (mentor-to-developer)

  • Always compile and benchmark with --release. Debug builds lie about performance.
  • Measure on similar hardware to production. Local laptop results may mislead.
  • Avoid holding a mutex across long-running operations: acquire, mutate, release (see the sketch after this list).
  • Do not assume async tasks run in parallel; route CPU work through spawn_blocking or verify the runtime is multi-threaded.
  • For complex workloads, prefer higher-level crates (Rayon, Tokio) over ad-hoc thread management. Fewer bugs appear when the runtime handles scheduling.
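
To make the mutex tip above concrete, a minimal sketch; expensive_compute is a hypothetical stand-in for any long-running work:

use std::sync::Mutex;

// Hypothetical stand-in for long-running work.
fn expensive_compute() -> u64 {
    (0..1_000u64).map(|x| x * x).sum()
}

fn main() {
    let total = Mutex::new(0u64);

    // Bad: holding the guard across the expensive call blocks every other thread.
    // let mut t = total.lock().unwrap();
    // *t += expensive_compute();

    // Good: compute outside the lock, hold the guard only for the update.
    let local = expensive_compute();
    {
        let mut t = total.lock().unwrap();
        *t += local;
    } // guard dropped here; lock released
    println!("{}", *total.lock().unwrap());
}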

You will ship safer code if one simple rule is followed: pick a model, use it consistently, and only mix models when necessary.


Closing: a direct word to the reader

You now have five practical tools. Try them. Run the code. Observe the numbers on your machine. If a bug appears in production, write a test that reproduces it locally, then reason about which model prevents that class of bug. That practice will save time, users, and late-night panic.