A Kafka compatible Broker in Rust
Introducing Blink, an ultra-low-latency Kafka replacement
I officially work on a product that performs real-time analysis of transactions, and one of the first things I noticed is the use of Kafka as a push-pull adapter between data ingestion and processing, a legacy architectural choice from batch-processing times. Kafka excels at what it was designed for: durable, distributed message streaming with strong consistency guarantees across multiple data centers. But in the pursuit of these enterprise-grade capabilities, Kafka necessarily carries architectural decisions that can constrain applications where ultra-low latency matters more than distributed durability.

Consider the needs of our platform: processing thousands of messages per second to update dashboards instantly, running within a single data center where network partitions are rare, power failures are managed by UPS systems, and the primary concern isn't surviving datacenter outages but delivering the absolute lowest latency possible. In that setting, Kafka's distributed coordination overhead, disk-first persistence model, and complex cluster management become obstacles rather than features. The very mechanisms that make Kafka resilient in distributed environments introduce latency and operational complexity that are unnecessary for applications that prioritize speed over durability.

This realization led us to ask a deceptively simple question: what would a message broker look like if it were built specifically for these high-performance, single-node scenarios while maintaining complete compatibility with the Kafka ecosystem? The answer is Blink, a message broker that provides a 1:1 replacement for Kafka in applications where latency trumps distributed durability.

What's Already Out There and Why Build Our Own

The whole internal experiment started quite a few months ago, when coding agents were still in their infancy. I needed something that would allow us to intercept Kafka traffic first, and at that point we realized that of all Kafka's capabilities, we were using just a minuscule subset. So why not build our own REST-based bus? Because it would have required changing both producer- and consumer-side code and redeploying the entire pipeline, and when issues appeared it would have been hard to tell whether they were caused by the switch to REST or by the message bus itself. At the time, I was also unable to find anything that came close to our envisioned needs; the only bootstrap was a library implementing the parsing of the Kafka wire protocol.

Our Requirements

I wanted something that would take advantage of the single-consumer constraint of our architecture: consumers are very stateful, with a strict one-thread-per-partition model to maximize cache locality. The real-time nature of our system means that messages are expected to be consumed at the same rate they are produced, except for very short processing bottlenecks (garbage collection or CPU-expensive messages). There's no need for five-minute retention; a thirty-second-old message is already potentially garbage. These constraints shaped our design philosophy: build for the common case of smooth, low-latency, well-balanced operation rather than optimizing for rare failure scenarios.

Building Blink

Building Blink required reimagining every layer of the messaging stack, from network protocol handling down to memory management patterns, while ensuring that existing Kafka clients, tools, and operational knowledge remain fully applicable. The engineering challenge wasn't just about performance optimization; it was about creating a fundamentally different architecture while maintaining perfect protocol compatibility. Every Kafka client library, monitoring tool, and operational procedure had to work unchanged. This meant implementing the Kafka wire protocol and its nuances while, underneath, building an entirely different storage and concurrency model optimized for single-node, low-latency operation.

To ensure wire-protocol compatibility, Blink leverages the kafka-protocol crate, which provides Kafka protocol parsers code-generated directly from the official Kafka protocol JSON files. This approach guarantees that Blink speaks the exact same protocol as Apache Kafka, handling all the subtle versioning and compatibility nuances that have evolved over Kafka's development. Rather than manually implementing protocol parsing, a process prone to subtle bugs and version incompatibilities, the code-generated approach ensures perfect fidelity to the official specification.
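For a sense of what the wire format looks like, every Kafka request is a 4-byte length prefix followed by a header carrying the API key, API version, and correlation ID. The sketch below only illustrates that framing with the bytes crate; it is not Blink's parsing code, which relies on the generated kafka-protocol types:

use bytes::{Buf, Bytes};

/// Illustration of the Kafka request framing: a 4-byte length prefix, then
/// api_key (i16), api_version (i16) and correlation_id (i32), all big-endian.
/// Not Blink's parser; Blink uses the code-generated kafka-protocol crate.
fn read_request_header(frame: &mut Bytes) -> Option<(i16, i16, i32)> {
    if frame.remaining() < 12 {
        return None; // not enough bytes for the prefix plus the fixed header fields
    }
    let _length = frame.get_i32();     // size of the request that follows
    let api_key = frame.get_i16();     // e.g. 0 = Produce, 1 = Fetch
    let api_version = frame.get_i16(); // negotiated via ApiVersions
    let correlation_id = frame.get_i32();
    Some((api_key, api_version, correlation_id))
}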
Zero-Change Kafka Compatibility

By implementing the Kafka wire protocol, existing applications work immediately with no code changes:

// Existing Kafka code works unchanged
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // Just change the host
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

This compatibility extends to all major Kafka client libraries across Java, Python, Go, Node.js, .NET, and Rust. Your existing monitoring tools, stream processing frameworks, and operational procedures continue to work without modification.

Memory-First Storage Architecture

At its core, Blink employs a memory-first storage strategy where messages live primarily in fast RAM, with an intelligent overflow mechanism that seamlessly transitions data to a write-ahead log when memory pressure builds. This hybrid approach delivers microsecond latencies for active data while gracefully handling datasets that exceed available memory, something that traditional in-memory solutions struggle with in production environments. The architecture implements a simple tiered storage system powered by the nano-wal crate:

Tier 1: Memory (Ultra-Fast, ~100µs latency)
    ↓ (Memory pressure at 80% threshold)
Tier 2: Write-Ahead Log (Fast, ~1ms latency)
    ↓ (Retention/Consumption)
Tier 3: Automatic Cleanup (Zero-Cost)

Configuration is straightforward:
# Memory tier configuration
max_memory: "2GB"             # Offload at 80% = 1.6GB
retention: "1h"               # Time-based cleanup
record_storage_path: "./wal"  # WAL storage directory
purge_on_fetch: false         # Keep messages after consumption

When memory usage exceeds 80% of the configured limit, Blink automatically offloads the oldest message batches to a write-ahead log. The nano-wal crate provides automatic segment management, crash recovery, and efficient I/O with optimized sequential writes and random reads. Consumers read from the WAL seamlessly when needed, adding only minimal latency (a few milliseconds) while maintaining system stability.
Lock-Free Concurrent Architecture

The concurrency model represents perhaps the most significant departure from traditional message broker architectures. Rather than relying on mutex-based synchronization that can create unpredictable latency spikes under contention, Blink employs lock-free data structures throughout its core operations. This approach eliminates the possibility of one slow consumer blocking fast producers, or high-throughput producers creating lock contention that affects consumer latency. The core storage engine uses concurrent hash maps with internal sharding:

pub struct Storage {
    // No mutexes - pure concurrent data structures
    topic_partition_store: DashMap<PartitionKey, PartitionStorage>, // 256 internal shards
}

pub struct Broker {
    topics: DashMap<TopicName, TopicMeta>, // Concurrent topic registry
    producers: AtomicI64,                  // Atomic counters
    storage: Storage,                      // Lock-free storage engine
}

The DashMap from the dashmap crate uses 256 internal shards, dramatically reducing contention compared to a single-sharded concurrent map. Read operations are completely lock-free (around 10 nanoseconds), while write operations require minimal locking (around 100 nanoseconds) on tiny shards. This design enables linear scaling up to approximately 64 cores. Message routing and storage operations utilize these concurrent hash maps, allowing thousands of concurrent operations to proceed without coordination overhead.

Producer-consumer notifications happen through atomic signaling mechanisms that wake waiting consumers the instant new data arrives:

// Producer side - wake waiting consumers
partition_storage.published.notify_waiters(); // Zero-cost notification

// Consumer side - efficient waiting
partition_storage.published.notified().await; // No polling, no locks
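Put together, the lookup-then-notify pattern looks roughly like the following standalone sketch. The types are hypothetical rather than Blink's actual PartitionStorage, and it uses notify_one (which buffers a single wake-up) instead of notify_waiters so the toy single-consumer loop cannot miss a notification:

use std::sync::Arc;

use crossbeam_queue::SegQueue;
use dashmap::DashMap;
use tokio::sync::Notify;

// Hypothetical per-partition state: a lock-free queue plus a wake-up signal.
struct Partition {
    messages: SegQueue<String>,
    published: Notify,
}

#[tokio::main]
async fn main() {
    // Sharded concurrent registry keyed by (topic, partition).
    let partitions: DashMap<(String, i32), Arc<Partition>> = DashMap::new();

    // Get-or-create the partition entry; only its shard is touched.
    let partition: Arc<Partition> = partitions
        .entry(("test-topic".to_string(), 0))
        .or_insert_with(|| {
            Arc::new(Partition { messages: SegQueue::new(), published: Notify::new() })
        })
        .clone();

    // Consumer task: parks on the Notify instead of polling.
    let consumer_side = Arc::clone(&partition);
    let consumer = tokio::spawn(async move {
        loop {
            if let Some(msg) = consumer_side.messages.pop() {
                println!("consumed: {msg}");
                return;
            }
            // notify_one buffers one permit, so a wake-up sent before this
            // await is reached is not lost.
            consumer_side.published.notified().await;
        }
    });

    // Producer side: enqueue a record, then wake the waiting consumer.
    partition.messages.push("hello blink".to_string());
    partition.published.notify_one();

    consumer.await.unwrap();
}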
Request handling in Blink is completely async, with each connection running as an independent task rather than an OS thread:

// Connection handling - no thread pools
// (returns a Result so the `?` operator below can propagate errors)
async fn handle_connection(mut stream: TcpStream, broker: Arc<Broker>) -> Result<(), Error> {
    loop {
        let request = read_kafka_request(&mut stream).await?;
        let response = broker.handle_request(request).await?; // No locks held
        write_response(&mut stream, response).await?;
    }
}

This eliminates the context-switching overhead of traditional thread-per-connection models and allows Blink to handle thousands of concurrent connections with minimal resource usage.

Zero-Copy Message Handling

Messages use Bytes with atomic reference counting instead of copying data:

// Zero-copy message sharing
let original_bytes = Bytes::from(request_data);
let clone = original_bytes.clone(); // Just atomic increment, no memory copy

This means multiple consumers can reference the same data without memory copies, improving both memory efficiency and cache performance. Throughout the entire pipeline, from request parsing to storage to response encoding, data remains in the same memory location.
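As a standalone illustration of why this is cheap (plain bytes crate usage, not Blink code): cloning a Bytes handle or taking a sub-slice only bumps an atomic reference count, and every handle points into the same allocation.

use bytes::Bytes;

fn main() {
    // One allocation holding an encoded record batch (stand-in payload).
    let original = Bytes::from(vec![0u8; 1024]);

    // "Copying" for a second consumer is an atomic refcount increment...
    let for_consumer_a = original.clone();
    // ...and a zero-copy view of a sub-range shares the same buffer too.
    let for_consumer_b = original.slice(128..256);

    // All handles point into the same underlying memory.
    assert_eq!(original.as_ptr(), for_consumer_a.as_ptr());
    assert_eq!(for_consumer_b.as_ptr(), original[128..].as_ptr());
    println!("shared buffer, no copies: {} bytes total", original.len());
}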
WebAssembly Plugin System

Blink includes a very basic and experimental WebAssembly plugin system that enables runtime extensibility without sacrificing performance or security. Plugins can intercept and transform messages on the producer path, providing memory-safe sandboxing where plugins cannot crash the broker or access unauthorized resources. Plugins implement a simple WIT (WebAssembly Interface Types) interface:

// Plugin implementation in Rust
impl Guest for MyPlugin {
    fn init() -> Option<u16> {
        log(LogLevel::Info, "MyPlugin initialized");
        None // No special port requirements
    }

    fn on_record(topic: String, partition: u32,
                 key: Option<String>, value: String) -> bool {
        // Process each produced record
        if should_allow_message(&topic, &value) {
            true // Allow message to be stored
        } else {
            false // Reject message
        }
    }
}

Example use cases include content filtering, message enrichment, and topic-based routing:

// Content filtering example
fn on_record(topic: String, partition: u32,
             key: Option<String>, value: String) -> bool {
    // Block messages containing sensitive data
    if value.contains("SSN:") || value.contains("password") {
        log(LogLevel::Warn, "Blocked message with sensitive data");
        return false;
    }
    true
}

WASM execution adds less than 10 microseconds of latency to message processing while providing complete isolation and type safety. Plugins can be loaded, unloaded, and updated without broker restarts, and they run in separate WASM memory spaces with controlled access to host functions.

Kafka Compatibility in Detail

The commitment to Kafka compatibility extends beyond just implementing the wire protocol. Blink supports the familiar concepts of topics and partitions, maintains standard offset semantics for reliable consumption ordering, and provides the same producer acknowledgment patterns that applications depend on. Consumer groups work as expected for coordinated consumption scenarios, though they're implemented with an in-memory coordination model optimized for single-node performance rather than distributed consensus.

This compatibility focus meant extensive testing against real Kafka client libraries across multiple programming languages. Every edge case in offset management, every nuance in batch handling, and every detail of error response formatting had to match Kafka's behavior exactly. The goal was to enable teams to point their existing applications at Blink and observe dramatically improved performance without changing a single line of client code.

Performance Characteristics

Performance testing reveals the impact of these architectural decisions. Where Kafka might deliver 50–100 millisecond end-to-end latencies due to distributed coordination and disk I/O overhead, Blink consistently achieves 100–500 microsecond latencies for the same operations. Throughput scales linearly with CPU cores rather than being bounded by distributed consensus protocols or disk write performance.

Perhaps more importantly, Blink boots to a ready state in milliseconds rather than the nearly one minute required for Kafka broker initialization. Memory-only mode starts in approximately 10 milliseconds, while even with WAL recovery, boot time is around 50 milliseconds regardless of dataset size. This characteristic transforms development workflows, testing procedures, and deployment strategies. Integration tests can spin up a message broker instance as easily as starting a database connection, enabling faster feedback cycles and more comprehensive testing strategies.
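Latency figures like these are easy to sanity-check from the client side. The sketch below is a rough probe using the rust-rdkafka client against a broker on localhost:9092 with an illustrative latency-test topic; it is not the benchmark methodology behind the numbers above, just a way to time how long each produce takes to be acknowledged.

use std::time::{Duration, Instant};

use rdkafka::config::ClientConfig;
use rdkafka::producer::{FutureProducer, FutureRecord};

// Rough produce-acknowledgement latency probe against a broker on localhost:9092.
#[tokio::main]
async fn main() {
    let producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("linger.ms", "0") // send immediately instead of batching
        .create()
        .expect("failed to create producer");

    let mut total = Duration::ZERO;
    let rounds: u32 = 1_000;
    for i in 0..rounds {
        let payload = format!("probe-{i}");
        let start = Instant::now();
        producer
            .send(
                FutureRecord::to("latency-test").key("probe").payload(&payload),
                Duration::from_secs(1),
            )
            .await
            .expect("delivery failed");
        total += start.elapsed();
    }
    println!("avg produce->ack latency: {:?}", total / rounds);
}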
Operational Simplicity

The operational model reflects this philosophy of simplicity. Blink deploys as a single binary with configuration managed through a straightforward YAML file or environment variables:

# Resource limits
max_memory: "1GB"
retention: "5m"

# Network configuration
broker_ports: [9092, 9094]
rest_port: 30004
kafka_hostname: "0.0.0.0"

# Optional features
enable_consumer_groups: false
plugin_paths: []

There are no ensemble coordination protocols to configure, no ZooKeeper dependencies to manage, and no complex replication topologies to understand. Monitoring happens through standard Prometheus metrics exposed at http://localhost:30004/prometheus and a built-in REST API that provides real-time visibility into performance characteristics and system health. Key metrics include the following (a short sketch of how metrics like these can be registered appears after the list):
- blink_produce_count / blink_fetch_count - Request counters
- blink_produce_duration / blink_fetch_duration - Latency distributions
- blink_in_memory_queue_size_bytes - Current memory usage
- blink_offloaded_batch_count - Batches on disk
- blink_current_allocated - Real-time memory allocation
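Purely as an illustration (this is not Blink's code, and whether Blink uses the prometheus crate internally is an assumption), counters, gauges, and histograms with these names could be registered and rendered in the Prometheus text format like so:

use prometheus::{Encoder, Histogram, HistogramOpts, IntCounter, IntGauge, Registry, TextEncoder};

fn main() {
    let registry = Registry::new();

    let produce_count = IntCounter::new("blink_produce_count", "Produce requests handled").unwrap();
    let produce_duration = Histogram::with_opts(HistogramOpts::new(
        "blink_produce_duration",
        "Produce latency distribution in seconds",
    ))
    .unwrap();
    let queue_bytes =
        IntGauge::new("blink_in_memory_queue_size_bytes", "Bytes held in memory").unwrap();

    registry.register(Box::new(produce_count.clone())).unwrap();
    registry.register(Box::new(produce_duration.clone())).unwrap();
    registry.register(Box::new(queue_bytes.clone())).unwrap();

    // Simulate handling one produce request.
    let timer = produce_duration.start_timer();
    produce_count.inc();
    queue_bytes.add(512);
    timer.observe_duration();

    // Render the text exposition format, as served at /prometheus.
    let mut buf = Vec::new();
    TextEncoder::new().encode(&registry.gather(), &mut buf).unwrap();
    println!("{}", String::from_utf8(buf).unwrap());
}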
The REST API provides additional debugging and operational endpoints:
# Status and metrics
curl http://localhost:30004/
curl http://localhost:30004/version

# Debugging
curl http://localhost:30004/dump       # Storage state dump
curl http://localhost:30004/allocated  # Memory usage

# Manual operations
curl -X POST http://localhost:30004/produce -d "topic=test&value=data"
curl "http://localhost:30004/fetch?topic=test&offset=0"
curl http://localhost:30004/purge      # Force cleanup

Getting Started

Getting started with Blink is straightforward:
# Clone and build
git clone https://github.com/cleafy/blink.git
cd blink
cargo build --release

# Run with default configuration
./target/release/blink

# Or with custom settings
./target/release/blink --settings custom-settings.yaml

Once running, you can use any Kafka client library without modifications:
# Python example - no changes from Kafka
from kafka import KafkaProducer, KafkaConsumer

# Producer
producer = KafkaProducer(bootstrap_servers=['localhost:9092'])
producer.send('test-topic', b'Hello Blink!')

# Consumer
consumer = KafkaConsumer('test-topic', bootstrap_servers=['localhost:9092'])
for message in consumer:
    print(message.value)
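The same applies to Rust clients. A minimal rust-rdkafka consumer sketch (assuming the broker at localhost:9092, a test-topic topic, and consumer groups enabled in the broker configuration) works against Blink exactly as it would against Kafka:

use rdkafka::config::ClientConfig;
use rdkafka::consumer::{Consumer, StreamConsumer};
use rdkafka::message::Message;

#[tokio::main]
async fn main() {
    // Plain Kafka consumer configuration; nothing Blink-specific.
    let consumer: StreamConsumer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("group.id", "example-group")
        .set("auto.offset.reset", "earliest")
        .create()
        .expect("failed to create consumer");

    consumer.subscribe(&["test-topic"]).expect("subscribe failed");

    loop {
        match consumer.recv().await {
            Ok(msg) => {
                let value = msg.payload().map(String::from_utf8_lossy);
                println!("offset {}: {:?}", msg.offset(), value);
            }
            Err(e) => eprintln!("consume error: {e}"),
        }
    }
}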
Docker deployment is equally simple:

docker run -p 9092:9092 -p 30004:30004 \
  -e MAX_MEMORY=1GB \
  -e RETENTION=30s \
  blink
When Blink Makes Sense

For teams building applications where message durability requirements are measured in minutes or hours rather than years, where system boundaries are typically single data centers rather than global distributions, and where latency requirements are measured in microseconds rather than milliseconds, Blink provides a path to leverage the Kafka ecosystem's maturity while achieving performance characteristics that traditional Kafka deployments simply cannot deliver.

Blink is perfect for:
- Financial trading systems requiring sub-millisecond latency
- Real-time analytics with high-volume event streams
- Microservices communication within a single datacenter
- Development and testing environments
- Live dashboards requiring real-time data updates
- Single consumer setups allowing consumed record eviction
Blink is not suitable for:
- Multi-datacenter replication requirements
- Long-term data retention (months/years)
- Regulatory compliance requiring guaranteed durability
- Complex consumer group implementations
Looking Forward

The message broker landscape has long been defined by the choice between performance and features, or between simplicity and scalability. Blink demonstrates that when requirements are clearly understood and architectural constraints are embraced rather than avoided, it's possible to deliver both exceptional performance and operational simplicity while remaining compatible with existing tools and practices.

For teams currently struggling with Kafka's latency characteristics in high-performance applications, or those avoiding message brokers entirely due to operational complexity concerns, Blink provides a compelling alternative that doesn't require abandoning the broader Kafka ecosystem or rewriting existing applications. Sometimes the best solution isn't to make existing tools faster; it's to build new tools that are optimized for the specific problems you're trying to solve.

Blink is available as open source software and ready for use in applications where its architectural trade-offs align with system requirements. You can find the project on GitHub at https://github.com/cleafy/blink.
Read the full article here: https://medium.com/rustaceans/a-kafka-compatible-broker-in-rust-6ef6a36d871d
