
Building KITT: Kafka Implementation Throughput Tool, the Knight Rider Way


From tangled rate discovery to an elegant, self-balancing system, this is the story of how a Kafka benchmarking tool matured into a reliable and insightful performance probe.

The Problem to Solve

The core objective was straightforward yet technically nuanced: determine the maximum sustainable throughput a Kafka broker can maintain without accumulating an unmanageable backlog. The goal extended beyond raw throughput numbers — it also encompassed understanding how Kafka handles pressure, how latency builds up, and at what point consumers can no longer keep up. To accomplish this, a tool was needed that could simulate real-world production loads while intelligently backing off before overwhelming the system.

Several general-purpose tools already existed, such as the OpenMessaging Benchmark and similar frameworks. However, these tools were often designed for broader performance testing across multiple messaging systems and included layers of abstraction, configuration complexity, and assumptions not aligned with the narrow, low-level behavior being targeted. The effort required to adapt these existing systems to Kafka-specific, fine-grained, backlog-driven measurement appeared greater than developing a tool from scratch — especially when assisted by lightweight agents capable of directly interfacing with Kafka protocol operations.

The tool was named KITT, drawing inspiration from the intelligent car in Knight Rider. This metaphor represented the tool's purpose well: a vigilant, responsive system that could scan Kafka's performance dynamics and respond with agility.

First Attempt: Complex Rate Discovery

The initial implementation employed a structured, multi-phase rate discovery algorithm:

  • Begin at a base rate of 1000 messages per second, with linear increments.
  • Continuously monitor backlog size, aiming to maintain a buffer no greater than 10 percent of the current rate.
  • If backlog exceeded acceptable levels, transition into a fine-tuning phase.
  • Tear down the current test topic and recreate it to eliminate any historical skew.
  • Use smaller rate adjustment increments to home in on the system’s stable limits.
  • Employ semaphore-based concurrency control mechanisms and introduce background permit releasers.

This model was built on a solid theoretical foundation. However, it proved burdensome in practice. The implementation ballooned to over 1000 lines of code. State machines grew tangled. Asynchronous timing logic and thread coordination became sources of subtle bugs and non-deterministic behavior. Overhead from managing fine-tuning phases, topic resets, and control threads made the system difficult to extend or validate.
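For a sense of the shape this took, here is a heavily condensed, illustrative sketch of the kind of phase-driven loop involved; the names, thresholds, and increments are placeholders rather than the original code:

// Condensed, illustrative sketch of the multi-phase rate discovery loop.
// measure_backlog and recreate_topic are hypothetical stand-ins for real helpers.
async fn measure_backlog() -> u64 { todo!("produced offsets minus consumed offsets") }
async fn recreate_topic() { todo!("delete and recreate the test topic") }

#[derive(Clone, Copy)]
enum Phase { Ramp, FineTune }

async fn discover_rate() -> f64 {
    let mut rate = 1_000.0; // base rate: 1000 msg/s
    let mut phase = Phase::Ramp;
    loop {
        let backlog = measure_backlog().await as f64;
        match phase {
            Phase::Ramp if backlog > rate * 0.10 => {
                recreate_topic().await; // wipe historical skew before fine-tuning
                phase = Phase::FineTune;
            }
            Phase::Ramp => rate += 500.0,                            // coarse increment
            Phase::FineTune if backlog > rate * 0.10 => return rate, // limit found (back off one step in practice)
            Phase::FineTune => rate += 50.0,                         // fine increment
        }
    }
}

Even in this reduced form, the moving parts are visible: phases, thresholds, topic resets, and the bookkeeping needed to coordinate them all.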

Breaking Free from rd-kafka: Building a Custom Kafka Client

Along the way, API version negotiation issues surfaced and prompted broader architectural questions. Beyond that, rd-kafka posed limitations that did not align with the intended control model. Specifically, it enforced consumer group membership, automatic partition assignment, and opaque buffering. These mechanisms were helpful for typical consumer use but made throughput measurement inflexible. KITT required complete control: direct partition access, manual offset tracking, and simplified client state. Rather than bending the library to fit the model, a custom Kafka client was implemented. This client was minimal by design, focused on protocol correctness, transport management, and API negotiation. The result was a tailored interface: one that exposed produce and fetch requests, created topics on demand, and spoke to the broker without mediation.

Several of the advantages we discovered went on to shape the system's overall design. First, dynamic version negotiation allowed seamless API compatibility across different client and server versions, eliminating brittle assumptions. We also bypassed the complexity of consumer groups by enabling direct access to partitions, giving clients finer control over data flow. By avoiding background buffering, the system kept its memory usage lean, an essential trait in constrained environments. Finally, the clarity of explicit packet exchanges significantly improved both testability and traceability, making it easier to debug and reason about system behavior end-to-end.

In practice, this custom client was more stable, easier to reason about, and required under 200 lines of core logic.
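The public surface the client needed was correspondingly small. A hypothetical sketch of that interface, with illustrative method signatures rather than the actual code, might look like this:

use std::io;

// Sketch of a minimal Kafka client surface for KITT's workload:
// explicit produce/fetch against a single broker, no consumer groups,
// no background buffering. Signatures are illustrative.
pub struct KafkaClient {
    // broker connection and negotiated API versions would live here
}

impl KafkaClient {
    // Connect to one broker and negotiate supported API versions.
    pub async fn connect(addr: &str) -> io::Result<Self> {
        todo!("open a TCP connection and exchange ApiVersions")
    }

    // Create a test topic with the requested partition count.
    pub async fn create_topic(&mut self, name: &str, partitions: i32) -> io::Result<()> {
        todo!("send a CreateTopics request")
    }

    // Produce one record batch directly to a partition.
    pub async fn produce(&mut self, topic: &str, partition: i32, payload: &[u8]) -> io::Result<()> {
        todo!("send a Produce request")
    }

    // Fetch records from a partition, tracking offsets manually.
    pub async fn fetch(&mut self, topic: &str, partition: i32, offset: i64) -> io::Result<Vec<u8>> {
        todo!("send a Fetch request and return raw record bytes")
    }
}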

Rate Limiting Evolution

Initial rate limiting logic used per-message microsecond-level delays. A calculation derived the interval for each message, which was then enforced with an asynchronous sleep:

let interval = (1_000_000.0 / rate) as u64;
tokio::time::sleep(Duration::from_micros(interval)).await;

While conceptually simple, this approach was flawed. Tokio's sleep timer operates at roughly millisecond granularity, so sub-millisecond per-message delays effectively round up, and actual throughput fell significantly below target values. In trials targeting 2000 messages per second (a 500 µs interval per message), actual rates hovered around 1000.

To address these limitations, batch-based rate limiting was introduced:

if messages_in_batch >= BATCH_SIZE {
    // Compare how long this batch actually took against how long it should
    // take at the target rate, and sleep off only the difference.
    let elapsed = batch_start.elapsed();
    let expected = Duration::from_secs_f64(BATCH_SIZE as f64 / rate);
    if elapsed < expected {
        tokio::time::sleep(expected - elapsed).await;
    }
    // Reset the counters before timing the next batch.
    messages_in_batch = 0;
    batch_start = Instant::now();
}

This method amortized sleep overhead over a batch of messages, dramatically improving timing accuracy. Batch-based control allowed the runtime to schedule delays more effectively.

Later iterations experimented with semaphore-based flow control. In this model, message production was throttled via permits that were replenished in the background based on throughput and backlog metrics. While theoretically elegant, this model proved too sensitive to permit mismanagement, resulting in either starvation or uncontrolled backlog growth.
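For context, a permit-based throttle along those lines might look roughly like the sketch below; the tick size and refill interval are illustrative, and this is not KITT's actual implementation:

use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Semaphore;

// Sketch of permit-based flow control: a background task drips permits in at
// the target rate, and each send consumes one permanently.
async fn permit_throttled_producer() {
    let permits = Arc::new(Semaphore::new(0));
    let refill = permits.clone();
    tokio::spawn(async move {
        loop {
            refill.add_permits(100); // 100 permits every 100 ms ≈ 1000 msg/s
            tokio::time::sleep(Duration::from_millis(100)).await;
        }
    });
    loop {
        // Wait for a permit, consume it for good, then send one message.
        permits.acquire().await.unwrap().forget();
        // send_one_message().await; // hypothetical send call
    }
}

Keeping the refill logic aligned with actual backlog is exactly where this model turned fragile in practice.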

Threading and Rate Division

In an effort to scale horizontally, multi-threaded support was added. Initial results were unexpectedly poor: latency spiked and consumer lag grew despite the added threads. The underlying bug was that each producer thread targeted the full global rate rather than its share of it, multiplying the intended load by the thread count. A four-thread configuration targeting 2000 messages per second per thread attempted to push 8000 messages per second, far beyond capacity. The fix was to compute a per-thread rate:

let per_thread_rate = global_rate / thread_count as f64;

Once fixed, throughput scaled linearly and system stability returned.
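A minimal sketch of the corrected division when spawning producer tasks; run_producer stands in for whatever the per-thread produce loop looks like:

// Hypothetical per-producer task; each one throttles itself to `rate` msg/s.
async fn run_producer(rate: f64) { todo!("produce at `rate` messages per second") }

async fn spawn_producers(global_rate: f64, thread_count: usize) {
    // Divide the global target evenly so the aggregate matches the intended load.
    let per_thread_rate = global_rate / thread_count as f64;
    let handles: Vec<_> = (0..thread_count)
        .map(|_| tokio::spawn(run_producer(per_thread_rate)))
        .collect();
    for handle in handles {
        handle.await.unwrap();
    }
}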

Final Solution: Backlog-Based Flow Control

The turning point was a shift from predictive control to reactive feedback. Instead of relying on precomputed intervals, the system monitored its own backlog and adjusted behavior accordingly.

let backlog = sent.saturating_sub(received); // messages produced but not yet consumed
if backlog > MAX_BACKLOG {
    // Back off briefly and let the consumers catch up before producing more.
    tokio::time::sleep(Duration::from_millis(10)).await;
    continue;
}

This approach removed complexity. No semaphores. No fine-tuning phases. No topic resets. The system naturally reached equilibrium, throttling when needed and accelerating when possible. It became adaptive, responsive, and — most importantly — accurate in identifying broker limits.
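In the multi-threaded setup, `sent` and `received` can simply be shared counters that the producer and consumer tasks update independently; a sketch under that assumption:

use std::sync::atomic::{AtomicU64, Ordering};

// Shared counters: producer tasks bump `sent` after each acknowledged produce,
// consumer tasks bump `received` after each successful fetch, and the
// flow-control check reads the difference.
struct Counters {
    sent: AtomicU64,
    received: AtomicU64,
}

impl Counters {
    fn backlog(&self) -> u64 {
        self.sent
            .load(Ordering::Relaxed)
            .saturating_sub(self.received.load(Ordering::Relaxed))
    }
}

// Producer side:  counters.sent.fetch_add(1, Ordering::Relaxed);
// Consumer side:  counters.received.fetch_add(n, Ordering::Relaxed);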

Adding Visual Feedback: Knight Rider Animation

A key usability enhancement was the addition of a terminal-based scanner animation. Drawing from the Knight Rider theme, the animation moved a bright red bar across the terminal to represent throughput momentum.

[      █▓▒░          ] 3021 msg/s (min: 2650, max: 3021)
[         █▓▒░       ] 2956 msg/s (min: 2650, max: 3021)

The scanner reflected live throughput, updated in real time. It also tracked the minimum and maximum observed rates, giving users a sense of fluctuation and stability over time. This feature elevated KITT beyond a technical utility — it became interactive, expressive, and intuitive.
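Rendering a frame of the scanner is straightforward; a rough sketch (not the exact drawing code) is below:

// Render one frame of the scanner. `pos` sweeps back and forth across `width`
// as throughput updates arrive.
fn render_frame(pos: usize, width: usize, rate: f64, min: f64, max: f64) -> String {
    let mut cells = vec![' '; width];
    // Bright head with a fading trail, Knight Rider style.
    for (i, glyph) in ['█', '▓', '▒', '░'].iter().enumerate() {
        if pos + i < width {
            cells[pos + i] = *glyph;
        }
    }
    let bar: String = cells.into_iter().collect();
    // A carriage return lets each frame overwrite the previous one in place.
    format!("\r[{}] {:.0} msg/s (min: {:.0}, max: {:.0})", bar, rate, min, max)
}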

Refactoring Toward Clean Architecture

With functionality stabilized, architectural improvements followed. The Kafka protocol logic had grown into a tangle of parsing, encoding, and flow control. Extracting this into a standalone module named KafkaClient improved clarity and testability. The restructured project:

kitt/
├── src/
│   ├── main.rs              // Entry point and orchestration
│   ├── kafka_client.rs      // Encapsulated Kafka protocol handling

Separation of concerns enabled easier testing, better documentation, and reduced coupling. Future enhancements, such as authentication or TLS support, were now easier to integrate.

Reflections and Final Architecture

Building this system was as much a journey in engineering as it was in design philosophy. Early prototypes reminded us of a timeless lesson: complexity isn’t always the enemy, but simplicity often wins. In the face of unpredictable scenarios and edge cases, we found that the most resilient strategies were often the ones grounded in clear, minimal assumptions.

One such insight came through the integration of Kafka API version negotiation. While easy to overlook in controlled environments, it quickly became evident that real-world deployments demand flexibility. By enabling dynamic compatibility with brokers of varying versions, the system remained robust and operable across diverse setups, reducing friction and deployment risk.

Flow control, too, underwent significant evolution. Rather than enforcing a fixed message rate, we implemented a reactive, backlog-aware throttling mechanism. This approach allowed the system to adapt fluidly to changes in throughput and processing capacity, outperforming static rate limits in both stability and realism.

From a user’s perspective, the tool was never just a backend system — it needed to be tangible and informative. Inspired by Knight Rider, we embedded a live, animated dashboard that visualized key metrics and historical performance bounds. This not only added flair, but improved usability and confidence in the system’s behavior. Thoughtful interface design, we discovered, plays a vital role in tool adoption and effectiveness.

Architecturally, we kept things modular and focused. Protocol-level responsibilities — such as message formatting, API logic, and negotiation — were kept distinct from the orchestration code responsible for producing workloads and collecting telemetry. This clear separation of concerns made the system easier to reason about, extend, and test.

Execution-wise, the system employs a multi-threaded producer/consumer model, designed for scalability and realism. By distributing load generation and consumption across multiple threads, we mirrored the behavior of real Kafka clients and systems under stress.

In the end, the architecture not only met its technical goals but also reflected the broader lessons we had gathered: embrace simplicity, react to feedback, design for real-world variance, and never underestimate the value of clarity — in both code and experience.

How to run KITT

KITT is available in this repo; feel free to adapt it to your needs. To run it:

git clone https://github.com/aovestdipaperino/kitt.git && cd kitt
cargo build --release
./target/release/kitt --broker <broker-address>:9092 --partitions 4

The animated terminal scanner reflects real-time throughput and system behavior. KITT has evolved from a benchmarking script into an intelligent assistant for evaluating Kafka performance. Built on practical insights and refined through iteration, it combines performance rigor with clarity of feedback. One thing I forgot to mention earlier: since rd-kafka, which is based on the official C library, doesn't support all the features of the native Java client, I took the liberty of implementing just enough of the protocol using the kafka-protocol crate; that is material for another post.

Craving more deep dives into Rust, elegant design patterns, and performance that sings?

Read the full article here: https://medium.com/rustaceans/building-kitt-kafka-implementation-throughput-tool-the-knight-rider-way-09c0e2f8a4ad