8 Qdrant + Rust Setups for Low-Drift Recall
Eight Qdrant + Rust setups — versioned collections, HNSW tuning, hybrid scores, filters, freshness, safe writes, dedupe, and recall audits — to reduce retrieval drift.
Your search was great last month. Then embeddings changed, traffic spiked in one region, and suddenly “relevant” feels random. That’s recall drift — a slow slide from crisp to muddy. The good news: you can fight drift with a few boring, durable setups in Qdrant, wired from a lean Rust service.
Below are eight patterns I keep returning to when I need recall that stays put.
1) Version Your Vectors (and Guard at Query Time)
Why: Mixing embeddings from different models corrupts neighborhoods. Keep versions explicit.
How:
- One collection per embedding family (e.g., docs_v3), or add a model_version payload and filter on it.
- Store the model name + version with every point. Reject queries that don’t specify a version.
use qdrant_client::prelude::*;
use qdrant_client::qdrant::{
    vectors_config::Config, Condition, CreateCollection, Distance, Filter,
    HnswConfigDiff, PointStruct, SearchParams, SearchPoints, VectorParams, VectorsConfig,
};
use serde_json::json;

let client = QdrantClient::from_url("http://localhost:6334").build()?;

// Create a collection pinned to one embedding family.
client.create_collection(&CreateCollection {
    collection_name: "docs_v3".into(),
    vectors_config: Some(VectorsConfig {
        config: Some(Config::Params(VectorParams {
            size: 1_536, // embedding dimension
            distance: Distance::Cosine as i32,
            ..Default::default()
        })),
    }),
    hnsw_config: Some(HnswConfigDiff { m: Some(32), ef_construct: Some(200), ..Default::default() }),
    ..Default::default()
}).await?;

let vec: Vec<f32> = embed_v3("the text");
let payload: Payload = json!({"model_version": "v3.1", "lang": "en", "tenant": "acme"}).try_into().unwrap();
let point = PointStruct::new(123, vec, payload);
client.upsert_points("docs_v3", vec![point], None).await?;

// Queries MUST filter by model_version to avoid drift.
let filter = Filter::must([Condition::matches("model_version", "v3.1".to_string())]);
let result = client.search_points(&SearchPoints {
    collection_name: "docs_v3".into(),
    vector: embed_v3("query"),
    limit: 20,
    with_payload: Some(true.into()),
    filter: Some(filter),
    params: Some(SearchParams { hnsw_ef: Some(128), exact: Some(false), ..Default::default() }),
    ..Default::default()
}).await?;
Rule: Never mix versions “just for a week.” That week becomes your baseline forever.
2) Tune HNSW for Stability, Not Just Speed
Why: HNSW settings (m, ef_construct, ef_search) strongly affect recall variance under load.
How:
- For write-heavy collections, start with m=32, ef_construct=200.
- At query, set hnsw_ef high enough to keep p95 recall steady (common: ef ≈ 2–4× k).
let params = SearchParams { hnsw_ef: Some(128), exact: Some(false), ..Default::default() };
// k = 50 → hnsw_ef 100–200 is a good starting window
Practice: Sweep ef offline with a fixed eval set and lock it. Chasing every p99 blip with a new ef is drift disguised as tuning.
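The lock-it-in step can be plain selection logic. A minimal sketch, assuming you've already measured recall@k for each candidate ef against your fixed eval set (the sweep values below are illustrative):

```rust
/// Given (ef, measured recall@k) pairs from an offline sweep on a fixed
/// eval set, pick the smallest ef that clears the recall target.
/// Returns None if nothing qualifies (then revisit m / ef_construct instead).
fn lock_ef(sweep: &[(u64, f32)], target_recall: f32) -> Option<u64> {
    let mut qualifying: Vec<u64> = sweep
        .iter()
        .filter(|(_, recall)| *recall >= target_recall)
        .map(|(ef, _)| *ef)
        .collect();
    qualifying.sort_unstable();
    qualifying.first().copied()
}
```

Run it once per model version, pin the result in `SearchParams { hnsw_ef: ... }`, and leave it alone until the next scheduled sweep.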
3) Filter First: Narrow the Neighborhood
Why: Tenants, locales, or content types that don’t matter to a query just add noise.
How:
- Index payload fields you routinely filter on (tenant, lang, doc_type).
- Apply a must filter at search time to avoid cross-tenant contamination.
let filter = Filter::must([
    Condition::matches("tenant", "acme".to_string()),
    Condition::matches("lang", "en".to_string()),
    Condition::matches("doc_type", "howto".to_string()),
]);
// smaller candidate set → more stable recall
Note: Filtering reduces candidate diversity in a good way. It’s not “lost recall” if those points were never valid results.
4) Hybrid Scoring: Dense + Sparse Keeps Meaning Intact
Why: Pure dense can wander when synonyms or OOD terms dominate. Adding a sparse signal (BM25 or SPLADE) anchors intent.
How (conceptual):
- Store a sparse vector (or just top-k token weights) alongside the dense vector.
- Search both; fuse via simple Reciprocal Rank Fusion (RRF) or min-max normalized scores.
use std::collections::HashMap;

/// Reciprocal Rank Fusion over the ranked id lists returned by the dense
/// and sparse searches. 60.0 is the standard RRF damping constant.
fn fuse_rrf(dense: &[u64], sparse: &[u64], top_k: usize) -> Vec<u64> {
    let mut scores: HashMap<u64, f32> = HashMap::new();
    for (rank, id) in dense.iter().enumerate() {
        *scores.entry(*id).or_insert(0.0) += 1.0 / (60.0 + rank as f32);
    }
    for (rank, id) in sparse.iter().enumerate() {
        *scores.entry(*id).or_insert(0.0) += 1.0 / (60.0 + rank as f32);
    }
    // Rank by fused score, highest first; fetch payloads afterwards if needed.
    let mut fused: Vec<(u64, f32)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused.into_iter().take(top_k).map(|(id, _)| id).collect()
}
Result: Queries with rare words or code identifiers get “rescued” by sparse overlap instead of drifting to vaguely similar paragraphs.
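The other fusion option mentioned above, min-max normalization, rescales each list's scores to [0, 1] before blending. A self-contained sketch; the dense/sparse weight is an assumption you tune per corpus:

```rust
use std::collections::HashMap;

/// Min-max normalize scores to [0, 1]; a constant list maps to all zeros.
fn min_max(scores: &[f32]) -> Vec<f32> {
    let (min, max) = scores
        .iter()
        .fold((f32::MAX, f32::MIN), |(lo, hi), &s| (lo.min(s), hi.max(s)));
    let span = max - min;
    scores
        .iter()
        .map(|&s| if span > 0.0 { (s - min) / span } else { 0.0 })
        .collect()
}

/// Blend dense and sparse (id, score) lists; an id missing from one list
/// contributes 0 from that side. `w_dense` in [0, 1] weights the dense signal.
fn fuse_min_max(dense: &[(u64, f32)], sparse: &[(u64, f32)], w_dense: f32) -> Vec<(u64, f32)> {
    let dn = min_max(&dense.iter().map(|(_, s)| *s).collect::<Vec<_>>());
    let sn = min_max(&sparse.iter().map(|(_, s)| *s).collect::<Vec<_>>());
    let mut fused: HashMap<u64, f32> = HashMap::new();
    for (i, (id, _)) in dense.iter().enumerate() {
        *fused.entry(*id).or_insert(0.0) += w_dense * dn[i];
    }
    for (i, (id, _)) in sparse.iter().enumerate() {
        *fused.entry(*id).or_insert(0.0) += (1.0 - w_dense) * sn[i];
    }
    let mut out: Vec<(u64, f32)> = fused.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```

RRF is the safer default (rank-only, no score-scale assumptions); min-max fusion gives you an explicit knob when one signal should dominate.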
5) Freshness Without Forgetfulness (Score Decay, Not Hard Cutoffs)
Why: Naive “last 30 days” filters torpedo recall on evergreen content. Prefer soft decay.
How:
- Keep a ts payload.
- Post-process the top N by multiplying each similarity score by a time-decay factor (e.g., a 30-day half-life). You preserve relevant classics while steadily preferring fresh content.
fn time_decay(now_ms: i64, ts_ms: i64, half_life_days: f32) -> f32 {
    let dt_days = (now_ms - ts_ms) as f32 / 86_400_000.0;
    (0.5f32).powf(dt_days / half_life_days)
}
use qdrant_client::qdrant::value::Kind;

// After the initial search (results: Vec<ScoredPoint>):
for p in results.iter_mut() {
    let ts = p.payload
        .get("ts")
        .and_then(|v| match &v.kind {
            Some(Kind::IntegerValue(i)) => Some(*i),
            _ => None,
        })
        .unwrap_or(0);
    let decay = time_decay(now_ms, ts, 30.0);
    p.score *= decay; // or combine with a normalized similarity score
}
Why it reduces drift: You don’t “forget” older high-quality points; you just resist staleness creep.
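As a sanity check on the half-life semantics, the decay factor restated self-contained: a 30-day-old point keeps half its score, a 60-day-old point a quarter, and a fresh point is untouched.

```rust
/// Exponential time decay: score multiplier halves every `half_life_days`.
fn time_decay(now_ms: i64, ts_ms: i64, half_life_days: f32) -> f32 {
    let dt_days = (now_ms - ts_ms) as f32 / 86_400_000.0;
    (0.5f32).powf(dt_days / half_life_days)
}

// With a 30-day half-life:
//   30 days old → factor ≈ 0.5
//   60 days old → factor ≈ 0.25
//    0 days old → factor = 1.0
```

Because the factor never reaches zero, an evergreen classic can still outrank a mediocre fresh hit, which is exactly the behavior a hard date cutoff destroys.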
6) Write Path That Can’t Lie (Ordered Upserts + wait=true)
Why: Out-of-order writes and async indexing make neighborhoods inconsistent for minutes — perceived as recall drift.
How:
- Shard deterministically (e.g., by tenant) on the app side to reduce cross-shard contention.
- Batch upserts and set wait=true so the search reflects the write before you acknowledge.
let op = client.upsert_points_blocking("docs_v3", vec![point], None).await?; // waits until the write is indexed
- For updates that change payload filters (e.g., lang), delete-then-insert instead of partial merges to keep filters coherent.
Let’s be real: A 30–80 ms write penalty beats hours of “why doesn’t it show up?” debugging.
7) Dedupe and Outlier Hygiene
Why: Near-duplicates inflate certain neighborhoods and push out diverse, relevant points; outliers add random hops.
How:
- On ingest, locality-check new vectors:
let probe = client.search_points(&SearchPoints {
collection_name: "docs_v3".into(),
vector: new_vec.clone(),
limit: 3,
filter: Some(Filter::must([Condition::matches("tenant", "acme".to_string())])),
..Default::default()
}).await?;
let too_close = probe.result.iter().any(|p| p.score > 0.98); // cosine-similarity threshold
if too_close { /* merge or skip */ }
- Periodically prune points with abnormally small/large norms or those with no reciprocal neighbors (isolation heuristic) and re-embed if needed.
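The norm check in the last bullet can be a plain robust-statistics cut. A sketch flagging vectors whose L2 norm sits far from the collection median; the MAD multiplier `k` is an assumption to tune per corpus:

```rust
/// L2 norm of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Indices of vectors whose norm deviates from the median by more than
/// `k` times the median absolute deviation (a robust outlier cut).
fn norm_outliers(vectors: &[Vec<f32>], k: f32) -> Vec<usize> {
    let norms: Vec<f32> = vectors.iter().map(|v| l2_norm(v)).collect();
    let mut sorted = norms.clone();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = sorted[sorted.len() / 2];
    let mut devs: Vec<f32> = norms.iter().map(|n| (n - median).abs()).collect();
    devs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mad = devs[devs.len() / 2].max(1e-6); // floor guards the all-equal case
    norms
        .iter()
        .enumerate()
        .filter(|&(_, &n)| (n - median).abs() > k * mad)
        .map(|(i, _)| i)
        .collect()
}
```

Flagged points are candidates for re-embedding or removal, not automatic deletes; eyeball a sample before pruning.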
Outcome: Tighter, healthier neighborhoods; fewer “why is this spammy blurb everywhere?” complaints.
8) A Quiet, Ruthless Recall Audit
Why: You can’t control drift you don’t measure. Keep a stable golden set of queries + expected ids/snippets.
How:
- Store ~100–300 queries per domain, tagged by intent (faq, how-to, error).
- Nightly job (Rust) computes Recall@k and nDCG versus your golden answers, by segment and model version.
struct Golden { q: String, expected: Vec<u64> }
/// Binary "hit@k": 1.0 if any expected id appears in the top k.
/// (For multi-answer queries, count the fraction of expected ids retrieved instead.)
fn recall_at_k(actual: &[u64], expected: &[u64], k: usize) -> f32 {
    let hit = actual.iter().take(k).any(|id| expected.contains(id));
    if hit { 1.0 } else { 0.0 }
}
use qdrant_client::qdrant::point_id::PointIdOptions;

async fn audit(client: &QdrantClient, gold: &[Golden]) -> anyhow::Result<()> {
    let mut total = 0.0;
    for g in gold {
        let res = client.search_points(&SearchPoints {
            collection_name: "docs_v3".into(),
            vector: embed_v3(&g.q),
            limit: 10,
            ..Default::default()
        }).await?;
        // Point ids are an enum (numeric or UUID); keep the numeric ones.
        let actual: Vec<u64> = res.result.iter()
            .filter_map(|p| match p.id.as_ref()?.point_id_options.as_ref()? {
                PointIdOptions::Num(n) => Some(*n),
                _ => None,
            })
            .collect();
        total += recall_at_k(&actual, &g.expected, 5);
    }
    let avg = total / gold.len() as f32;
    println!("Recall@5 = {:.3}", avg);
    Ok(())
}
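The nDCG half of the audit can stay a pure function too. A minimal binary-gain sketch (standard log2 position discount; graded relevance omitted for brevity):

```rust
/// nDCG@k with binary relevance: a hit at 0-based rank i earns
/// 1 / log2(i + 2); normalized by the ideal DCG for this query.
fn ndcg_at_k(actual: &[u64], expected: &[u64], k: usize) -> f32 {
    let dcg: f32 = actual
        .iter()
        .take(k)
        .enumerate()
        .filter(|&(_, id)| expected.contains(id))
        .map(|(i, _)| 1.0 / (i as f32 + 2.0).log2())
        .sum();
    let ideal_hits = expected.len().min(k);
    let idcg: f32 = (0..ideal_hits)
        .map(|i| 1.0 / (i as f32 + 2.0).log2())
        .sum();
    if idcg > 0.0 { dcg / idcg } else { 0.0 }
}
```

Unlike the binary recall check, nDCG drops when a golden answer slides from rank 1 to rank 8, so it catches ranking drift before recall@k moves at all.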
- Alert on deltas, not absolutes (e.g., “Recall@5 −0.08 week-over-week”). Then trace to a setup above (new model? ef change? noisy tenant?).
Bonus: Log the embedding hash with each query to tie regressions to model drift quickly.
Tiny Case Study: “Support Search That Stopped Going Sideways”
A support portal mixed v2 and v3 embeddings for a week “to ease the migration.” Results got weirder by the day. The fix:
- Versioned collection + query-time filter (Setup #1).
- HNSW sweep and locked ef=160 (Setup #2).
- Tenant/lang filters (Setup #3).
- Added sparse fusion for rare error codes (Setup #4).
- Nightly recall audit with 120 gold queries (Setup #8).
Outcome: Recall@5 up 14 points, tail stabilized, and migrations no longer “feel” risky.
“Pick Your First Three” Cheat Sheet
- Immediate leaks: Versioned collections + filters (#1, #3)
- Stability under load: Tune HNSW and lock ef (#2)
- Weird rare-term queries: Hybrid dense+sparse (#4)
- Ops discipline: wait=true upserts + nightly recall audit (#6, #8)
Conclusion
Low-drift recall isn’t a one-time tune; it’s a set of habits. Keep vectors versioned and filtered, tune HNSW once and stop fiddling, fuse sparse signals for meaning, decay by time without amnesia, write safely, clean the neighborhood, and measure recall relentlessly. Do that, and your Qdrant + Rust stack will feel boring — in the best possible way.
Read the full article here: https://medium.com/@kaushalsinh73/8-qdrant-rust-setups-for-low-drift-recall-5d22031e29a0