<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://johnwick.cc/index.php?action=history&amp;feed=atom&amp;title=Building_Smarter_AI_Systems_with_Vector_Databases</id>
	<title>Building Smarter AI Systems with Vector Databases - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://johnwick.cc/index.php?action=history&amp;feed=atom&amp;title=Building_Smarter_AI_Systems_with_Vector_Databases"/>
	<link rel="alternate" type="text/html" href="https://johnwick.cc/index.php?title=Building_Smarter_AI_Systems_with_Vector_Databases&amp;action=history"/>
	<updated>2026-05-06T17:35:33Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.44.1</generator>
	<entry>
		<id>https://johnwick.cc/index.php?title=Building_Smarter_AI_Systems_with_Vector_Databases&amp;diff=1659&amp;oldid=prev</id>
		<title>PC: Created page with &quot;How I used embeddings, similarity search, and retrieval pipelines to build context-aware AI that actually remembers things  500px  Every time someone says “AI models forget context”, I grin. Because that’s only true if you haven’t yet played with vector databases. In my experience, building context-aware AI isn’t just about prompt engineering — it’s about memory management. In this article, I...&quot;</title>
		<link rel="alternate" type="text/html" href="https://johnwick.cc/index.php?title=Building_Smarter_AI_Systems_with_Vector_Databases&amp;diff=1659&amp;oldid=prev"/>
		<updated>2025-11-28T23:14:06Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;How I used embeddings, similarity search, and retrieval pipelines to build context-aware AI that actually remembers things  &lt;a href=&quot;/index.php?title=File:Building_Smarter_AI_Systems_with_Vector_Databases.jpg&quot; title=&quot;File:Building Smarter AI Systems with Vector Databases.jpg&quot;&gt;500px&lt;/a&gt;  Every time someone says “AI models forget context”, I grin. Because that’s only true if you haven’t yet played with vector databases. In my experience, building context-aware AI isn’t just about prompt engineering — it’s about memory management. In this article, I...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;How I used embeddings, similarity search, and retrieval pipelines to build context-aware AI that actually remembers things&lt;br /&gt;
&lt;br /&gt;
[[file:Building_Smarter_AI_Systems_with_Vector_Databases.jpg|500px]]&lt;br /&gt;
&lt;br /&gt;
Every time someone says “AI models forget context”, I grin. Because that’s only true if you haven’t yet played with vector databases. In my experience, building context-aware AI isn’t just about prompt engineering — it’s about memory management.&lt;br /&gt;
In this article, I’ll walk you through how I built a production-grade retrieval-augmented generation (RAG) pipeline using vector embeddings, similarity search, and OpenAI’s API.&lt;br /&gt;
If you’ve ever wanted your AI system to remember documents, conversations, or domain-specific knowledge, this one’s for you.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. What Are Vector Databases, Really?&lt;br /&gt;
&lt;br /&gt;
At the core of any retrieval-based AI is one simple idea: turn text into numbers that capture meaning, and then compare those numbers to find related content.&lt;br /&gt;
&lt;br /&gt;
Each document, paragraph, or even sentence can be transformed into a vector (a list of floating-point numbers) using an embedding model.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from openai import OpenAI&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
client = OpenAI(api_key=&amp;quot;your_sk&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
text = &amp;quot;Python is a high-level programming language.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Get embedding vector&lt;br /&gt;
response = client.embeddings.create(&lt;br /&gt;
    model=&amp;quot;text-embedding-3-small&amp;quot;,&lt;br /&gt;
    input=text&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
vector = np.array(response.data[0].embedding)&lt;br /&gt;
print(len(vector), &amp;quot;dimensions&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
OpenAI’s text-embedding-3-small returns 1,536-dimensional vectors by default (dimensionality varies by model; text-embedding-3-large, for instance, returns 3,072), which is enough to capture rich semantics.&lt;br /&gt;
When you store these vectors in a specialized database (like Pinecone, Weaviate, or FAISS), you can query them for semantic similarity. That means you’re not just matching keywords, but ideas.&lt;br /&gt;
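&lt;br /&gt;
To make that concrete, here’s a minimal sketch (reusing the client from above, plus numpy) that scores a query against two candidate sentences. The semantically related one should win, even though it shares almost no keywords with the query. The embed helper and the example sentences are my own illustration:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def embed(text):&lt;br /&gt;
    # Small helper around the embeddings endpoint shown above&lt;br /&gt;
    resp = client.embeddings.create(&lt;br /&gt;
        model=&amp;quot;text-embedding-3-small&amp;quot;,&lt;br /&gt;
        input=text&lt;br /&gt;
    )&lt;br /&gt;
    return np.array(resp.data[0].embedding)&lt;br /&gt;
&lt;br /&gt;
query = embed(&amp;quot;How do I keep data around between program runs?&amp;quot;)&lt;br /&gt;
related = embed(&amp;quot;Databases persist information on disk across sessions.&amp;quot;)&lt;br /&gt;
unrelated = embed(&amp;quot;The weather in Lisbon is sunny today.&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
def cos(a, b):&lt;br /&gt;
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))&lt;br /&gt;
&lt;br /&gt;
print(&amp;quot;related:&amp;quot;, cos(query, related))      # typically the higher score&lt;br /&gt;
print(&amp;quot;unrelated:&amp;quot;, cos(query, unrelated))  # typically the lower score&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;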
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Setting Up a Vector Database with FAISS&lt;br /&gt;
&lt;br /&gt;
For experimentation, I love using FAISS, a library developed by Facebook AI Research. It’s fast, local, and perfect for prototypes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import faiss&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
# Let&amp;#039;s create some fake embeddings&lt;br /&gt;
data = np.random.random((100, 1536)).astype(&amp;#039;float32&amp;#039;)&lt;br /&gt;
&lt;br /&gt;
# Build FAISS index&lt;br /&gt;
index = faiss.IndexFlatL2(1536)&lt;br /&gt;
index.add(data)&lt;br /&gt;
&lt;br /&gt;
# Query vector&lt;br /&gt;
query = np.random.random((1, 1536)).astype(&amp;#039;float32&amp;#039;)&lt;br /&gt;
&lt;br /&gt;
# Find 5 closest vectors&lt;br /&gt;
distances, indices = index.search(query, 5)&lt;br /&gt;
&lt;br /&gt;
print(indices)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Each query compares the distance between vectors. Smaller distance = higher similarity. That’s the foundation of semantic search.&lt;br /&gt;
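&lt;br /&gt;
One detail worth knowing: IndexFlatL2 ranks by Euclidean distance. If you’d rather rank by cosine similarity, a common trick is to L2-normalize the vectors and use an inner-product index, since on unit vectors inner product and cosine similarity coincide. A minimal sketch, assuming the data and query arrays from above:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Normalize in place so that inner product == cosine similarity&lt;br /&gt;
faiss.normalize_L2(data)&lt;br /&gt;
faiss.normalize_L2(query)&lt;br /&gt;
&lt;br /&gt;
ip_index = faiss.IndexFlatIP(1536)&lt;br /&gt;
ip_index.add(data)&lt;br /&gt;
&lt;br /&gt;
# Higher score now means more similar (the opposite of L2 distance)&lt;br /&gt;
scores, indices = ip_index.search(query, 5)&lt;br /&gt;
print(scores, indices)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;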
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Chunking Documents for Semantic Memory&lt;br /&gt;
&lt;br /&gt;
Before we can store anything, we need to chunk our documents. This is one of those underrated tasks that can make or break your retrieval accuracy.&lt;br /&gt;
Chunking is about breaking large texts into meaningful sections — big enough to hold context, small enough to stay precise.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def chunk_text(text, chunk_size=500, overlap=50):&lt;br /&gt;
    chunks = []&lt;br /&gt;
    start = 0&lt;br /&gt;
    while start &amp;lt; len(text):&lt;br /&gt;
        end = start + chunk_size&lt;br /&gt;
        chunk = text[start:end]&lt;br /&gt;
        chunks.append(chunk)&lt;br /&gt;
        start += chunk_size - overlap&lt;br /&gt;
    return chunks&lt;br /&gt;
&lt;br /&gt;
document = &amp;quot;AI systems are designed to simulate human intelligence...&amp;quot; * 10&lt;br /&gt;
chunks = chunk_text(document)&lt;br /&gt;
print(len(chunks))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Overlapping chunks ensure continuity of meaning — critical for maintaining context when the model retrieves relevant sections.&lt;br /&gt;
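&lt;br /&gt;
One refinement worth trying: cutting on raw character counts can split a sentence mid-word. Here’s a sketch of a variant that snaps each cut to a sentence boundary instead; the regex-based sentence split is deliberately naive and is my own illustration, not part of the original pipeline:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import re&lt;br /&gt;
&lt;br /&gt;
def chunk_by_sentences(text, max_chars=500):&lt;br /&gt;
    # Split on ., ! or ? followed by whitespace&lt;br /&gt;
    sentences = re.split(r&amp;#039;(?&amp;lt;=[.!?])\s+&amp;#039;, text)&lt;br /&gt;
    chunks, current = [], &amp;quot;&amp;quot;&lt;br /&gt;
    for sentence in sentences:&lt;br /&gt;
        if current and len(current) + len(sentence) &amp;gt; max_chars:&lt;br /&gt;
            chunks.append(current.strip())&lt;br /&gt;
            current = &amp;quot;&amp;quot;&lt;br /&gt;
        current += sentence + &amp;quot; &amp;quot;&lt;br /&gt;
    if current.strip():&lt;br /&gt;
        chunks.append(current.strip())&lt;br /&gt;
    return chunks&lt;br /&gt;
&lt;br /&gt;
print(len(chunk_by_sentences(document)))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;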
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Creating and Storing Embeddings for Each Chunk&lt;br /&gt;
&lt;br /&gt;
Once we have our chunks, we generate embeddings for each and store them in our vector database.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from openai import OpenAI&lt;br /&gt;
import numpy as np&lt;br /&gt;
import pandas as pd&lt;br /&gt;
&lt;br /&gt;
client = OpenAI(api_key=&amp;quot;your_sk&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
def create_embeddings(chunks):&lt;br /&gt;
    data = []&lt;br /&gt;
    for chunk in chunks:&lt;br /&gt;
        embedding = client.embeddings.create(&lt;br /&gt;
            model=&amp;quot;text-embedding-3-small&amp;quot;,&lt;br /&gt;
            input=chunk&lt;br /&gt;
        ).data[0].embedding&lt;br /&gt;
        data.append(embedding)&lt;br /&gt;
    return np.array(data)&lt;br /&gt;
&lt;br /&gt;
embeddings = create_embeddings(chunks)&lt;br /&gt;
df = pd.DataFrame({&amp;quot;chunk&amp;quot;: chunks, &amp;quot;embedding&amp;quot;: list(embeddings)})&lt;br /&gt;
df.head()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We’ll use this DataFrame to connect chunks with their corresponding embeddings. In production, this data would go straight into Pinecone or Weaviate.&lt;br /&gt;
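&lt;br /&gt;
One practical note before scaling this up: the loop above makes one API call per chunk. The embeddings endpoint also accepts a list of strings, so batching cuts round trips dramatically. A sketch of the batched variant (the batch size of 100 is my own choice, not a hard limit):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def create_embeddings_batched(chunks, batch_size=100):&lt;br /&gt;
    vectors = []&lt;br /&gt;
    for i in range(0, len(chunks), batch_size):&lt;br /&gt;
        batch = chunks[i:i + batch_size]&lt;br /&gt;
        resp = client.embeddings.create(&lt;br /&gt;
            model=&amp;quot;text-embedding-3-small&amp;quot;,&lt;br /&gt;
            input=batch  # a list of strings, embedded in one request&lt;br /&gt;
        )&lt;br /&gt;
        # Results come back in input order&lt;br /&gt;
        vectors.extend(item.embedding for item in resp.data)&lt;br /&gt;
    return np.array(vectors)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;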
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Performing Semantic Search Over Stored Knowledge&lt;br /&gt;
&lt;br /&gt;
Now comes the fun part: querying our AI’s memory.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
query = &amp;quot;How can AI systems retain long-term knowledge?&amp;quot;&lt;br /&gt;
query_embedding = client.embeddings.create(&lt;br /&gt;
    model=&amp;quot;text-embedding-3-small&amp;quot;,&lt;br /&gt;
    input=query&lt;br /&gt;
).data[0].embedding&lt;br /&gt;
&lt;br /&gt;
# Compute cosine similarity&lt;br /&gt;
from numpy import dot&lt;br /&gt;
from numpy.linalg import norm&lt;br /&gt;
&lt;br /&gt;
def cosine_similarity(a, b):&lt;br /&gt;
    return dot(a, b) / (norm(a) * norm(b))&lt;br /&gt;
&lt;br /&gt;
df[&amp;quot;similarity&amp;quot;] = df[&amp;quot;embedding&amp;quot;].apply(lambda x: cosine_similarity(query_embedding, x))&lt;br /&gt;
top_chunks = df.sort_values(&amp;quot;similarity&amp;quot;, ascending=False).head(3)&lt;br /&gt;
print(top_chunks[&amp;quot;chunk&amp;quot;].values)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now the AI retrieves the most semantically relevant text instead of just keyword matches — essentially “remembering” what’s important.&lt;br /&gt;
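&lt;br /&gt;
Row-by-row apply is fine at this scale, but as the corpus grows you can score every chunk in one matrix operation. A sketch of the vectorized equivalent, assuming the DataFrame built above:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
matrix = np.vstack(df[&amp;quot;embedding&amp;quot;].values)  # shape: (n_chunks, 1536)&lt;br /&gt;
q = np.array(query_embedding)&lt;br /&gt;
&lt;br /&gt;
# Cosine similarity of the query against every chunk at once&lt;br /&gt;
df[&amp;quot;similarity&amp;quot;] = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;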
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Integrating Retrieval with GPT for Contextual Answers&lt;br /&gt;
&lt;br /&gt;
Here’s where it all comes together. We combine retrieved context with the user query, and pass it to GPT for a grounded, accurate answer.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
context = &amp;quot;\n\n&amp;quot;.join(top_chunks[&amp;quot;chunk&amp;quot;].values)&lt;br /&gt;
prompt = f&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
You are an expert AI assistant.&lt;br /&gt;
Use the following context to answer the question accurately.&lt;br /&gt;
&lt;br /&gt;
Context:&lt;br /&gt;
{context}&lt;br /&gt;
&lt;br /&gt;
Question: {query}&lt;br /&gt;
&amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
response = client.chat.completions.create(&lt;br /&gt;
    model=&amp;quot;gpt-4o-mini&amp;quot;,&lt;br /&gt;
    messages=[&lt;br /&gt;
        {&amp;quot;role&amp;quot;: &amp;quot;system&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;You are a knowledgeable assistant.&amp;quot;},&lt;br /&gt;
        {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: prompt}&lt;br /&gt;
    ]&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
print(response.choices[0].message.content)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is the backbone of RAG (Retrieval-Augmented Generation) — a pattern that underpins many of today’s intelligent chatbots, knowledge assistants, and internal AI tools.&lt;br /&gt;
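&lt;br /&gt;
One refinement that pays off immediately: tell the model what to do when the retrieved context doesn’t contain the answer, or it will happily improvise. A sketch of a stricter system prompt (the wording is mine, not from the original setup):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
system_prompt = (&lt;br /&gt;
    &amp;quot;You are a knowledgeable assistant. Answer ONLY from the provided &amp;quot;&lt;br /&gt;
    &amp;quot;context. If the context does not contain the answer, reply: &amp;quot;&lt;br /&gt;
    &amp;quot;I do not know based on the provided documents.&amp;quot;&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
response = client.chat.completions.create(&lt;br /&gt;
    model=&amp;quot;gpt-4o-mini&amp;quot;,&lt;br /&gt;
    messages=[&lt;br /&gt;
        {&amp;quot;role&amp;quot;: &amp;quot;system&amp;quot;, &amp;quot;content&amp;quot;: system_prompt},&lt;br /&gt;
        {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: prompt}&lt;br /&gt;
    ]&lt;br /&gt;
)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;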
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
7. Building a Simple Gradio Interface&lt;br /&gt;
&lt;br /&gt;
Once you’ve got retrieval and generation nailed, the next step is a user interface. Gradio makes this ridiculously easy.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import gradio as gr&lt;br /&gt;
&lt;br /&gt;
def answer_question(query):&lt;br /&gt;
    query_embedding = client.embeddings.create(&lt;br /&gt;
        model=&amp;quot;text-embedding-3-small&amp;quot;,&lt;br /&gt;
        input=query&lt;br /&gt;
    ).data[0].embedding&lt;br /&gt;
    df[&amp;quot;similarity&amp;quot;] = df[&amp;quot;embedding&amp;quot;].apply(lambda x: cosine_similarity(query_embedding, x))&lt;br /&gt;
    context = &amp;quot;\n\n&amp;quot;.join(df.sort_values(&amp;quot;similarity&amp;quot;, ascending=False).head(3)[&amp;quot;chunk&amp;quot;].values)&lt;br /&gt;
    prompt = f&amp;quot;Answer this using context:\n\n{context}\n\nQuestion: {query}&amp;quot;&lt;br /&gt;
    &lt;br /&gt;
    response = client.chat.completions.create(&lt;br /&gt;
        model=&amp;quot;gpt-4o-mini&amp;quot;,&lt;br /&gt;
        messages=[{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: prompt}]&lt;br /&gt;
    )&lt;br /&gt;
    return response.choices[0].message.content&lt;br /&gt;
&lt;br /&gt;
gr.Interface(fn=answer_question, inputs=&amp;quot;text&amp;quot;, outputs=&amp;quot;text&amp;quot;, title=&amp;quot;AI Memory Assistant&amp;quot;).launch()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you have a fully interactive, memory-aware chatbot powered by your own knowledge base.&lt;br /&gt;
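&lt;br /&gt;
If you’d prefer a chat-style UI with message history, recent Gradio releases ship gr.ChatInterface, which wraps a function taking the new message plus the running history. A sketch, assuming a recent Gradio version and the answer_question function above:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def chat_fn(message, history):&lt;br /&gt;
    # History is ignored here; each turn is answered from the knowledge base&lt;br /&gt;
    return answer_question(message)&lt;br /&gt;
&lt;br /&gt;
gr.ChatInterface(fn=chat_fn, title=&amp;quot;AI Memory Assistant&amp;quot;).launch()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;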
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
8. Scaling It Up: Vector Databases in Production&lt;br /&gt;
&lt;br /&gt;
When you outgrow FAISS, move to a managed or self-hosted vector database like:&lt;br /&gt;
* Pinecone (fast and serverless)&lt;br /&gt;
* Weaviate (supports hybrid search and metadata filters)&lt;br /&gt;
* Milvus (open-source powerhouse)&lt;br /&gt;
* Chroma (great for local prototypes)&lt;br /&gt;
With these, you can index millions of embeddings, filter on metadata, and let your RAG system handle real workloads.&lt;br /&gt;
“A good memory doesn’t just recall facts; it recalls relevance.” That’s exactly what vector databases give your AI.&lt;br /&gt;
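&lt;br /&gt;
To show how small the switch can be, here’s a minimal Chroma sketch of the same store-and-query flow, reusing the chunks, embeddings, and query_embedding from earlier (the collection name is my own):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import chromadb&lt;br /&gt;
&lt;br /&gt;
chroma = chromadb.Client()  # in-memory; use PersistentClient for disk&lt;br /&gt;
collection = chroma.create_collection(name=&amp;quot;knowledge_base&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
collection.add(&lt;br /&gt;
    ids=[str(i) for i in range(len(chunks))],&lt;br /&gt;
    documents=chunks,&lt;br /&gt;
    embeddings=[list(e) for e in embeddings],&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
results = collection.query(query_embeddings=[query_embedding], n_results=3)&lt;br /&gt;
print(results[&amp;quot;documents&amp;quot;][0])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;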
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
9. Lessons Learned from Building Memory-Driven AI&lt;br /&gt;
&lt;br /&gt;
* Chunking strategy is everything. Overlap matters more than you think.&lt;br /&gt;
* Embedding quality determines recall accuracy. Garbage in, garbage out.&lt;br /&gt;
* Context window limits are real. Be strategic with what you send to the model.&lt;br /&gt;
* Store metadata. Source, author, date: it’ll save you later (see the sketch after this list).&lt;br /&gt;
* Iterate fast. The beauty of Python is that you can prototype entire pipelines in hours.&lt;br /&gt;
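&lt;br /&gt;
On the metadata point, the cheapest way to start is to carry a few fields alongside each chunk from day one; every production vector store can filter on them later. A minimal sketch with made-up example fields:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
df = pd.DataFrame({&lt;br /&gt;
    &amp;quot;chunk&amp;quot;: chunks,&lt;br /&gt;
    &amp;quot;embedding&amp;quot;: list(embeddings),&lt;br /&gt;
    &amp;quot;source&amp;quot;: &amp;quot;ai_systems.txt&amp;quot;,   # made-up example fields&lt;br /&gt;
    &amp;quot;ingested_at&amp;quot;: &amp;quot;2025-11-28&amp;quot;,&lt;br /&gt;
})&lt;br /&gt;
&lt;br /&gt;
# Later: restrict retrieval to one source before scoring&lt;br /&gt;
subset = df[df[&amp;quot;source&amp;quot;] == &amp;quot;ai_systems.txt&amp;quot;]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;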
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Final Thoughts&lt;br /&gt;
&lt;br /&gt;
After integrating vector search into my AI systems, my models started behaving as if they actually remembered past conversations and documents. They hallucinated far less and began responding with grounded, contextual answers.&lt;br /&gt;
This isn’t just a trick; it’s a real step in the evolution of intelligent systems.&lt;br /&gt;
If you want to build AI that feels alive, give it memory. And as you just saw, all it takes is a few Python scripts, some embeddings, and a vector database.&lt;br /&gt;
&lt;br /&gt;
Read the full article here: https://medium.com/@abromohsin504/building-smarter-ai-systems-with-vector-databases-a2a9fe113c33&lt;/div&gt;</summary>
		<author><name>PC</name></author>
	</entry>
</feed>