Why Keyword Search Breaks — A Real Example
A user types "server crash" into your support ticket search. Your system has a ticket titled "application unexpectedly terminated". Keyword search returns zero results. Vector search finds it immediately.
This isn't magic — it's geometry. Every piece of text can be represented as a point in high-dimensional space, where semantically similar text lands close together. Vector search finds the closest points to your query.
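To make the geometry concrete, here is a minimal sketch using made-up 3-dimensional vectors. The numbers are hand-picked for illustration and do not come from any embedding model (real embeddings have hundreds of dimensions), but the ranking logic is the same: the stored text whose vector points in nearly the same direction as the query wins.

```python
import numpy as np

# Hypothetical toy "embeddings": hand-picked numbers, not output from a real model.
toy_vectors = {
    "application unexpectedly terminated": np.array([0.9, 0.1, 0.0]),
    "how do I reset my password": np.array([0.0, 0.2, 0.9]),
    "invoice shows the wrong amount": np.array([0.1, 0.9, 0.2]),
}
query_vec = np.array([0.8, 0.2, 0.1])  # stands in for the query "server crash"


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Rank the stored texts by similarity to the query, highest first.
ranked = sorted(
    toy_vectors.items(),
    key=lambda item: cosine_similarity(query_vec, item[1]),
    reverse=True,
)
closest_text = ranked[0][0]
print(closest_text)
```

The query shares no words with the top-ranked text; only the vectors are compared.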
How Vectors Are Created
An embedding model (like OpenAI's text-embedding-3-small or a local sentence-transformer) converts text into a dense numeric array: a vector of floating-point numbers, typically 384 to 1,536 of them depending on the model.
import OpenAI from 'openai';
const openai = new OpenAI();
// Text → vector (1536 dimensions for text-embedding-3-small)
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'server crash during peak load'
});
const queryVector = response.data[0].embedding;
// queryVector = [0.023, -0.147, 0.891, 0.034, ... ] (1536 numbers)
// This single array encodes the semantic meaning of the text
Every document in your database gets embedded the same way at index time. Search means: find the vectors closest to the query vector.
Approach 1 — Brute Force KNN (K-Nearest Neighbors)
The conceptually simplest approach: compare the query vector to every stored vector, return the k closest.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def brute_force_knn(query_vec, all_vectors, k=10):
    """Compare query to every single stored vector."""
    similarities = []
    for i, vec in enumerate(all_vectors):
        sim = cosine_similarity(query_vec, vec)
        similarities.append((i, sim))
    # Sort all results, return top k
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:k]
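The Python loop above is easy to read but slow. A common variation, sketched here under the assumption that all vectors fit in one NumPy array and none of them is all zeros, computes every similarity with a single matrix-vector product. The function name `brute_force_knn_vectorized` is illustrative. It is still exact and still touches every stored vector, so the cost per query keeps growing linearly with the collection size.

```python
import numpy as np


def brute_force_knn_vectorized(query_vec, all_vectors, k=10):
    """Exact top-k by cosine similarity using one matrix-vector product.

    all_vectors: array of shape (n, d); query_vec: array of shape (d,).
    Returns a list of (index, similarity) pairs, best match first.
    """
    all_vectors = np.asarray(all_vectors, dtype=np.float64)
    query_vec = np.asarray(query_vec, dtype=np.float64)
    # Dividing by both norms turns the dot products into cosine similarities.
    doc_norms = np.linalg.norm(all_vectors, axis=1)
    sims = (all_vectors @ query_vec) / (doc_norms * np.linalg.norm(query_vec))
    k = min(k, len(sims))
    # argpartition picks out the k largest without fully sorting all n scores.
    top = np.argpartition(-sims, k - 1)[:k]
    # Sort only those k candidates by similarity, descending.
    top = top[np.argsort(-sims[top])]
    return [(int(i), float(sims[i])) for i in top]


# Random stand-in data: 1,000 vectors of 64 dimensions.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64))
hits = brute_force_knn_vectorized(vectors[42], vectors, k=5)
print(hits[0])  # vector 42 is its own nearest neighbor
```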
Approximate Nearest Neighbor (ANN) — The Tradeoff
The key insight: in production, you rarely need the exact top 10 results. You need results that are good enough, typically 95–99% recall (the share of the true top-k neighbors that the index actually returns), delivered in milliseconds. This is the ANN tradeoff: sacrifice a little accuracy for a massive speed gain.
| Algorithm | Recall | Query time (1M vecs) | Build time | Notes |
|---|---|---|---|---|
| Brute-force KNN | 100% | ~1,500ms | 0 (no index) | Exact but O(n) |
| HNSW | 95–99% | ~2–5ms | Minutes | Best recall/speed ratio |
| IVF-Flat | 90–98% | ~5–20ms | Seconds | Cluster-based, less memory |
| LSH | 85–95% | ~1–3ms | Fast | Hash-based, lower quality |
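Recall figures like the ones in the table are usually measured by running the same queries through an exact search and through the ANN index, then counting the overlap. A minimal sketch, with a hypothetical helper name and made-up ids:

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k ids that also appear in the approximate result."""
    exact = set(exact_ids)
    if not exact:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact & set(approx_ids)) / len(exact)


# Hypothetical ids: the ANN index returned 9 of the 10 true neighbors.
exact_top10 = [3, 17, 21, 40, 58, 63, 77, 81, 90, 99]
ann_top10 = [3, 17, 21, 40, 58, 63, 77, 81, 90, 12]
print(recall_at_k(exact_top10, ann_top10))  # 0.9
```

Averaging this over a representative sample of real queries gives a recall number you can compare against query latency when tuning an index.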
HNSW — How It Actually Works
HNSW stands for Hierarchical Navigable Small World. It builds a multi-layer graph where:
- The top layers are sparse — only a few nodes, widely spread (coarse zoom)
- The bottom layer contains all nodes, densely connected (fine zoom)
- Each node connects to its nearest neighbors within the same layer
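Search navigates from coarse to fine, like using a map at increasing zoom levels. It starts at an entry point in the top layer and greedily moves to whichever neighbor is closest to the query until no neighbor is closer, then drops down one layer and repeats from that node. On the bottom layer it widens the search to a candidate list (sized by ef_search) and returns the k closest nodes found.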
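The coarse-to-fine descent can be sketched on a toy graph. Everything here is a simplifying assumption: the points sit on a line, and the three layers are built by hand with a hypothetical `chain` helper, whereas a real index assigns layers randomly and links nodes by measured distance. The search logic itself (greedy descent on upper layers, a bounded best-first search on the bottom layer) follows the general shape of HNSW search.

```python
import heapq
import numpy as np


def dist(a, b):
    return float(np.linalg.norm(a - b))


def greedy_step_down(query, entry, neighbors, points):
    """Keep moving to a closer neighbor until no neighbor is closer."""
    current, current_d = entry, dist(points[entry], query)
    improved = True
    while improved:
        improved = False
        for nb in neighbors[current]:
            d = dist(points[nb], query)
            if d < current_d:
                current, current_d, improved = nb, d, True
    return current


def search_layer(query, entry, neighbors, points, ef):
    """Best-first search that keeps the ef closest nodes seen so far."""
    d0 = dist(points[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    best = [(-d0, entry)]       # max-heap via negation: worst kept node on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # the closest remaining candidate cannot improve the results
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(points[nb], query)
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    return sorted((-neg_d, node) for neg_d, node in best)


def hnsw_search(query, layers, entry, points, k=3, ef=8):
    """layers[0] is the sparsest (top) graph; layers[-1] contains every node."""
    for neighbors in layers[:-1]:
        entry = greedy_step_down(query, entry, neighbors, points)
    found = search_layer(query, entry, layers[-1], points, ef)
    return [node for _, node in found[:k]]


def chain(nodes):
    """Toy graph builder: link each node to the one before and after it."""
    links = {}
    for i, node in enumerate(nodes):
        links[node] = []
        if i > 0:
            links[node].append(nodes[i - 1])
        if i + 1 < len(nodes):
            links[node].append(nodes[i + 1])
    return links


# 100 points on a line; the top layer has 4 nodes, the middle 20, the bottom all 100.
points = np.arange(100, dtype=float).reshape(-1, 1)
layers = [
    chain(list(range(0, 100, 25))),
    chain(list(range(0, 100, 5))),
    chain(list(range(100))),
]
result = hnsw_search(np.array([37.2]), layers, entry=0, points=points, k=3)
print(result)  # [37, 38, 36]
```

Starting from node 0, the top layer jumps to 25, the middle layer refines to 35, and only the bottom layer walks node by node. Most of the distance is covered in a handful of long hops, which is where the speedup over scanning all 100 points comes from.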
Building the HNSW Index
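When you insert a new vector:
- It is assigned a random top layer; most vectors land only on the bottom layer, and each higher layer is exponentially less likely
- Starting from the top entry point, the index greedily descends to the new vector's top layer
- On that layer and every layer below it, the index searches for the nearest candidates (ef_construction controls how many it considers) and links the new vector to up to m of them
- Neighbors that end up with too many links have their weakest ones pruned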
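One part of insertion that is easy to show in isolation is how a new vector's top layer is chosen. The HNSW paper draws it as floor(-ln(u) * mL) with u uniform in (0, 1] and mL = 1 / ln(m). The sketch below uses that formula; the function name `assign_layer` and the sample size are illustrative, and a given library may implement the draw differently.

```python
import math
import random
from collections import Counter


def assign_layer(m, rng):
    """Draw a node's top layer: layer 0 is most likely, each layer up is 1/m as likely."""
    m_l = 1.0 / math.log(m)   # normalization factor mL = 1 / ln(m)
    u = 1.0 - rng.random()    # uniform in (0, 1], so log(u) is always defined
    return int(-math.log(u) * m_l)


rng = random.Random(0)
layer_counts = Counter(assign_layer(16, rng) for _ in range(100_000))
for layer in sorted(layer_counts):
    print(layer, layer_counts[layer])
```

With m = 16, about 15 out of 16 vectors (roughly 94%) stay on the bottom layer only, and each layer above holds about one sixteenth as many nodes as the layer below it. That thinning is what keeps the upper layers sparse enough to act as the coarse zoom levels.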
Using HNSW in Practice — pgvector
-- PostgreSQL with pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Table with a 1536-dim embedding column
CREATE TABLE support_tickets (
  id BIGSERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'open',  -- used by the filtered query below
  embedding vector(1536),
  created_at TIMESTAMPTZ DEFAULT now()
);
-- Build HNSW index (runs once, then maintained automatically)
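CREATE INDEX ON support_tickets
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
-- m = 16: each node links to up to 16 neighbors per layer (pgvector's default)
-- ef_construction = 64: size of the candidate list during build (pgvector's default)
-- Query-time search width is set per session, e.g. SET hnsw.ef_search = 100; (default 40)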
-- Semantic search: find 10 most similar tickets to a query
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM support_tickets
ORDER BY embedding <=> $1 -- <=> is cosine distance operator
LIMIT 10;
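-- With a metadata filter (filtered vector search)
-- Note: with an HNSW index the filter is applied after the index scan,
-- so a selective filter can return fewer than 10 rows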
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM support_tickets
WHERE status = 'open'
AND created_at > now() - INTERVAL '30 days'
ORDER BY embedding <=> $1
LIMIT 10;
// Node.js: full pipeline
import { Pool } from 'pg';
import OpenAI from 'openai';
const db = new Pool({ connectionString: process.env.DATABASE_URL });
const openai = new OpenAI();
async function semanticSearch(query, limit = 10) {
  // 1. Embed the query
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  });
  const queryEmbedding = data[0].embedding;
  // 2. pgvector HNSW search
  // The embedding is sent as text like '[0.1,0.2,...]' and cast to vector
  const { rows } = await db.query(`
    SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity
    FROM support_tickets
    ORDER BY embedding <=> $1::vector
    LIMIT $2
  `, [JSON.stringify(queryEmbedding), limit]);
  return rows;
}
// "server crash" finds "application unexpectedly terminated"
const results = await semanticSearch('server crash during peak load');
// Returns semantically similar tickets even with different words
HNSW Parameters: What to Tune
| Parameter | What it controls | Higher value means | Typical range |
|---|---|---|---|
| m | Max edges per node | Better recall, more RAM, slower build | 8–64 |
| ef_construction | Build search width | Better graph quality, slower build | 64–512 |
| ef_search | Query search width | Better recall, slower queries | 40–400 |
When to Use What
| Scenario | Recommendation |
|---|---|
| < 100k vectors, any query time ok | Brute-force (exact, simple, no index needed) |
| 100k–10M vectors, < 50ms queries | HNSW in pgvector, Qdrant, or Weaviate |
| 10M+ vectors, memory-constrained | IVF-PQ (quantized) — lower recall, much less RAM |
| Need keyword + semantic hybrid | Elasticsearch with dense_vector + BM25 RRF fusion |
| Already on PostgreSQL, < 5M vectors | pgvector with HNSW — no extra service needed |
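The RRF fusion mentioned in the hybrid row is simple enough to sketch. Reciprocal rank fusion merges ranked lists by giving each result a score of 1 / (k + rank) per list and summing. The function below is an illustration rather than Elasticsearch's implementation; k = 60 is a commonly used constant, and the ticket ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical ticket ids from a keyword (BM25) query and a vector query.
bm25_hits = ["T7", "T2", "T9"]
vector_hits = ["T2", "T5", "T7"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['T2', 'T7', 'T5', 'T9']
```

Because only ranks are used, the two result lists do not need comparable scores, which is why RRF is a popular way to combine BM25 and vector results.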
The key takeaway
Vector search is fundamentally a geometry problem: embed everything into the same space, then find close neighbors. Brute-force KNN gives you exact answers, but its cost grows linearly with the number of vectors. HNSW gives you near-exact answers in roughly O(log n) time by building a hierarchical graph that lets you navigate from coarse to fine. For most production workloads, 95–99% recall in a few milliseconds beats 100% recall in over a second.