Vector Search: From KNN to HNSW Q Query point Nearest neighbors Data points KNN: O(n) brute force HNSW: O(log n) graph Powers: pgvector, Pinecone, Qdrant

Why Keyword Search Breaks — A Real Example

A user types "server crash" into your support ticket search. Your system has a ticket titled "application unexpectedly terminated". Keyword search returns zero results. Vector search finds it immediately.

This isn't magic — it's geometry. Every piece of text can be represented as a point in high-dimensional space, where semantically similar text lands close together. Vector search finds the closest points to your query.

Semantic Space (simplified to 2D) dim 1 dim 2 "server crash" "application crashed" small distance = similar "kitten" "cat" "quarterly revenue" Distance in this space = semantic similarity. Close points = similar meaning, regardless of exact words.

How Vectors Are Created

An embedding model (like OpenAI's text-embedding-3-small or a local sentence-transformer) converts text into a dense numeric array — a vector of 384 to 1536 floating-point numbers.

import OpenAI from 'openai';
const openai = new OpenAI();

// Text → vector (1536 dimensions for text-embedding-3-small)
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'server crash during peak load'
});

const queryVector = response.data[0].embedding;
// queryVector = [0.023, -0.147, 0.891, 0.034, ... ] (1536 numbers)
// This single array encodes the semantic meaning of the text

Every document in your database gets embedded the same way at index time. Search means: find the vectors closest to the query vector.

Approach 1 — Brute Force KNN (K-Nearest Neighbors)

The conceptually simplest approach: compare the query vector to every stored vector, return the k closest.

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def brute_force_knn(query_vec, all_vectors, k=10):
    """Compare query to every single stored vector."""
    similarities = []
    for i, vec in enumerate(all_vectors):
        sim = cosine_similarity(query_vec, vec)
        similarities.append((i, sim))

    # Sort all results, return top k
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:k]
Brute Force KNN: What Happens at Scale Query time = N x D x C N = vectors in DB, D = dimensions (e.g. 1536), C = cost per distance computation 10K vectors ~15ms acceptable 100K vectors ~150ms slow 1M vectors ~1.5s unusable 10M vectors ~15s production down At 10M vectors, 10 concurrent users = 150s of pure CPU work/sec. KNN is exact (100% recall). The problem is it's O(N) -- every new document makes every search slower.

Approximate Nearest Neighbor (ANN) — The Tradeoff

The key insight: in production, you don't need the exact top-10 results. You need results that are good enough — 95-99% recall — returned in milliseconds. This is the ANN tradeoff: sacrifice a little accuracy for a massive speed gain.

AlgorithmRecallQuery time (1M vecs)Build timeNotes
Brute-force KNN100%~1,500ms0 (no index)Exact but O(n)
HNSW95–99%~2–5msMinutesBest recall/speed ratio
IVF-Flat90–98%~5–20msSecondsCluster-based, less memory
LSH85–95%~1–3msFastHash-based, lower quality

HNSW — How It Actually Works

HNSW stands for Hierarchical Navigable Small World. It builds a multi-layer graph where:

  • The top layers are sparse — only a few nodes, widely spread (coarse zoom)
  • The bottom layer contains all nodes, densely connected (fine zoom)
  • Each node connects to its nearest neighbors within the same layer
HNSW Structure (simplified) Layer 2 (coarsest -- entry point, ~5% nodes) A Z Layer 1 (medium, ~20% nodes) A F M Z Layer 0 (finest -- ALL nodes) A B C D ... Z Each node connects to its k nearest neighbors. Upper-layer nodes also exist in all lower layers. HNSW Multi-Layer Structure Layer 2 — Sparse (entry point) A Z Layer 1 — Medium (~20% of nodes) A F M Z Layer 0 — Dense (ALL nodes) A B D F K L M Z coarse jump refine FOUND ━━▸ Search path: coarse → fine in O(log n) steps

Search navigates from coarse to fine — like using a map at increasing zoom levels:

HNSW Search: Finding Nearest Neighbor to Query Q Step 1: Layer 2 A Z Z is closer to Q than A Step 2: Layer 1 Z M M is closer to Q than Z Step 3: Layer 0 M L K Greedy walk, always move closer Step 4: DONE K K's neighbors: none closer. Result found. Total comparisons: ~log(N) x ef At 1M vectors: ~50 comparisons instead of 1,000,000

Building the HNSW Index

When you insert a new vector:

Insert Algorithm New vector V arrives Randomly assign layer (probabilistic) Layer 0: always (100%) Layer 1: 1/e = 37% Layer 2: 1/e2 = 14% Layer 3: 1/e3 = 5% (creates hierarchical sparsity) For each assigned layer (top to bottom): Find M nearest neighbors to V, create bidirectional edges Prune edges if node exceeds M_max connections M = 16-64 (max connections/node) ef_construction = 100-200 (build search width) Higher M = better recall, more memory, slower build

Using HNSW in Practice — pgvector

-- PostgreSQL with pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with a 1536-dim embedding column
CREATE TABLE support_tickets (
  id          BIGSERIAL PRIMARY KEY,
  title       TEXT NOT NULL,
  body        TEXT NOT NULL,
  embedding   vector(1536),
  created_at  TIMESTAMPTZ DEFAULT now()
);

-- Build HNSW index (runs once, then maintained automatically)
CREATE INDEX ON support_tickets
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- m=16: each node connects to 16 neighbors (good default)
-- ef_construction=64: search width during build
-- Semantic search: find 10 most similar tickets to a query
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM support_tickets
ORDER BY embedding <=> $1  -- <=> is cosine distance operator
LIMIT 10;

-- With metadata filter (hybrid search)
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM support_tickets
WHERE status = 'open'
  AND created_at > now() - INTERVAL '30 days'
ORDER BY embedding <=> $1
LIMIT 10;
// Node.js: full pipeline
import { Pool } from 'pg';
import OpenAI from 'openai';

const db = new Pool({ connectionString: process.env.DATABASE_URL });
const openai = new OpenAI();

async function semanticSearch(query, limit = 10) {
  // 1. Embed the query
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  });
  const queryEmbedding = data[0].embedding;

  // 2. pgvector HNSW search
  const { rows } = await db.query(`
    SELECT id, title, 1 - (embedding <=> $1) AS similarity
    FROM support_tickets
    ORDER BY embedding <=> $1
    LIMIT $2
  `, [`[${queryEmbedding}]`, limit]);

  return rows;
}

// "server crash" finds "application unexpectedly terminated"
const results = await semanticSearch('server crash during peak load');
// Returns semantically similar tickets even with different words

HNSW Parameters: What to Tune

ParameterWhat it controlsHigher value meansTypical range
mMax edges per nodeBetter recall, more RAM, slower build8–64
ef_constructionBuild search widthBetter graph quality, slower build64–512
ef_searchQuery search widthBetter recall, slower queries40–400
Tuning Guide Production search (high recall) m=16 ef_construction=128 ef_search=100 ~99% recall, ~3ms High-throughput (speed over recall) m=8 ef_construction=64 ef_search=40 ~95% recall, ~1ms Analytics / recommendation m=32 ef_construction=256 ef_search=200 ~99.5% recall, ~8ms Memory Estimate RAM = N x D x 4 bytes x 1.3 (overhead) 1M vectors x 1536 dims x 4B x 1.3 = ~8GB

When to Use What

ScenarioRecommendation
< 100k vectors, any query time okBrute-force (exact, simple, no index needed)
100k–10M vectors, < 50ms queriesHNSW in pgvector, Qdrant, or Weaviate
10M+ vectors, memory-constrainedIVF-PQ (quantized) — lower recall, much less RAM
Need keyword + semantic hybridElasticsearch with dense_vector + BM25 RRF fusion
Already on PostgreSQL, < 5M vectorspgvector with HNSW — no extra service needed

The key takeaway

Vector search is fundamentally a geometry problem: embed everything into the same space, then find close neighbors. Brute-force KNN gives you exact answers but doesn't scale. HNSW gives you near-exact answers in O(log n) by building a hierarchical graph that lets you navigate from coarse to fine. In production, 95–99% recall at 2ms is always better than 100% recall at 2 seconds.