Vector Search Explained: From KNN to HNSW — A Complete Guide

Why Keyword Search Breaks — A Real Example

A user types "server crash" into your support ticket search. Your system has a ticket titled "application unexpectedly terminated". Keyword search returns zero results. Vector search finds it immediately.

This isn't magic — it's geometry. Every piece of text can be represented as a point in high-dimensional space, where semantically similar text lands close together. Vector search finds the closest points to your query.

How Vectors Are Created

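An embedding model (like OpenAI's text-embedding-3-small or a local sentence-transformer) converts text into a dense numeric array: a vector of floating-point numbers, typically a few hundred to a few thousand long (384 for the all-MiniLM-L6-v2 sentence-transformer, 1536 by default for text-embedding-3-small).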

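import OpenAI from 'openai';
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
// Top-level await below requires an ES module (.mjs file or "type": "module")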

// Text → vector (1536 dimensions for text-embedding-3-small)
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'server crash during peak load'
});

const queryVector = response.data[0].embedding;
// e.g. queryVector = [0.023, -0.147, 0.891, 0.034, ...] (illustrative values; 1536 numbers in total)
// This single array encodes the semantic meaning of the text

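Every document in your database is embedded the same way, with the same model, at index time; vectors produced by different models live in different spaces and are not comparable. Search then means finding the stored vectors closest to the query vector.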

Approach 1 — Brute Force KNN (K-Nearest Neighbors)

The conceptually simplest approach: compare the query vector to every stored vector, return the k closest.

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def brute_force_knn(query_vec, all_vectors, k=10):
    """Compare query to every single stored vector."""
    similarities = []
    for i, vec in enumerate(all_vectors):
        sim = cosine_similarity(query_vec, vec)
        similarities.append((i, sim))

    # Sort all results, return top k
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:k]
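The loop above is easy to read but slow in pure Python, and it sorts all n scores just to keep k. A minimal NumPy sketch of the same exact search, assuming the stored vectors fit in memory as one float32 matrix: normalize the rows once, compute every cosine similarity with a single matrix-vector product, and use `np.argpartition` to pick the top k without a full sort. It is still O(n·d) work per query, only with a much smaller constant. The function names are illustrative.

```python
import numpy as np


def normalize_rows(matrix):
    """Scale each row to unit length so a plain dot product equals cosine similarity."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.maximum(norms, 1e-12)  # guard against all-zero rows


def brute_force_knn_vectorized(query_vec, unit_vectors, k=10):
    """Exact top-k by cosine similarity; unit_vectors must already be row-normalized."""
    q = query_vec / max(np.linalg.norm(query_vec), 1e-12)
    sims = unit_vectors @ q                      # one matrix-vector product: n similarities
    k = min(k, len(sims))
    top = np.argpartition(-sims, k - 1)[:k]      # the k best, in arbitrary order, in O(n)
    top = top[np.argsort(-sims[top])]            # sort only those k
    return [(int(i), float(sims[i])) for i in top]


# Usage with random stand-in data (not real embeddings)
rng = np.random.default_rng(0)
stored = normalize_rows(rng.standard_normal((10_000, 384)).astype(np.float32))
query = rng.standard_normal(384).astype(np.float32)
results = brute_force_knn_vectorized(query, stored, k=5)
```

Normalizing at index time means each query costs one dot product per stored vector instead of a dot product plus two norms.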

Approximate Nearest Neighbor (ANN) — The Tradeoff

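Brute force touches every stored vector on every query, so its cost grows linearly with the collection: at 1M vectors of 1536 dimensions, that is roughly 1.5 billion multiply-adds per query. The key insight: in production, you usually don't need the exact top 10 results. You need results that are good enough, typically 95–99% recall (the share of the true top-k neighbors that the search actually returns), delivered in milliseconds. This is the ANN tradeoff: give up a little accuracy for a large speed gain.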

Algorithm	Recall	Query time (1M vectors, approx.)	Build time	Notes
Brute-force KNN	100%	~1,500ms	0 (no index)	Exact but O(n)
HNSW	95–99%	~2–5ms	Minutes	Best recall/speed ratio
IVF-Flat	90–98%	~5–20ms	Seconds	Cluster-based, less memory
LSH	85–95%	~1–3ms	Fast	Hash-based, lower quality
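Recall figures like the ones in this table are commonly measured as recall@k: run exact brute-force search on a sample of queries to get the ground truth, then count how many of the true top-k ids the approximate index also returned. A small sketch of that bookkeeping (the function names are illustrative):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Share of the true top-k neighbors that also appear in the approximate top-k."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError("exact_ids must contain at least one id")
    return len(exact_top.intersection(approx_ids[:k])) / len(exact_top)


def mean_recall(approx_results, exact_results, k=10):
    """Average recall@k over a batch of queries (one list of ids per query)."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)


# Usage: the ANN index found 9 of the true top 10 for this query
exact = [[3, 7, 1, 9, 4, 2, 8, 0, 6, 5]]
approx = [[3, 7, 1, 9, 4, 2, 8, 0, 6, 11]]
print(mean_recall(approx, exact))  # 0.9
```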

HNSW — How It Actually Works

HNSW stands for Hierarchical Navigable Small World. It builds a multi-layer graph where:

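The top layers are sparse: only a few nodes, widely spread (coarse zoom)
The bottom layer contains all nodes, densely connected (fine zoom); a node that appears on a layer also appears on every layer below it
Each node links to a bounded number of its nearest neighbors (controlled by M, the `m` parameter in pgvector) within each layer it belongs to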
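Why are the upper layers sparse? In the original HNSW paper (Malkov and Yashunin), each new node draws its top layer at random as floor(-ln(U) · mL), with U uniform in (0, 1] and the recommended mL = 1/ln(M). A node then reaches layer l or higher with probability M^-l, so with M = 16 about 1 node in 16 appears on layer 1 and about 1 in 256 on layer 2. A stdlib-only sketch, where M = 16 simply mirrors the pgvector default used later:

```python
import math
import random
from collections import Counter


def random_level(m, rng):
    """Top layer for a new node: floor(-ln(U) * mL), with mL = 1 / ln(m)."""
    m_l = 1.0 / math.log(m)
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is always defined
    return int(-math.log(u) * m_l)


def layer_populations(n, m, seed=42):
    """Number of nodes on each layer; a node whose top layer is L is on layers 0..L."""
    rng = random.Random(seed)
    top_levels = Counter(random_level(m, rng) for _ in range(n))
    return [sum(c for lvl, c in top_levels.items() if lvl >= layer)
            for layer in range(max(top_levels) + 1)]


pops = layer_populations(100_000, m=16)
print(pops)  # layer 0 holds all 100,000; in expectation each layer up holds ~1/16 of the one below
```

Because the level is random rather than assigned, the hierarchy stays balanced as vectors are inserted one at a time, with no global rebuild.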

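Search navigates from coarse to fine, like using a map at increasing zoom levels:

Start at the index's entry point, a node on the top layer.
On each upper layer, hop greedily to whichever neighbor is closer to the query until no neighbor improves, then drop to the layer below from that node.
On the bottom layer, run a wider best-first search that keeps the ef_search closest candidates found so far.
Return the k closest of those candidates.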
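A minimal pure-Python sketch of this coarse-to-fine search, assuming the layered graph already exists as one adjacency dict per layer (`layers[0]` is the bottom layer). It follows the search procedure in the original HNSW paper: a greedy step (ef = 1) on each upper layer, then a best-first search keeping `ef_search` candidates on the bottom layer. Distances are squared Euclidean, which ranks unit-length vectors the same way cosine distance does; `search_layer` and `hnsw_search` are illustrative names, not a library API.

```python
import heapq


def dist(a, b):
    """Squared Euclidean distance (ranks unit vectors the same way cosine distance does)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_layer(query, entry_points, ef, graph, vectors):
    """Best-first search on one layer; returns up to ef (distance, node) pairs, closest first."""
    visited = set(entry_points)
    candidates = [(dist(query, vectors[e]), e) for e in entry_points]  # min-heap: explore next
    heapq.heapify(candidates)
    best = [(-d, e) for d, e in candidates]                             # max-heap: results so far
    heapq.heapify(best)
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break  # nearest unexplored node is farther than our worst result: stop
        for nb in graph.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(query, vectors[nb])
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the current farthest result
    return sorted((-neg_d, n) for neg_d, n in best)


def hnsw_search(query, layers, entry, vectors, k=10, ef_search=40):
    """Greedy descent (ef = 1) through the upper layers, then a wider search on layer 0."""
    current = [entry]
    for graph in reversed(layers[1:]):  # top layer first
        current = [search_layer(query, current, 1, graph, vectors)[0][1]]
    found = search_layer(query, current, max(ef_search, k), layers[0], vectors)
    return [node for _, node in found[:k]]


# Five 1-D points: layer 1 (coarse) holds only nodes 0 and 4; layer 0 (fine) chains all five
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0]]
layers = [
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]},  # layer 0: every node
    {0: [4], 4: [0]},                                    # layer 1: sparse
]
print(hnsw_search([3.2], layers, entry=0, vectors=vectors, k=2))  # [3, 4]
```

In the tiny example, the sparse layer lets the search jump from node 0 straight to node 4 before the bottom layer refines the answer; on real data the upper layers play the same role over much longer distances.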

Building the HNSW Index

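When you insert a new vector:

Draw its top layer at random from an exponentially decaying distribution, so most nodes exist only on the bottom layer.
Starting from the entry point, descend greedily through the layers above that level, exactly as a search would.
On each layer from that level down to the bottom, search for the ef_construction closest existing nodes, link the new node to up to M of them, and add the reverse links, pruning any neighbor whose link list now exceeds its limit.
If the new node's top layer is higher than the current entry point's, the new node becomes the entry point.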
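A compact sketch of insertion under simplifying assumptions: neighbors are chosen as the plain M nearest candidates (the HNSW paper also describes a diversity heuristic that production libraries typically use instead), the bottom layer allows up to 2·M links as the paper suggests, there are no deletes, and everything lives in Python lists. `ToyHNSW` is a hypothetical teaching class, not pgvector's or any library's implementation.

```python
import heapq
import math
import random


def dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyHNSW:
    """Hypothetical teaching class: HNSW-style insert and search, simple neighbor selection, no deletes."""

    def __init__(self, m=8, ef_construction=64, seed=0):
        self.m, self.ef_construction = m, ef_construction
        self.m_l = 1.0 / math.log(m)          # level multiplier mL = 1 / ln(M)
        self.rng = random.Random(seed)
        self.vectors = []
        self.layers = []                      # layers[lc] maps node -> list of neighbor ids
        self.entry, self.entry_level = None, -1

    def _search_layer(self, q, entry_points, ef, lc):
        """Best-first search on layer lc; returns (distance, node) pairs, closest first."""
        visited = set(entry_points)
        cand = [(dist(q, self.vectors[e]), e) for e in entry_points]
        heapq.heapify(cand)
        best = [(-d, e) for d, e in cand]
        heapq.heapify(best)
        while cand:
            d, node = heapq.heappop(cand)
            if d > -best[0][0]:
                break
            for nb in self.layers[lc][node]:
                if nb not in visited:
                    visited.add(nb)
                    d_nb = dist(q, self.vectors[nb])
                    if len(best) < ef or d_nb < -best[0][0]:
                        heapq.heappush(cand, (d_nb, nb))
                        heapq.heappush(best, (-d_nb, nb))
                        if len(best) > ef:
                            heapq.heappop(best)
        return sorted((-nd, n) for nd, n in best)

    def insert(self, vec):
        node = len(self.vectors)
        self.vectors.append(vec)
        level = int(-math.log(1.0 - self.rng.random()) * self.m_l)  # random top layer
        while len(self.layers) <= level:
            self.layers.append({})
        for lc in range(level + 1):
            self.layers[lc][node] = []
        if self.entry is None:                                     # first node: nothing to link
            self.entry, self.entry_level = node, level
            return node
        eps = [self.entry]
        for lc in range(self.entry_level, level, -1):              # greedy descent above our level
            eps = [self._search_layer(vec, eps, 1, lc)[0][1]]
        for lc in range(min(level, self.entry_level), -1, -1):
            found = self._search_layer(vec, eps, self.ef_construction, lc)
            self.layers[lc][node] = [n for _, n in found[:self.m]]  # link to the M closest
            max_links = 2 * self.m if lc == 0 else self.m
            for nb in self.layers[lc][node]:                        # reverse links, then prune
                links = self.layers[lc][nb]
                links.append(node)
                if len(links) > max_links:
                    links.sort(key=lambda x: dist(self.vectors[nb], self.vectors[x]))
                    del links[max_links:]
            eps = [n for _, n in found]
        if level > self.entry_level:                                # new top node becomes entry
            self.entry, self.entry_level = node, level
        return node

    def search(self, q, k=10, ef_search=40):
        if self.entry is None:
            return []
        eps = [self.entry]
        for lc in range(self.entry_level, 0, -1):
            eps = [self._search_layer(q, eps, 1, lc)[0][1]]
        found = self._search_layer(q, eps, max(ef_search, k), 0)
        return [n for _, n in found[:k]]


# Usage on random stand-in data
rng = random.Random(7)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(300)]
index = ToyHNSW(m=8, ef_construction=64)
for v in data:
    index.insert(v)
query = [rng.gauss(0, 1) for _ in range(8)]
approx = index.search(query, k=10)
exact = sorted(range(len(data)), key=lambda i: dist(query, data[i]))[:10]
print(len(set(approx) & set(exact)) / 10)  # recall@10 for this one query
```

In this toy, `m` and `ef_construction` are the same knobs as in the pgvector index below: larger values make each insert examine and keep more candidates, trading build time for graph quality.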

Using HNSW in Practice — pgvector

-- PostgreSQL with pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with a 1536-dim embedding column
CREATE TABLE support_tickets (
  id          BIGSERIAL PRIMARY KEY,
  title       TEXT NOT NULL,
  body        TEXT NOT NULL,
  status      TEXT NOT NULL DEFAULT 'open',  -- used by the filtered query below
  embedding   vector(1536),
  created_at  TIMESTAMPTZ DEFAULT now()
);

-- Build the HNSW index once; rows inserted later are added to it automatically
CREATE INDEX ON support_tickets
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- m = 16: max connections per node per layer (pgvector's default; a good starting point)
-- ef_construction = 64: candidate-list size while building (pgvector's default)

-- Semantic search: the 10 tickets most similar to the query embedding passed as $1
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM support_tickets
ORDER BY embedding <=> $1  -- <=> is cosine distance operator
LIMIT 10;

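-- With a metadata filter (filtered vector search)
-- pgvector applies the WHERE clause to the rows the HNSW scan returns, so a selective
-- filter can yield fewer than 10 rows; raising hnsw.ef_search (default 40) helps
SET hnsw.ef_search = 100;
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM support_tickets
WHERE status = 'open'
  AND created_at > now() - INTERVAL '30 days'
ORDER BY embedding <=> $1
LIMIT 10;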

// Node.js (ES module, so top-level await works): full pipeline
import { Pool } from 'pg';
import OpenAI from 'openai';

const db = new Pool({ connectionString: process.env.DATABASE_URL });
const openai = new OpenAI();

async function semanticSearch(query, limit = 10) {
  // 1. Embed the query
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  });
  const queryEmbedding = data[0].embedding;

  // 2. pgvector HNSW search
  const { rows } = await db.query(`
    SELECT id, title, 1 - (embedding <=> $1) AS similarity
    FROM support_tickets
    ORDER BY embedding <=> $1
    LIMIT $2
  `, [`[${queryEmbedding}]`, limit]); // pgvector parses the text form '[0.1,0.2,...]'

  return rows;
}

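// "server crash" should surface tickets like "application unexpectedly terminated"
const results = await semanticSearch('server crash during peak load');
console.log(results); // semantically similar tickets, even when they share no words with the query
await db.end();       // close the pool so the process can exit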

HNSW Parameters: What to Tune

Parameter	What it controls	Higher value means	Typical range
`m`	Max edges per node per layer	Better recall, more RAM, slower build	8–64
`ef_construction`	Candidate-list size during build	Better graph quality, slower build	64–512
`ef_search`	Candidate-list size during queries (pgvector: `SET hnsw.ef_search`, default 40)	Better recall, slower queries	40–400
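To put "more RAM" in rough numbers, a back-of-the-envelope sketch. These are assumptions, not measurements: vectors stored as 4-byte floats (pgvector's `vector` type uses 4 bytes per dimension), 4-byte neighbor ids, and up to 2·m links per node on the bottom layer as in the HNSW paper. Upper layers (roughly one extra link per node in expectation), page and tuple overhead, and the table's own copy of the data are ignored. `estimate_hnsw_bytes` is a hypothetical helper.

```python
def estimate_hnsw_bytes(n, dims, m, bytes_per_float=4, bytes_per_id=4):
    """Rough size of raw vectors plus bottom-layer links; upper layers and storage overhead ignored."""
    vector_bytes = n * dims * bytes_per_float   # float32 components
    link_bytes = n * 2 * m * bytes_per_id       # assumes up to 2*m neighbor ids per node on layer 0
    return vector_bytes, link_bytes


for m in (8, 16, 32, 64):
    vec_b, link_b = estimate_hnsw_bytes(1_000_000, 1536, m)
    print(f"m={m:>2}: vectors ~{vec_b / 1e9:.1f} GB, links ~{link_b / 1e6:.0f} MB")
```

Under these assumptions, at 1536 dimensions the raw vectors dominate, so even m = 64 adds under a tenth to the total; with smaller embeddings the link overhead is proportionally larger.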

When to Use What

Scenario	Recommendation
< 100k vectors, any query time ok	Brute-force (exact, simple, no index needed)
100k–10M vectors, < 50ms queries	HNSW in pgvector, Qdrant, or Weaviate
10M+ vectors, memory-constrained	IVF-PQ (quantized) — lower recall, much less RAM
Need keyword + semantic hybrid	Elasticsearch with dense_vector + BM25, merged with reciprocal rank fusion (RRF)
Already on PostgreSQL, < 5M vectors	pgvector with HNSW — no extra service needed

The Key Takeaway

Vector search is fundamentally a geometry problem: embed everything into the same space, then find close neighbors. Brute-force KNN gives exact answers, but its cost grows linearly with the number of vectors. HNSW gives near-exact answers in roughly O(log n) time by building a hierarchical graph that lets a search move from coarse to fine. In production, 95–99% recall at 2ms is usually a better trade than 100% recall at 2 seconds.