The Brutal Truth About Multi-Tenant IoT Data Filtering: Every Obvious Solution Failed

The Setup: An IoT Platform at Uncomfortable Scale

Imagine a multi-tenant IoT SaaS: factories, power grids, telecom towers — each client has thousands of devices reporting data every second. A single enterprise user (say, a regional manager at Reliance) might have access to 1–2 lakh devices (100,000–200,000). Their dashboard: a paginated table with 20+ columns — temperature, voltage, pressure, status, firmware version — filterable on any column, any combination.

That combination — scale, dynamic ownership, multi-database reads, arbitrary column filters — is what makes this genuinely hard. Let's walk through every approach we tried.

Approach 1 — The Naive `$in` Clause

The first instinct is completely natural: load the user's device IDs from SQL, then pass them to MongoDB.

// Step 1: get device IDs from relational source
const deviceIds = await sql.query(
  `SELECT device_id FROM user_devices WHERE user_id = ?`,
  [userId]
);
// deviceIds = [D1, D2, D3, ... D100000]  ← 100k elements

// Step 2: filter MongoDB with the $in list
const readings = await db.device_data.find({
  device_id: { $in: deviceIds },
  temperature: { $gt: 30 },
  status: 'ACTIVE'
}).sort({ timestamp: -1 }).limit(50);

This works flawlessly with a test user who has 20 devices. It degrades at 1,000. It collapses at 100,000.

Approach 2 — Store `user_ids` Inside Each Document

If cross-DB round trips are expensive, why not embed ownership directly in the telemetry document?

{
  "device_id": "D1",
  "user_ids": ["user_1", "user_456", "user_789"],
  "temperature": 32.4,
  "voltage": 4.5,
  "timestamp": "2025-01-15T10:00:00Z"
}

// Query becomes a single indexed lookup
db.device_data.find({
  user_ids: "user_456",
  temperature: { $gt: 30 }
});

Reads look great. The write side is a catastrophe.

Approach 3 — Redis Cache + Chunked Parallel Queries

Keep ownership out of documents, but split the giant $in into parallel chunks processed concurrently:

// Load all device IDs from Redis (fast, avoids SQL round trip)
const allDeviceIds = await redis.smembers(`user:${userId}:devices`);
// allDeviceIds.length = 100,000

const CHUNK_SIZE = 500;
const chunks = chunkArray(allDeviceIds, CHUNK_SIZE);
// chunks.length = 200

// Fire 200 parallel MongoDB queries
const results = await Promise.all(
  chunks.map(chunk =>
    db.device_data.find({
      device_id: { $in: chunk },
      temperature: { $gt: 30 }
    }).toArray()
  )
);

// Merge, sort, paginate in application memory
const merged = results
  .flat()
  .sort((a, b) => b.timestamp - a.timestamp)
  .slice(page * pageSize, (page + 1) * pageSize);

Approach 4 — Mapping Table in Each Database

Mirror the user_devices relationship inside each store so it can join locally — avoiding the cross-DB hop entirely.

// MongoDB: $lookup against a mirrored mapping collection
db.device_data.aggregate([
  {
    $lookup: {
      from: "user_device_mapping",
      localField: "device_id",
      foreignField: "device_id",
      pipeline: [{ $match: { user_id: "user_456" } }],
      as: "access"
    }
  },
  { $match: { access: { $ne: [] }, temperature: { $gt: 30 } } }
]);

-- ClickHouse: mapping table + join
SELECT d.*
FROM device_data d
INNER JOIN user_device_mapping m ON d.device_id = m.device_id
WHERE m.user_id = 'user_456'
  AND d.temperature > 30;

This looks clean and solves the cross-DB problem. It breaks in three ways at production scale:

The Real Problem: Access Control Is a Tree

This is the insight that makes all the previous approaches wrong at the root level. In real IoT products, users are not assigned to individual devices. They are assigned to nodes in a hierarchy, and they inherit the entire subtree.

Approach 5 — Closure Table

Store every ancestor → descendant pair in SQL. Subtree membership becomes a single indexed lookup — no recursive CTE needed at read time.

-- Closure table structure
CREATE TABLE node_closure (
  ancestor_id   VARCHAR(36) NOT NULL,
  descendant_id VARCHAR(36) NOT NULL,
  depth         INT NOT NULL,
  is_device     BOOLEAN NOT NULL DEFAULT false,
  PRIMARY KEY (ancestor_id, descendant_id),
  INDEX idx_ancestor (ancestor_id),
  INDEX idx_descendant (descendant_id)
);

-- All device IDs under node N2 (user_456's assignment)
SELECT descendant_id AS device_id
FROM node_closure
WHERE ancestor_id = 'N2'
  AND is_device = true;

Approach 6 — Materialized Path

Store a string path per node. Subtree queries become a prefix match.

id	name	parent_id	path
N1	World	—	/N1
N2	India	N1	/N1/N2
N8	Maharashtra	N2	/N1/N2/N8
N4	Pune	N8	/N1/N2/N8/N4
N3	India	N1	/N1/N3
N6	Pune	N3	/N1/N3/N6

-- All devices under N2 (subtree prefix match)
SELECT id FROM nodes
WHERE path LIKE '/N1/N2/%'
  AND is_device = true;
-- Works because LIKE 'prefix%' can use a B-tree index

Simpler writes than closure table. The prefix LIKE scan degrades when the matching subtree returns millions of device IDs — you still end up with a huge list to pass to MongoDB and ClickHouse. The cross-DB problem remains unsolved.

Approach 7 — Graph Database (Neo4j)

Tree traversal is exactly what graph databases are built for. Neo4j makes the query elegant:

// Find all devices accessible to user_456
MATCH (user:User { id: 'user_456' })
      -[:ASSIGNED_TO]->(n:Node)
      -[:ANCESTOR_OF*]->(d:Device)
RETURN d.id AS device_id;

This solves the traversal problem beautifully — cheap edge changes, no recursive SQL, correct handling of non-unique labels. But it doesn't solve the cross-database filtering problem.

The Pattern That Works: Ancestor Nodes at Ingestion Time

The breakthrough insight: stop resolving the tree at read time. Resolve it at write time. When a device emits a reading, the ingestion pipeline attaches the full ancestry — every node ID from root to the device's location.

{
  "device_id": "D1",
  "ancestor_nodes": [
    { "id": "N1", "name": "World" },
    { "id": "N2", "name": "India" },
    { "id": "N8", "name": "Maharashtra" },
    { "id": "N4", "name": "Pune" },
    { "id": "N11", "name": "Factory A" }
  ],
  "temperature": 32.4,
  "voltage": 4.5,
  "status": "ACTIVE",
  "timestamp": "2025-01-15T10:00:00Z"
}

Now the query is trivially small regardless of fleet size:

// user_456 is assigned to nodes N2 and N6
const userNodes = await getUserNodeIds(userId); // ['N2', 'N6']

// MongoDB: tiny $in of node IDs — works at any device count
const readings = await db.device_data.find({
  "ancestor_nodes.id": { $in: userNodes },
  temperature: { $gt: 30 }
}).sort({ timestamp: -1 }).limit(50);

Why store both id and name: Access control uses IDs (unique). The UI lets users search by name — but "Pune" and "India" repeat across branches. Storing the full path allows the frontend to show disambiguation:

// User types "Pune" in the location picker
// UI shows full paths so user can pick the right one:
// → World > India (N2) > Maharashtra > Pune (N4)
// → World > India (N3) > Gujarat > Pune (N6)
// User selects row 1 → query uses N4, not the string "Pune"

The ClickHouse Immutability Problem

ClickHouse is append-optimized. Mass ALTER ... UPDATE to rewrite ancestry on every tree change is the wrong fight — it's expensive, slow, and defeats the purpose of a columnar analytics engine.

-- ClickHouse dictionary: refreshed on schedule (e.g. every 5 min)
CREATE DICTIONARY user_nodes_dict (
  user_id  String,
  node_ids Array(String)
)
SOURCE(CLICKHOUSE(
  QUERY 'SELECT user_id, groupArray(node_id) FROM user_nodes GROUP BY user_id'
))
LIFETIME(300);  -- refresh every 5 minutes

-- Analytics query: dictionary resolves ancestry at query time
SELECT
  device_id,
  avg(temperature) AS avg_temp,
  max(voltage)     AS peak_voltage,
  count()          AS reading_count
FROM device_history
WHERE hasAny(
  ancestor_node_ids,
  dictGet('user_nodes_dict', 'node_ids', 'user_456')
)
AND timestamp > now() - INTERVAL 7 DAY
GROUP BY device_id
ORDER BY avg_temp DESC;

Full Architecture: End-to-End

Tradeoffs: The Honest Summary

Approach	Read cost	Write cost	Tree changes	Use when
Naive `$in`	O(n) at scale	Low	Easy	< 1,000 devices
`user_ids` embedded	Fast	Catastrophic	Impossible	Write-once, stable ownership
Redis chunking	Medium	Medium	Cache sync needed	Medium scale, no tree ACL
Mapping table per DB	Join cost	4-way sync	Sync all stores	Flat ownership, small scale
Closure table	Fast (indexed)	Medium	Expensive (bulk rewrite)	Slow-changing trees
Materialized path	Prefix scan	Low	Path rewrite	Medium scale, simple trees
Neo4j alone	Fast traversal	Medium	Cheap	Graph truth — not cross-DB
Ancestors at ingest	O(log n) always	Slight overhead	Async repair	Production multi-DB + trees

When This Pattern Doesn't Work

Be honest about the tradeoffs. The ancestor-at-ingestion pattern has real limitations:

Hard real-time revocation: When a node move happens, there's a staleness window (seconds to minutes) before all documents are repaired. If your security model requires instant revocation (financial, healthcare), this needs a stricter, more expensive design — typically a token-based approach where the access check happens at query time, not write time.
Storage overhead: Every document now carries the full ancestry array. For a 5-level deep tree, that's 5 extra node objects per document. Acceptable for most IoT; worth measuring for extremely high-frequency devices.
Repair complexity: The async ancestry repair pipeline is non-trivial. You need idempotent consumers, dead-letter queues, and monitoring for repair lag.

What I'd Do Differently

If I were building this again from scratch, I'd set up the ancestor-at-ingestion pattern from day one, and I'd invest early in the repair pipeline monitoring. The hardest part isn't the ingestion enrichment — that's straightforward. The hardest part is building confidence that the async repair is working correctly under load, and having the observability to know when it's lagging.

The failed approaches weren't wrong to try. Approach 1 is the right prototype. Approach 4 is the right design for a product with flat ownership and 50k devices. You only discover that you need the full ancestry pattern when the tree ACL requirements and the fleet scale arrive together — usually later than you'd like.

The core lesson

Push complexity to write time. Pay once per document at ingestion so every read is cheap. Design your access model around node IDs, not device IDs — your fleet will grow, but your org hierarchy won't flatten. And when in doubt, measure: the right approach for your constraints might not be the same as ours.