The Setup: An IoT Platform at Uncomfortable Scale
Imagine a multi-tenant IoT SaaS: factories, power grids, telecom towers — each client has thousands of devices reporting data every second. A single enterprise user (say, a regional manager at Reliance) might have access to 1–2 lakh devices (100,000–200,000). Their dashboard: a paginated table with 20+ columns — temperature, voltage, pressure, status, firmware version — filterable on any column, any combination.
That combination — scale, dynamic ownership, multi-database reads, arbitrary column filters — is what makes this genuinely hard. Let's walk through every approach we tried.
Approach 1 — The Naive $in Clause
The first instinct is completely natural: load the user's device IDs from SQL, then pass them to MongoDB.
// Step 1: get device IDs from relational source
const deviceIds = await sql.query(
`SELECT device_id FROM user_devices WHERE user_id = ?`,
[userId]
);
// deviceIds = [D1, D2, D3, ... D100000] ← 100k elements
// Step 2: filter MongoDB with the $in list
const readings = await db.device_data.find({
device_id: { $in: deviceIds },
temperature: { $gt: 30 },
status: 'ACTIVE'
}).sort({ timestamp: -1 }).limit(50);
This works flawlessly with a test user who has 20 devices. It degrades at 1,000. It collapses at 100,000.
Approach 2 — Store user_ids Inside Each Document
If cross-DB round trips are expensive, why not embed ownership directly in the telemetry document?
{
"device_id": "D1",
"user_ids": ["user_1", "user_456", "user_789"],
"temperature": 32.4,
"voltage": 4.5,
"timestamp": "2025-01-15T10:00:00Z"
}
// Query becomes a single indexed lookup
db.device_data.find({
user_ids: "user_456",
temperature: { $gt: 30 }
});
Reads look great. The write side is a catastrophe.
Approach 3 — Redis Cache + Chunked Parallel Queries
Keep ownership out of documents, but split the giant $in into parallel chunks processed concurrently:
// Load all device IDs from Redis (fast, avoids SQL round trip)
const allDeviceIds = await redis.smembers(`user:${userId}:devices`);
// allDeviceIds.length = 100,000
const CHUNK_SIZE = 500;
const chunks = chunkArray(allDeviceIds, CHUNK_SIZE);
// chunks.length = 200
// Fire 200 parallel MongoDB queries
const results = await Promise.all(
chunks.map(chunk =>
db.device_data.find({
device_id: { $in: chunk },
temperature: { $gt: 30 }
}).toArray()
)
);
// Merge, sort, paginate in application memory
const merged = results
.flat()
.sort((a, b) => b.timestamp - a.timestamp)
.slice(page * pageSize, (page + 1) * pageSize);
Approach 4 — Mapping Table in Each Database
Mirror the user_devices relationship inside each store so it can join locally — avoiding the cross-DB hop entirely.
// MongoDB: $lookup against a mirrored mapping collection
db.device_data.aggregate([
{
$lookup: {
from: "user_device_mapping",
localField: "device_id",
foreignField: "device_id",
pipeline: [{ $match: { user_id: "user_456" } }],
as: "access"
}
},
{ $match: { access: { $ne: [] }, temperature: { $gt: 30 } } }
]);
-- ClickHouse: mapping table + join
SELECT d.*
FROM device_data d
INNER JOIN user_device_mapping m ON d.device_id = m.device_id
WHERE m.user_id = 'user_456'
AND d.temperature > 30;
This looks clean and solves the cross-DB problem. It breaks in three ways at production scale:
The Real Problem: Access Control Is a Tree
This is the insight that makes all the previous approaches wrong at the root level. In real IoT products, users are not assigned to individual devices. They are assigned to nodes in a hierarchy, and they inherit the entire subtree.
Approach 5 — Closure Table
Store every ancestor → descendant pair in SQL. Subtree membership becomes a single indexed lookup — no recursive CTE needed at read time.
-- Closure table structure
CREATE TABLE node_closure (
ancestor_id VARCHAR(36) NOT NULL,
descendant_id VARCHAR(36) NOT NULL,
depth INT NOT NULL,
is_device BOOLEAN NOT NULL DEFAULT false,
PRIMARY KEY (ancestor_id, descendant_id),
INDEX idx_ancestor (ancestor_id),
INDEX idx_descendant (descendant_id)
);
-- All device IDs under node N2 (user_456's assignment)
SELECT descendant_id AS device_id
FROM node_closure
WHERE ancestor_id = 'N2'
AND is_device = true;
Approach 6 — Materialized Path
Store a string path per node. Subtree queries become a prefix match.
| id | name | parent_id | path |
|---|---|---|---|
| N1 | World | — | /N1 |
| N2 | India | N1 | /N1/N2 |
| N8 | Maharashtra | N2 | /N1/N2/N8 |
| N4 | Pune | N8 | /N1/N2/N8/N4 |
| N3 | India | N1 | /N1/N3 |
| N6 | Pune | N3 | /N1/N3/N6 |
-- All devices under N2 (subtree prefix match)
SELECT id FROM nodes
WHERE path LIKE '/N1/N2/%'
AND is_device = true;
-- Works because LIKE 'prefix%' can use a B-tree index
Simpler writes than closure table. The prefix LIKE scan degrades when the matching subtree returns millions of device IDs — you still end up with a huge list to pass to MongoDB and ClickHouse. The cross-DB problem remains unsolved.
Approach 7 — Graph Database (Neo4j)
Tree traversal is exactly what graph databases are built for. Neo4j makes the query elegant:
// Find all devices accessible to user_456
MATCH (user:User { id: 'user_456' })
-[:ASSIGNED_TO]->(n:Node)
-[:ANCESTOR_OF*]->(d:Device)
RETURN d.id AS device_id;
This solves the traversal problem beautifully — cheap edge changes, no recursive SQL, correct handling of non-unique labels. But it doesn't solve the cross-database filtering problem.
The Pattern That Works: Ancestor Nodes at Ingestion Time
The breakthrough insight: stop resolving the tree at read time. Resolve it at write time. When a device emits a reading, the ingestion pipeline attaches the full ancestry — every node ID from root to the device's location.
{
"device_id": "D1",
"ancestor_nodes": [
{ "id": "N1", "name": "World" },
{ "id": "N2", "name": "India" },
{ "id": "N8", "name": "Maharashtra" },
{ "id": "N4", "name": "Pune" },
{ "id": "N11", "name": "Factory A" }
],
"temperature": 32.4,
"voltage": 4.5,
"status": "ACTIVE",
"timestamp": "2025-01-15T10:00:00Z"
}
Now the query is trivially small regardless of fleet size:
// user_456 is assigned to nodes N2 and N6
const userNodes = await getUserNodeIds(userId); // ['N2', 'N6']
// MongoDB: tiny $in of node IDs — works at any device count
const readings = await db.device_data.find({
"ancestor_nodes.id": { $in: userNodes },
temperature: { $gt: 30 }
}).sort({ timestamp: -1 }).limit(50);
Why store both id and name: Access control uses IDs (unique). The UI lets users search by name — but "Pune" and "India" repeat across branches. Storing the full path allows the frontend to show disambiguation:
// User types "Pune" in the location picker
// UI shows full paths so user can pick the right one:
// → World > India (N2) > Maharashtra > Pune (N4)
// → World > India (N3) > Gujarat > Pune (N6)
// User selects row 1 → query uses N4, not the string "Pune"
The ClickHouse Immutability Problem
ClickHouse is append-optimized. Mass ALTER ... UPDATE to rewrite ancestry on every tree change is the wrong fight — it's expensive, slow, and defeats the purpose of a columnar analytics engine.
-- ClickHouse dictionary: refreshed on schedule (e.g. every 5 min)
CREATE DICTIONARY user_nodes_dict (
user_id String,
node_ids Array(String)
)
SOURCE(CLICKHOUSE(
QUERY 'SELECT user_id, groupArray(node_id) FROM user_nodes GROUP BY user_id'
))
LIFETIME(300); -- refresh every 5 minutes
-- Analytics query: dictionary resolves ancestry at query time
SELECT
device_id,
avg(temperature) AS avg_temp,
max(voltage) AS peak_voltage,
count() AS reading_count
FROM device_history
WHERE hasAny(
ancestor_node_ids,
dictGet('user_nodes_dict', 'node_ids', 'user_456')
)
AND timestamp > now() - INTERVAL 7 DAY
GROUP BY device_id
ORDER BY avg_temp DESC;
Full Architecture: End-to-End
Tradeoffs: The Honest Summary
| Approach | Read cost | Write cost | Tree changes | Use when |
|---|---|---|---|---|
Naive $in | O(n) at scale | Low | Easy | < 1,000 devices |
user_ids embedded | Fast | Catastrophic | Impossible | Write-once, stable ownership |
| Redis chunking | Medium | Medium | Cache sync needed | Medium scale, no tree ACL |
| Mapping table per DB | Join cost | 4-way sync | Sync all stores | Flat ownership, small scale |
| Closure table | Fast (indexed) | Medium | Expensive (bulk rewrite) | Slow-changing trees |
| Materialized path | Prefix scan | Low | Path rewrite | Medium scale, simple trees |
| Neo4j alone | Fast traversal | Medium | Cheap | Graph truth — not cross-DB |
| Ancestors at ingest | O(log n) always | Slight overhead | Async repair | Production multi-DB + trees |
When This Pattern Doesn't Work
Be honest about the tradeoffs. The ancestor-at-ingestion pattern has real limitations:
- Hard real-time revocation: When a node move happens, there's a staleness window (seconds to minutes) before all documents are repaired. If your security model requires instant revocation (financial, healthcare), this needs a stricter, more expensive design — typically a token-based approach where the access check happens at query time, not write time.
- Storage overhead: Every document now carries the full ancestry array. For a 5-level deep tree, that's 5 extra node objects per document. Acceptable for most IoT; worth measuring for extremely high-frequency devices.
- Repair complexity: The async ancestry repair pipeline is non-trivial. You need idempotent consumers, dead-letter queues, and monitoring for repair lag.
What I'd Do Differently
If I were building this again from scratch, I'd set up the ancestor-at-ingestion pattern from day one, and I'd invest early in the repair pipeline monitoring. The hardest part isn't the ingestion enrichment — that's straightforward. The hardest part is building confidence that the async repair is working correctly under load, and having the observability to know when it's lagging.
The failed approaches weren't wrong to try. Approach 1 is the right prototype. Approach 4 is the right design for a product with flat ownership and 50k devices. You only discover that you need the full ancestry pattern when the tree ACL requirements and the fleet scale arrive together — usually later than you'd like.
The core lesson
Push complexity to write time. Pay once per document at ingestion so every read is cheap. Design your access model around node IDs, not device IDs — your fleet will grow, but your org hierarchy won't flatten. And when in doubt, measure: the right approach for your constraints might not be the same as ours.