There is a class of engineering problems that look trivial until real constraints show up. Multi-tenant IoT device filtering is one of them.

This post documents the full journey: approaches we considered, why each failed, and the architecture that holds up at scale. The dead ends matter as much as the answers.

The setup

Picture a typical IoT SaaS stack. Data is split across databases, each chosen for what it does best:

Database                  | What it stores
--------------------------|------------------------------------------------------------
SQL (PostgreSQL / MySQL)  | Users, devices, user_devices — relational ownership
MongoDB                   | Real-time readings — many parameters per second per device
ClickHouse                | Aggregated analytics and historical time-series
Elasticsearch             | Full-text search over device metadata

A user logs in and sees a paginated table: one row per device, ~20 columns (temperature, voltage, pressure, status, firmware, and so on). They can filter on any column.

That sounds standard. Here is the catch:

  • A single user may access 1–2 lakh devices (100k–200k).
  • Device data is write-heavy — millions of new documents per hour.
  • Ownership is dynamic — access is granted and revoked frequently.
  • Filters can target any of those columns, living in MongoDB, ClickHouse, or Elastic.

That combination — scale, dynamic ownership, multi-database reads, arbitrary filters — is what makes the problem genuinely hard.

Approach 1: The naive $in clause

The first instinct is natural: load the user’s device IDs from SQL, then filter MongoDB.

// Step 1 — device IDs for this user from SQL
const rows = await sql.query(
  `SELECT device_id FROM user_devices WHERE user_id = ?`,
  [userId]
);
// Most SQL drivers return row objects, not bare values
const deviceIds = rows.map((r) => r.device_id);

// Step 2 — filter MongoDB
db.device_data.find({
  device_id: { $in: deviceIds },
  temperature: { $gt: 30 },
});

This works in development when a test user has a handful of devices. It struggles in the low thousands and collapses around 1 lakh.

Why it breaks

  • The $in array becomes a 100,000-element payload on every request.
  • MongoDB still pays a heavy cost to resolve a huge $in against a large collection, even with an index on device_id.
  • The same pattern hurts ClickHouse and Elasticsearch — every store sees the same giant list, on every page, filter change, and sort.
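A quick back-of-the-envelope check makes the payload point concrete (the device ID format here is a hypothetical example, not the real scheme):

```javascript
// 100k hypothetical device IDs, ~12 characters each
const deviceIds = Array.from(
  { length: 100_000 },
  (_, i) => `dev_${String(i).padStart(8, "0")}`
);

// Approximate wire size of the $in clause alone, serialized as JSON
const payloadBytes = JSON.stringify(deviceIds).length;
console.log(`${(payloadBytes / 1024 / 1024).toFixed(2)} MB per request`);
// prints "1.43 MB per request"
```

Over a megabyte of request body — before any filter, on every page change and every sort.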

Verdict: Fine up to roughly 1,000 devices. Beyond that, treat it as a prototype pattern only.

Approach 2: Store user_ids inside each document

If cross-DB hops are painful, why not embed ownership in the telemetry document?

{
  "device_id": "abc123",
  "user_ids": ["user_1", "user_456", "user_789"],
  "temperature": 32,
  "voltage": 4.5,
  "timestamp": "2025-04-05T10:00:00Z"
}

Query becomes a single MongoDB operation:

db.device_data.find({
  user_ids: "user_456",
  temperature: { $gt: 30 },
});

Why it breaks

  • Storage bloat: High-frequency telemetry means enormous document counts. Duplicating user_ids on every document multiplies cost brutally when devices are shared across many users.
  • Update nightmare: Granting access means patching historical documents — potentially tens of millions of updates for one grant.
  • Revocation is worse: You must remove the user everywhere, and access is not truly revoked until every copy is consistent — a security and ops trap.

Verdict: Reasonable for write-once data with stable ownership. Avoid for high-velocity IoT with dynamic sharing.

Approach 3: Redis cache + chunked parallel queries

Keep ownership out of each document, but split the giant $in into parallel chunks.

const allDeviceIds = await redis.smembers(`user:${userId}:devices`);

// The helper the snippet relies on — split into fixed-size chunks
const chunkArray = (arr, size) =>
  Array.from({ length: Math.ceil(arr.length / size) }, (_, i) =>
    arr.slice(i * size, (i + 1) * size)
  );
const chunks = chunkArray(allDeviceIds, 500);

const results = await Promise.all(
  chunks.map((chunk) =>
    db.device_data
      .find({
        device_id: { $in: chunk },
        temperature: { $gt: 30 },
      })
      .toArray()
  )
);

// Merge, sort globally, then page — note this materializes every match
const merged = results
  .flat()
  .sort(/* global ordering */)
  .slice(page * size, (page + 1) * size);

Why it breaks

  • Fan-out: 100k IDs at 500 per chunk ⇒ 200 concurrent queries per user interaction — connection pools and DB capacity suffer.
  • Pagination is wrong in principle: True “global page 1” needs ordering across all matches; chunk results do not give you that without materializing everything.
  • Cache sync: Redis must stay aligned with SQL on every assignment change.

Verdict: Better than one mega-$in, still too expensive at this scale. Only consider when device counts are in the low thousands and approximate pagination is acceptable.

Approach 4: Separate mapping table in each database

Mirror user_devices inside MongoDB, ClickHouse, and Elasticsearch so each store can join locally.

MongoDB-style sketch:

db.device_data.aggregate([
  {
    $lookup: {
      from: "user_device_mapping",
      localField: "device_id",
      foreignField: "device_id",
      pipeline: [{ $match: { user_id: "user_456" } }],
      as: "access",
    },
  },
  { $match: { access: { $ne: [] }, temperature: { $gt: 30 } } },
]);

ClickHouse sketch:

SELECT d.*
FROM device_data d
INNER JOIN user_device_mapping m ON d.device_id = m.device_id
WHERE m.user_id = 'user_456'
  AND d.temperature > 30;

What it fixes: No cross-DB round trip for ownership; filters compose after the join.

Why it still breaks

  • Join cost at scale: Huge mapping sets × huge telemetry sets is expensive. MongoDB $lookup at lakh scale is especially painful.
  • Sync × N: You now maintain truth in SQL and three more systems. Any lag ⇒ wrong devices shown.
  • Trees break the model: Real platforms often grant access at a node in a hierarchy (“everything under India”), not a flat precomputed device list that stays still while the fleet changes.

Verdict: Looks right in staging; production volume and tree-based ACL expose the limits.

The tree problem: access control gets harder

So far we assumed users map to devices. In real IoT products, users map to nodes in a hierarchy — and inherit the whole subtree.

Example (names are not unique; only IDs are):

World
├── India (N2)
│   ├── Maharashtra (N8)
│   │   └── Pune (N4)
│   │       ├── Factory A (N11)
│   │       │   ├── Device D1
│   │       │   └── Device D2
│   │       └── Device D3
│   └── Delhi (N5)
│       └── Device D4
└── India (N3)              ← same label, different node
    └── Gujarat (N9)
        └── Pune (N6)       ← also “Pune”, different parent
            └── Device D5

Assign a user to N2 and they should see D1–D4, including future devices under that subtree. Moves and inserts change who should see what without a simple static device list.

The hard question: given node N2, how do you resolve all devices in the subtree efficiently when the tree has millions of nodes and structure changes multiple times per hour?
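To see why this is hard, here is a minimal subtree resolver over a plain parent→children map (node shapes mirror the example tree above; the data structure is illustrative). It is correct — but it walks every descendant on every request:

```javascript
// Illustrative in-memory tree: parentId -> list of child nodes
const children = new Map([
  ["N2", [{ id: "N8", isDevice: false }, { id: "N5", isDevice: false }]],
  ["N8", [{ id: "N4", isDevice: false }]],
  ["N4", [{ id: "N11", isDevice: false }, { id: "D3", isDevice: true }]],
  ["N11", [{ id: "D1", isDevice: true }, { id: "D2", isDevice: true }]],
  ["N5", [{ id: "D4", isDevice: true }]],
]);

// BFS from the assigned node, collecting device IDs.
// Cost is O(subtree size) — fine for thousands of nodes,
// untenable for millions, per request.
function resolveDevices(rootId) {
  const devices = [];
  const queue = [rootId];
  while (queue.length > 0) {
    const nodeId = queue.shift();
    for (const child of children.get(nodeId) ?? []) {
      if (child.isDevice) devices.push(child.id);
      else queue.push(child.id);
    }
  }
  return devices;
}

console.log(resolveDevices("N2")); // D1–D4, matching the example tree
```

Every approach below is an attempt to avoid running this traversal at query time.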

Approach 5: Closure table

Store every ancestor → descendant pair. Subtree membership becomes one indexed lookup.

SELECT descendant_id
FROM closure_table
WHERE ancestor_id = 'N2'
  AND is_device = true;
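The write side is where the cost hides. Inserting one node means one closure row per ancestor — a sketch of the row generation (column names match the query above; the helper is hypothetical):

```javascript
// When a node is inserted, its closure rows pair it with every
// ancestor (the parent's ancestor-or-self chain) plus itself.
function closureRowsForInsert(newNodeId, parentChain) {
  return [...parentChain, newNodeId].map((ancestorId) => ({
    ancestor_id: ancestorId,
    descendant_id: newNodeId,
  }));
}

// Inserting Factory A (N11) under World -> India -> Maharashtra -> Pune:
const rows = closureRowsForInsert("N11", ["N1", "N2", "N8", "N4"]);
console.log(rows.length); // 5 rows for one leaf insert
```

Now scale that: moving a subtree means regenerating these rows for every descendant — which is exactly the write amplification described below.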

Why it breaks at this scale

  • A subtree move can touch hundreds of thousands of pairs.
  • Frequent structural changes create continuous heavy writes competing with reads.
  • At millions of nodes, the closure table itself can become enormous.

Verdict: Great when the tree is slow-changing. Costly when large subtrees move often. Keep parent/child as source of truth; closure is derived.

Approach 6: Materialized path

Store a string path per node, e.g. /N1/N2/N8/N4/D1. Subtree query:

SELECT id FROM nodes
WHERE path LIKE '/N1/N2/%'
  AND is_device = true;

Moves are a single bulk update: rewrite every descendant path with a string replace, along the lines of UPDATE nodes SET path = REPLACE(path, :old_prefix, :new_prefix) WHERE path LIKE :old_prefix || '%'.

Verdict: Simpler updates than closure. Prefix LIKE / anchored regex still degrades when the matching subtree is huge. Good medium scale; know the limits.

Approach 7: Graph database (Neo4j)

Traversal is what graphs are for:

MATCH (user:User { id: 'user_123' })
      -[:ASSIGNED_TO]->(n:Node)
      -[:ANCESTOR_OF*]->(d:Device)
RETURN d.id;

What it solves: Deep trees, frequent edge changes, ID-based disambiguation.

What it does not solve: You still get device IDs — and if that list is 1 lakh, downstream MongoDB / ClickHouse / Elastic still face the $in explosion unless you change the filtering strategy.

Verdict: Strong for access resolution. Not a substitute for a cross-store filtering design.

The pattern that works: ancestor nodes at ingestion time

Shift work left — to write time. When a device emits a reading, the ingestion pipeline attaches full ancestry (ordered node IDs from root to device):

{
  "device_id": "D1",
  "ancestor_nodes": [
    { "id": "N1", "name": "World" },
    { "id": "N2", "name": "India" },
    { "id": "N8", "name": "Maharashtra" },
    { "id": "N4", "name": "Pune" },
    { "id": "N11", "name": "Factory A" }
  ],
  "temperature": 32,
  "voltage": 4.5,
  "status": "active",
  "timestamp": "2025-04-05T10:00:00Z"
}

If the user is assigned to nodes N2 and N6, the query stays tiny:

db.device_data.find({
  "ancestor_nodes.id": { $in: ["N2", "N6"] },
  temperature: { $gt: 30 },
});

Whether the user can reach 100 devices or 100,000, you still pass a handful of node IDs. No per-request tree walk. No giant device $in.

Why store id and name: Access control uses IDs. Search UIs use names — but ambiguous labels (“Pune”) must resolve through paths in the picker so the query uses N4 vs N6, not a raw string.
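A picker that disambiguates by path rather than label might look like this sketch (the node list shape and path format are assumptions for illustration):

```javascript
// Nodes carry both an id and a human-readable path; labels alone are ambiguous.
const nodes = [
  { id: "N4", name: "Pune", path: "World / India (N2) / Maharashtra / Pune" },
  { id: "N6", name: "Pune", path: "World / India (N3) / Gujarat / Pune" },
];

// Search by display name, but return id + path so the UI — and
// ultimately the query — pins down exactly one node.
function findByName(name) {
  return nodes
    .filter((n) => n.name === name)
    .map((n) => ({ id: n.id, path: n.path }));
}

console.log(findByName("Pune")); // two candidates: N4 and N6 — the user picks one
```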

Tree changes: Publish NODE_MOVED (etc.) to Kafka; consumers recompute ancestry for affected subtrees and bulk-update MongoDB / Elastic asynchronously. Expect a short staleness window (seconds to minutes). For many IoT dashboards, that is acceptable; for instant revocation security domains, you need stricter (and more expensive) designs.
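The consumer side reduces to "reparent, then recompute ancestry for the affected subtree". A sketch with an in-memory parent map (the event shape and handler are assumptions, not a specific Kafka client API):

```javascript
// childId -> parentId, mirroring the example tree
const parent = new Map([
  ["N2", "N1"], ["N8", "N2"], ["N4", "N8"], ["N11", "N4"],
]);

// Walk parent pointers up to the root to rebuild the ordered chain.
function ancestryOf(nodeId) {
  const chain = [];
  for (let cur = nodeId; cur !== undefined; cur = parent.get(cur)) {
    chain.unshift(cur);
  }
  return chain;
}

// A NODE_MOVED handler reparents, then recomputes ancestry;
// in production this fans out to Mongo/Elastic bulk updates.
function handleNodeMoved({ nodeId, newParentId }) {
  parent.set(nodeId, newParentId);
  return ancestryOf(nodeId);
}

console.log(ancestryOf("N11"));
// ["N1", "N2", "N8", "N4", "N11"]
console.log(handleNodeMoved({ nodeId: "N4", newParentId: "N2" }));
// ["N1", "N2", "N4"] — Pune now sits directly under India,
// and every device below it needs its ancestor_nodes rewritten
```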

The ClickHouse immutability problem

ClickHouse is built for append-only analytics. You do not want to “UPDATE every row” when ancestry changes.

Practical split

Store                       | Role
----------------------------|------------------------------------------------------------
device_latest (MongoDB)     | One document per device — current readings with ancestor_nodes, easy to update
device_history (ClickHouse) | Append-only telemetry without embedded ancestry; filter by device_id after you know the allowed set

For analytics that must respect user→node assignments, a dictionary refreshed on a schedule can avoid mutating fact data:

CREATE DICTIONARY user_nodes_dict (
    user_id String,
    node_ids Array(String)
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(...))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(300);

-- Illustrative pattern — adapt to your schema
SELECT device_id, avg(temperature), max(voltage)
FROM device_history
WHERE device_id IN (/* allowed set, resolved via the dictionary + your model */)
  AND timestamp > now() - INTERVAL 7 DAY
GROUP BY device_id;

End-to-end picture

Ingestion

  1. Device emits reading
  2. Resolve ancestor_nodes (e.g. Redis warmed from Neo4j)
  3. Write device_latest (MongoDB) with ancestry
  4. Write device_history (ClickHouse) append-only
  5. Index Elastic with ancestry for search
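The enrichment in steps 2–3 reduces to a pure transform once the ancestry cache is warm (the cache shape here is illustrative):

```javascript
// Ancestry cache: deviceId -> ordered root->device node list
// (e.g. Redis, warmed from the Neo4j tree)
const ancestryCache = new Map([
  ["D1", [
    { id: "N1", name: "World" },
    { id: "N2", name: "India" },
    { id: "N8", name: "Maharashtra" },
    { id: "N4", name: "Pune" },
    { id: "N11", name: "Factory A" },
  ]],
]);

// Attach ancestry to a raw reading before writing device_latest / Elastic.
function enrichReading(reading) {
  const ancestors = ancestryCache.get(reading.device_id);
  if (!ancestors) throw new Error(`no ancestry for ${reading.device_id}`);
  return { ...reading, ancestor_nodes: ancestors };
}

const doc = enrichReading({ device_id: "D1", temperature: 32, voltage: 4.5 });
console.log(doc.ancestor_nodes.map((n) => n.id));
// ["N1", "N2", "N8", "N4", "N11"]
```

You pay this lookup once per document at write time, instead of a tree walk per query.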

Query time

  1. Load user’s assigned node IDs (SQL / cache)
  2. Mongo / Elastic: ancestor_nodes.id IN [those nodes] + filters
  3. ClickHouse: dictionary (or resolved device set) + analytics query
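Steps 1–2 amount to composing a small filter object from the user's node assignments (a sketch; field names follow the ancestor_nodes schema used in this post):

```javascript
// A handful of node IDs plus whatever column filters the user picked —
// no device list ever crosses the wire.
function buildDeviceFilter(assignedNodeIds, columnFilters = {}) {
  return {
    "ancestor_nodes.id": { $in: assignedNodeIds },
    ...columnFilters,
  };
}

const filter = buildDeviceFilter(["N2", "N6"], { temperature: { $gt: 30 } });
console.log(filter);
// shape: { "ancestor_nodes.id": { $in: [...] }, temperature: { $gt: 30 } }
```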

Tree change

  1. Kafka event
  2. Neo4j graph update
  3. Async ancestry repair in Mongo + ES
  4. Invalidate affected caches
  5. Dictionary refresh on next cycle

Tradeoffs summary

Approach             | Best for                          | Avoid when
---------------------|-----------------------------------|-------------------------------------------
Naive $in            | Under ~1k devices, prototypes     | Large device sets
user_ids on each doc | Stable ownership, low write rates | High-frequency telemetry + dynamic sharing
Redis chunking       | Medium scale, flat ownership      | Huge lists + correct deep pagination
Mapping table per DB | Simpler models, moderate data     | Heavy joins + multi-way sync + tree ACL
Closure table        | Slow-changing trees               | Frequent large moves
Materialized path    | Moderate scale, simplicity        | Massive prefix match sets
Neo4j                | Dynamic trees, graph truth        | Used alone without a filtering strategy
Ancestors at ingest  | Production multi-DB + trees       | Hard real-time revocation requirements

Honest conclusion

There is no free lunch. You trade write complexity for read simplicity, and you often accept eventual consistency at the edge.

The pattern that survives in the wild: push complexity to ingestion — pay once per document, query cheaply forever — and scope ClickHouse so immutability stays a feature, not a fight.

If you have shipped something like this, the failed approaches will look familiar. The goal is not a perfect abstraction; it is the least painful fit for your latency, security, and ops constraints.

About the author

Amit Verma is a software engineer focused on distributed systems, event-driven backends, and data-intensive products — the kind of work where SQL, document stores, search, and columnar analytics have to agree just enough to ship.

If this post saved you a week of dead ends, a share or follow goes a long way.