How should we design an IoT platform that handles dynamic device schemas and time-series ingestion at scale (100K writes/min)? [closed]

We’re a small dev team (3 full-stack web devs + 1 mobile dev) working on a B2B IoT monitoring platform for an industrial energy component manufacturer. Think: batteries, inverters, chargers. We have 3 device types now, with plans to support 6–7 soon.

We're building:

  • A minimalist mobile app for clients (React Native)
  • A web dashboard for internal teams (Next.js)
  • An admin panel for system control

Load Characteristics

  • ~100,000 devices sending data every minute
  • Message size: 100–500 bytes
  • Time-series data that needs long-term storage
  • Real-time updates needed for dashboards
  • Multi-tenancy — clients can only view their own devices
  • We prefer self-hosted infrastructure for cost control
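
Back of the envelope, that works out to roughly 1,700 writes/sec and 10–50 MB/min of raw payload, i.e. on the order of 14–72 GB/day at 100–500 bytes per message, before indexes and storage overhead.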

Current Stack Consideration

  • Backend: Node.js + TypeScript + Express
  • Frontend: Next.js + TypeScript
  • Mobile: React Native
  • Queue: Redis + Bull or RabbitMQ
  • Database: MongoDB (self-hosted) vs TimescaleDB + PostgreSQL
  • Hosting: Self-hosted VPS vs Dedicated Server
  • Tools: PM2, nginx, Cloudflare, Coolify (deployments), Kubernetes (maybe, later)

The Two Major Questions We're Facing:

1. MongoDB vs TimescaleDB for dynamic IoT schemas and time-series ingestion? We need to store incoming data with flexible schemas (new product types have different fields), but also support efficient time-series querying (e.g., trends, performance over time).

  • MongoDB seems flexible schema-wise, but might fall short on time-series performance.
  • TimescaleDB has strong time-series support but feels more rigid schema-wise.
  • Is there a proven pattern or hybrid approach that allows schema flexibility and good time-series performance? (One hybrid we've been sketching is shown below.)
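
To make the hybrid idea concrete (essentially the "data blob" idea from question 2 applied to TimescaleDB): a hypertable with a few fixed columns (time, device, tenant, device type) plus a JSONB column for whatever fields a given product type emits. A minimal sketch, assuming node-postgres; table and column names are illustrative and nothing is final:

```typescript
// Sketch only: a TimescaleDB hypertable with fixed "common" columns plus a
// JSONB column for device-type-specific fields. All names are illustrative.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function createReadingsTable(): Promise<void> {
  await pool.query(`
    CREATE TABLE IF NOT EXISTS device_readings (
      time        TIMESTAMPTZ NOT NULL,
      device_id   TEXT        NOT NULL,
      tenant_id   TEXT        NOT NULL,
      device_type TEXT        NOT NULL,
      payload     JSONB       NOT NULL  -- type-specific fields live here
    );
  `);
  // Turn the plain table into a hypertable partitioned on time.
  await pool.query(
    "SELECT create_hypertable('device_readings', 'time', if_not_exists => TRUE);"
  );
}

// Time-series queries can still reach into the JSONB payload.
export async function avgVoltagePerHour(deviceId: string) {
  const { rows } = await pool.query(
    `SELECT time_bucket('1 hour', time) AS bucket,
            avg((payload->>'voltage')::numeric) AS avg_voltage
       FROM device_readings
      WHERE device_id = $1
      GROUP BY bucket
      ORDER BY bucket`,
    [deviceId]
  );
  return rows;
}
```

The idea is that new product types only change what goes into payload, while hot fields could later get expression indexes or continuous aggregates without touching the ingest path.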

2. How to structure ingestion for 100K writes/min while supporting schema evolution? We’re worried about bottlenecks and future pain if we handle ingestion, schema evolution, and querying in one system.

  • Should we decouple ingestion (e.g., raw JSON into a write-optimized store), then transform/normalize later? (A rough sketch of what we mean follows this list.)
  • How do we avoid breaking the system every time a new product with a new schema is introduced?
  • We’ve also considered storing a "data blob" per device and extracting fields on-demand — not sure if that scales.
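
To make the decoupling idea concrete, here is roughly what we mean: the ingest path only enqueues raw JSON, and a separate consumer batches messages into multi-row inserts against a write-optimized raw table, with parsing/normalization deferred. A rough sketch, assuming RabbitMQ via amqplib; the queue name, batch size, and the insertRawBatch helper are placeholders:

```typescript
// Sketch only: a consumer that drains raw JSON readings from a queue and
// flushes them in batches. Nothing here is final.
import amqp from "amqplib";
import type { ConsumeMessage } from "amqplib";

const BATCH_SIZE = 500;
const FLUSH_INTERVAL_MS = 1000;

// Placeholder: in the real system this would be a single multi-row INSERT (or
// COPY) into a write-optimized raw table, not per-row inserts.
async function insertRawBatch(rows: unknown[]): Promise<void> {
  console.log(`flushing ${rows.length} raw readings`);
}

export async function startConsumer(): Promise<void> {
  const conn = await amqp.connect(process.env.AMQP_URL ?? "amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("raw-readings", { durable: true });
  await ch.prefetch(BATCH_SIZE); // never hold more than one batch unacknowledged

  let batch: { msg: ConsumeMessage; body: unknown }[] = [];

  const flush = async () => {
    if (batch.length === 0) return;
    const current = batch; // swap synchronously so a timer flush cannot double-send
    batch = [];
    await insertRawBatch(current.map((x) => x.body));
    for (const { msg } of current) ch.ack(msg); // ack only after the batch is stored
  };

  setInterval(() => void flush(), FLUSH_INTERVAL_MS);

  await ch.consume("raw-readings", (msg) => {
    if (!msg) return;
    batch.push({ msg, body: JSON.parse(msg.content.toString()) });
    if (batch.length >= BATCH_SIZE) void flush();
  });
}
```

Schema evolution then becomes a downstream concern: the raw store accepts any payload, and new product types only require new transform logic rather than a change to the ingest path.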

Additional Sub-Questions: (Feel free to address any of these that fall within your area of expertise)

  • RabbitMQ vs Kafka — Is Kafka worth adopting now or premature for our stage?
  • Real-time updates — Any architectural patterns that work well at this scale? (Polling, WebSockets, SSE?)
  • Multi-tenancy — Best practices for securely scoping data per client in both the DB and the APIs? (A row-level-security sketch follows this list.)
  • Queue consumers — Should we load-balance our job consumers with our own logic, or rely on the queue's built-in scaling?
  • VPS sizing — Any heuristics for choosing VPS sizes for this workload? When to go dedicated?
  • DevOps automation — We’re small. What lightweight CI/CD or IaC tools would you suggest? (Currently using Coolify)
  • Any known bottlenecks, security traps, or reliability pitfalls from similar projects?
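
For the multi-tenancy point, here is what row-level security plus a transaction-local tenant setting could look like on the Postgres/TimescaleDB side, so API code cannot accidentally query across tenants. A sketch, assuming node-postgres and the illustrative device_readings table from the earlier sketch:

```typescript
// Sketch only: every query runs inside a transaction whose app.tenant_id
// setting drives a row-level-security policy on device_readings.
//
// One-time setup (run as the table owner, e.g. in a migration):
//   ALTER TABLE device_readings ENABLE ROW LEVEL SECURITY;
//   CREATE POLICY tenant_isolation ON device_readings
//     USING (tenant_id = current_setting('app.tenant_id', true));
// The API must connect as a non-owner role so the policy is actually enforced.
import { Pool, PoolClient } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function withTenant<T>(
  tenantId: string,
  fn: (client: PoolClient) => Promise<T>
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // The third argument (is_local = true) scopes the setting to this
    // transaction, so it cannot leak to other requests sharing the pool.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

// Usage in a request handler (tenantIdFromAuth comes from the auth layer):
// const { rows } = await withTenant(tenantIdFromAuth, (c) =>
//   c.query("SELECT * FROM device_readings WHERE device_id = $1", [deviceId]));
```

The API layer still does its own tenant checks, but the policy acts as a backstop if a query ever forgets a WHERE tenant_id filter.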

We're still early in the build phase and want to make smart decisions upfront. If any of you have dealt with similar problems in IoT, real-time dashboards, or large-scale data ingestion — your advice would mean a lot.

Thanks!

Comments

    "concurrent connections to a single central data store"—it doesn't need to be single, as sharding is a perfectly viable option here. Also, for example SQL Server limit in terms of concurrent connections is 32 767. Far from the "few dozen" connections you mentioned. Finally, the author never told there are 100,000 connections at the same time. It may be that there are 100,000 requests per minute (although it's not clear from the original question). If every request takes 10 ms. to process, that's 17 concurrent connections. Commented Aug 4 at 21:54
  • @ArseniMourzenko, that's just not my experience with these things. The assumption that 100,000 requests per minute will be steady and evenly distributed over the entire minute is not justifiable, and a request that takes 10 ms under ideal conditions might be blocked for just a second; you then have a backlog of over 1,600 connections waiting on a system reckoned to handle just 17 at once. A server or system that has just an hour of downtime could, on resuming, face 100,000 concurrent connections, each perhaps pushing 60 times the normal payload. Commented Aug 4 at 23:08
  • Those are indeed well-known problems. Look up the term "back pressure" for one way to solve them: in short, the sending side has a feedback mechanism that tells it when the other side is ready to receive more. You will get some information loss (quite expected during downtime anyway), but at least you won't crush the system that receives the data. Commented Aug 5 at 6:21
  • I agree with steve, 100k per minute can be handled, but you know some customer is going to say "OK, now do my 1 million smart lightbulbs, oh, and I need per-second accuracy," and it's going to fall over. You need some sort of local, or near-local, collection and batching. Commented Aug 5 at 8:05
  • @ArseniMourzenko, I think with this, it's a case of fools rush in where angels fear to tread. It's not the laws of physics that stop these applications; it's the misapprehension of the overall complexity involved, and of the amount of resources and expertise available for a solution. Commented Aug 5 at 13:16
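
To make the back-pressure suggestion concrete, here is a minimal sketch of publisher-side back pressure, assuming RabbitMQ via amqplib (the queue name is illustrative): sendToQueue() returns false once the channel's internal buffer is full, and the channel emits 'drain' when it is ready for more.

```typescript
// Sketch only: pause publishing when the channel signals it is saturated,
// instead of buffering unboundedly in memory.
import amqp from "amqplib";

export async function publishWithBackpressure(
  readings: AsyncIterable<object>
): Promise<void> {
  const conn = await amqp.connect(process.env.AMQP_URL ?? "amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("raw-readings", { durable: true }); // name is illustrative

  for await (const reading of readings) {
    const ok = ch.sendToQueue("raw-readings", Buffer.from(JSON.stringify(reading)));
    if (!ok) {
      // The broker/socket is behind: wait for 'drain' before sending more.
      await new Promise<void>((resolve) => ch.once("drain", () => resolve()));
    }
  }
}
```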
