Edit - Software Engineering Stack Exchange

You are not logged in. Your edit will be placed in a queue until it is peer reviewed.

We welcome edits that make the post easier to understand and more valuable for readers. Because community members review edits, please try to make the post substantially better than how you found it, for example, by fixing grammar or adding additional resources and hyperlinks.

Required fields*

How should we design an IoT platform that handles dynamic device schemas and time-series ingestion at scale (100K writes/min)? [closed]

We’re a small dev team (3 full-stack web devs + 1 mobile dev) working on a B2B IoT monitoring platform for an industrial energy component manufacturer. Think: batteries, inverters, chargers. We have 3 device types now, with plans to support 6–7 soon.

We're building:

A minimalist mobile app for clients (React Native)
A web dashboard for internal teams (Next.js)
An admin panel for system control

Load Characteristics

~100,000 devices sending data every minute
Message size: 100–500 bytes
Time-series data that needs long-term storage
Real-time updates needed for dashboards
Multi-tenancy — clients can only view their own devices
We prefer self-hosted infrastructure for cost control

Current Stack Consideration

Backend: Node.js + TypeScript + Express
Frontend: Next.js + TypeScript
Mobile: React Native
Queue: Redis + Bull or RabbitMQ
Database: MongoDB (self-hosted) vs TimescaleDB + PostgreSQL
Hosting: Self-hosted VPS vs Dedicated Server
Tools: PM2, nginx, Cloudflare, Coolify (deployments), Kubernetes (maybe, later)

The Two Major Questions We're Facing:

1. MongoDB vs TimescaleDB for dynamic IoT schemas and time-series ingestion? We need to store incoming data with flexible schemas (new product types have different fields), but also support efficient time-series querying (e.g., trends, performance over time).

MongoDB seems flexible schema-wise, but might fall short on time-series performance.
TimescaleDB has strong time-series support but feels more rigid schema-wise.
Is there a proven pattern or hybrid approach that allows schema flexibility and good time-series performance?

2. How to structure ingestion for 100K writes/min while supporting schema evolution? We’re worried about bottlenecks and future pain if we handle ingestion, schema evolution, and querying in one system.

Should we decouple ingestion (e.g., raw JSON into a write-optimized store), then transform/normalize later?
How do we avoid breaking the system every time a new product with a new schema is introduced?
We’ve also considered storing a "data blob" per device and extracting fields on-demand — not sure if that scales.

Additional Sub-Questions: (Feel free to address any of these if they fall into your expertise area)

RabbitMQ vs Kafka — Is Kafka worth adopting now or premature for our stage?
Real-time updates — Any architectural patterns that work well at this scale? (Polling, WebSockets, SSE?)
Multi-tenancy — Best-practice for securely scoping data per client in both DB and APIs?
Queue consumers — Should we custom-load-balance our job consumers or rely on built-in scaling?
VPS sizing — Any heuristics for choosing VPS sizes for this workload? When to go dedicated?
DevOps automation — We’re small. What lightweight CI/CD or IaC tools would you suggest? (Currently using Coolify)
Any known bottlenecks, security traps, or reliability pitfalls from similar projects?

We're still early in the build phase and want to make smart decisions upfront. If any of you have dealt with similar problems in IoT, real-time dashboards, or large-scale data ingestion — your advice would mean a lot.

Thanks!

Answer*

> ~100,000 devices sending data every minute

I think your main problems are likely to arise in this area.

100k different concurrent connections to a single central data store, with 100k reliable writes a minute sustained, is something that sounds improbable.

My experience of seeing real-time data collected from unattended hardware devices is that sustaining a few dozen simultaneous connections can be problematic, and a maintenance man was occupied full-time dealing with a few hundred.

Your average web developer is probably coping with a website with an average of less than one concurrent user a minute, and probably no more than a few hundred concurrent (typically human) users at peak times. And for the tricky stuff this entails, he's probably using standard technologies and public infrastructure within ordinary assumptions (assumptions that definitely don't fit your application).

*One hundred thousand* concurrent connections, producing writes mechanically every minute, 24/7/365? That seems stratospheric in relation to my frame of reference. Personally I don't think it's commensurate with a four-man web+mobile development team.

The range of potential circumstances is too great to address even a small fraction of possible issues that could be in play, but I think at the very least your system would require something to concentrate the many connections and consolidate data at multiple levels, so that the fan-out at each level is far less than 100,000-to-1, and the number of concurrent writes at the central store is far less than 100k/min sustained.

Draft saved

Draft discarded

Edit Summary*

Cancel

1

"concurrent connections to a single central data store"—it doesn't need to be single, as sharding is a perfectly viable option here. Also, for example SQL Server limit in terms of concurrent connections is 32 767. Far from the "few dozen" connections you mentioned. Finally, the author never told there are 100,000 connections at the same time. It may be that there are 100,000 requests per minute (although it's not clear from the original question). If every request takes 10 ms. to process, that's 17 concurrent connections.

Arseni Mourzenko
– Arseni Mourzenko

08/04/2025 21:54:05
Commented Aug 4 at 21:54
@ArseniMourzenko, that's just not my experience with these things. The assumption that 100,000 requests per minute will be monotonic and evenly distributed over the entire minute is not justifiable, and a request that takes 10ms under ideal conditions might be blocked for just a second - you now have a backlog of over 1,600 concurrent connections waiting, on a system reckoned to handle just 17 at once. A server or system that has just an hour of downtime, on resuming could face 100,000 concurrent connections each maybe pushing 60 times more payload than normal.

Steve
– Steve

08/04/2025 23:08:11
Commented Aug 4 at 23:08
Those are indeed well known problems. Check the term "back pressure" regarding one way to solve them. In short, the side that sends data has a feedback mechanism that tells that the other side is ready to continue to receive. You will get information loss (quite expected anyway during downtime), but at least you won't crush the system that receives data.

Arseni Mourzenko
– Arseni Mourzenko

08/05/2025 06:21:07
Commented Aug 5 at 6:21
I agree with steve, 100k per min can be handled, but you know some customer is going to say "ok now do my 1 mil smart lightbulbs, oh and an I need per second accuracy" and its going to fall over. You need some sort of local, or near local collection and batching

Ewan
– Ewan

08/05/2025 08:05:04
Commented Aug 5 at 8:05
@ArseniMourzenko, I think with this, it's a case of fools rush in where angels fear to tread. It's not the laws of physics that stop these applications, it's the misapprehension of overall complexity involved, and the amount of resources and expertise available for a solution.

Steve
– Steve

08/05/2025 13:16:15
Commented Aug 5 at 13:16

| Show 3 more comments

How to Edit

Correct minor typos or mistakes
Clarify meaning without changing it
Add related resources or links
Always respect the author’s intent
Don’t use edits to reply to the author

How to Format

create code fences with backticks ` or tildes ~
```
like so
```
add language identifier to highlight code
```python
def function(foo):
print(foo)
```
put returns between paragraphs
for linebreak add 2 spaces at end
_italic_ or **bold**
indent code by 4 spaces
backtick escapes `like _so_`
quote by placing > at start of line
to make links (use https whenever possible)

<https://example.com>

[example](https://example.com)

<a href="https://example.com">example</a>

formatting help »
answering help »

How to Tag

A tag is a keyword or label that categorizes your question with other, similar questions. Choose one or more (up to 5) tags that will help answerers to find and interpret your question.

complete the sentence: my question is about...
use tags that describe things or concepts that are essential, not incidental to your question
favor using existing popular tags
read the descriptions that appear below the tag

If your question is primarily about a topic for which you can't find a tag:

combine multiple words into single-words with hyphens (e.g. design-patterns), up to a maximum of 35 characters
creating new tags is a privilege; if you can't yet create a tag you need, then post this question without it, then ask the community to create it for you

popular tags »