I have mostly done machine learning with big data: GPUs on EC2 VMs, K8s clusters, and so on. But this new assignment is at the other end of the scale.
Basically, it is a time-series forecasting and regression problem on some body signals. The problem itself is simple enough, and I have developed some moderate-sized transformer/LSTM models using frameworks like TensorFlow and Darts. But the deployment constraint says
the model has to be deployed on a low power proprietary wearable device with <50 ms latency
So the questions are
- What frameworks or programming languages do I need to support this?
- I am not an Android/iOS developer either. Suppose I wrap my model inference (using TensorFlow) in a simple Python function that takes some features as arguments and spits out a prediction. Can I assume the device/firmware engineer can invoke that Python function to consume my model? Or could I run a Dockerised service on a port inside the device and query it for inference (this is what I have done previously in a big-data context)?
- What kind of interface can I use to retrain and update the model regularly? Again, the challenge is pushing the model to the end device.
- Or, in some scenarios, may I assume that the device stays internet-connected and can simply make an HTTP request to a server?
I am not sure I am asking the right questions here, as this is obviously a big context switch from my previous setups using big cloud services for inference. Any help, resources, and standard practices would be greatly appreciated.
-
See, for example, TinyML. – Steve Melnikoff, Mar 1 at 11:42
3 Answers
But the deployment constraint says
the model has to be deployed on a low power proprietary wearable device with <50 ms latency
My biggest piece of advice is to get more information about this wearable device. How much RAM does it have, and how much are you allowed to use? How much storage (flash)? What operating system facilities does it provide?
Based on the stated deployment constraint, I would not be surprised if the device uses an embedded OS that does not even support the concept of executable files. If so, you can forget about using Python or anything containerized. You can create and train your model in TensorFlow, but you will need a specialized consumer of the model, most likely written in C or C++.
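For a sense of what such a consumer looks like, here is a minimal sketch using the standard LiteRT/TensorFlow Lite C++ interpreter. The file name, tensor shapes, and single float input/output are assumptions for illustration, not details of your actual model:

```
// Minimal LiteRT (TensorFlow Lite) C++ consumer sketch.
// Assumes a model exported as model.tflite with one float32 input
// and one float32 output; adjust shapes/types to the real model.
#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load the FlatBuffer model produced by the TensorFlow converter.
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;

  // Build an interpreter with the built-in op resolver.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter || interpreter->AllocateTensors() != kTfLiteOk) return 1;

  // Copy features into the input tensor, run inference, read the output.
  float* input = interpreter->typed_input_tensor<float>(0);
  input[0] = 0.42f;  // placeholder feature value
  if (interpreter->Invoke() != kTfLiteOk) return 1;
  float prediction = interpreter->typed_output_tensor<float>(0)[0];
  std::printf("prediction = %f\n", prediction);
  return 0;
}
```

Note that this variant still assumes a file system and heap allocation; on the smallest devices the microcontroller flavour of the library (see the other answer) applies instead.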
-
So embedded containers don't exist? I thought Python could be pre-compiled, or even transpiled into C. – candied_orange, Feb 28 at 9:12
-
@candied_orange Embedded systems cover a very wide range of computing power, all the way from small microcontrollers with tens of kB of RAM and a few hundred kB of ROM to industrial PCs running a full Linux distribution. As the question states "low power wearable device", I am assuming, based on my experience, that it is towards the lower end of that scale. – Bart van Ingen Schenau, Feb 28 at 10:18
-
The Windriver link doesn't say anything about the devices it's running on; it sounds like they're just industrial PCs rather than small embedded devices in the usual sense. – pjc50, Feb 28 at 13:19
-
@BartvanIngenSchenau Thanks for the useful answer. I can train the model in TensorFlow and compress/quantise it into a FlatBuffer format following LiteRT. After that, is there any specific example of the specialised consumer you mention? – Della, Mar 2 at 1:43
-
@Della AFAICT the "specialized consumer" is LiteRT (or a similar library, but I don't know what else it could be). – ojs, Mar 2 at 9:13
I'm making a wild guess that "low power proprietary wearable" means a microcontroller, and probably an ARM-based one. The good news is that there are libraries like LiteRT (formerly known as TensorFlow Lite) that are freely available and not difficult to integrate. Even better, you can probably export the model as a .tflite file, give it to a C++ developer, and have them integrate the library. The challenge is that it is your task to make the model small enough to run on the target device within its time and memory constraints.
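On a true bare-metal microcontroller, the relevant variant is TensorFlow Lite Micro, which runs from a model baked into the firmware image and a fixed tensor arena rather than from files and heap allocation. A rough sketch of what the integration could look like; the names g_model_data, ModelSetup, and ModelPredict are made up for illustration, and the op list and arena size depend entirely on your model:

```
// TensorFlow Lite Micro sketch for a bare-metal target.
// g_model_data is assumed to be the .tflite file converted to a
// C array (e.g. with xxd -i) and linked into the firmware.
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];

constexpr int kArenaSize = 16 * 1024;  // tune to the RAM you can spare
static uint8_t tensor_arena[kArenaSize];
static tflite::MicroInterpreter* interpreter = nullptr;

// One-time setup, e.g. called from the firmware's init hook.
void ModelSetup() {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register only the ops the model actually uses, to save flash.
  static tflite::MicroMutableOpResolver<2> resolver;
  resolver.AddFullyConnected();
  resolver.AddRelu();

  static tflite::MicroInterpreter static_interpreter(
      model, resolver, tensor_arena, kArenaSize);
  interpreter = &static_interpreter;
  interpreter->AllocateTensors();  // carves tensors out of the arena
}

// Called from the device's sampling loop with the latest features.
float ModelPredict(const float* features, int n) {
  TfLiteTensor* input = interpreter->input(0);
  for (int i = 0; i < n; ++i) input->data.f[i] = features[i];
  interpreter->Invoke();
  return interpreter->output(0)->data.f[0];
}
```

The firmware engineer would typically own the setup/loop plumbing; your job is mostly making sure the exported .tflite only uses ops the resolver registers and fits the arena.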
This kind of device wouldn't necessarily update itself. Instead, there is usually a companion app running on a mobile phone or desktop computer that handles downloading new weights and deploying them to the device.
-
Thanks a lot. If you don't mind a small digression: from a typical job-profile perspective, does integrating tflite with the chip (taking the architecture into account) usually fall to embedded systems developers? It sounds very low-level compared to the usual machine learning engineering challenges, and quite specialised. – Della, Mar 2 at 1:45
-
@Della It's one of those in-between tasks where you don't have to concern yourself with the really low-level details of embedded work, but you do have to care about memory more than on normal computers. "Integrating with the chip" is a bit of an exaggeration; it's more like integrating with the rest of the embedded software. I think it's normally more a software developer's job than an ML engineer's, but at a startup with only a handful of employees you might be expected to learn to do it. – ojs, Mar 2 at 9:10
the model has to be deployed on a low power proprietary wearable device with <50 ms latency
Those are performance constraints. They have nothing to do with frameworks or programming languages. What they do constrain is your model's nodes: how many you can have and how deeply they can be layered.
A key thing to find out here is whether the device has anything like a GPU capable of parallel processing, or whether you're stuck with a CPU. Also find out how much memory is left for your code and model once the device is running its operating system.
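One cheap sanity check, before the device is even in the picture, is to time the exported model on a desktop build of the same interpreter. The result is an optimistic lower bound for the real hardware, so if it already blows the 50 ms budget there, the model needs shrinking. A sketch, assuming a model.tflite export; the single-thread setting roughly mimics a one-core target:

```
// Rough latency check on a desktop build of LiteRT: time repeated
// Invoke() calls. The wearable's CPU will be slower, so treat the
// result as an optimistic lower bound for the 50 ms budget.
#include <chrono>
#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter) return 1;
  interpreter->SetNumThreads(1);  // single core, like the target device
  interpreter->AllocateTensors();

  constexpr int kRuns = 100;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kRuns; ++i) interpreter->Invoke();
  auto end = std::chrono::steady_clock::now();

  double ms = std::chrono::duration<double, std::milli>(end - start).count();
  std::printf("mean latency: %.3f ms per inference\n", ms / kRuns);
  return 0;
}
```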
may I assume that the device stays internet connected, hence can just make an HTTP request to a server?
Well sure, but then you're not deploying the model to a wearable device; you're deploying it to a data center. And as pjc50 points out, 50 ms of latency might be ambitious:
Medium            Latency
T1                0-10 ms
Cable internet    5-40 ms
DSL               10-70 ms
Dial-up           100-220 ms
Also, most wearables are not wired, and adding Wi-Fi isn't going to help with the latency.
-
Also, if you're making a remote request, the network will eat most of your 50 ms latency budget. And the request may not arrive at all! – pjc50, Feb 28 at 13:17
-
@pjc50 Well, that's why we have UDP, but yeah, the speed of light isn't getting any faster. – candied_orange, Feb 28 at 15:11
-
I'm curious what devices have a GPU and can still be considered a "low power wearable". – ojs, Feb 28 at 16:04