I have mostly done machine learning with big data: GPUs on EC2 VMs, K8s clusters, and so on. But this new assignment is at the other end of the scale.
Basically, it is a time-series forecasting and regression problem on some body signals. The problem itself is simple enough, and I have developed some moderate-sized transformer/LSTM models using frameworks like TensorFlow and Darts. But the deployment constraint says
the model has to be deployed on a low power proprietary wearable device with <50 ms latency
So the questions are
- What frameworks or programming languages do I need to support this?
- I am not an Android/iOS developer either. Suppose I wrap my model inference (using TensorFlow) in a simple Python function that takes some features as arguments and spits out a prediction. Can I assume the device/firmware engineer can invoke that Python function to consume my model? Or could I run a Dockerised service on a port inside the device and query it for inference (this is what I have done previously in a big-data context)?
- What kind of interface can I use to retrain and update the model regularly? Again, the challenge is pushing the model to the end device.
- Or, in some scenarios, may I assume that the device stays internet-connected and can simply make an HTTP request to a server?
I am not sure I am asking the right questions here, as this is obviously a big context switch from my previous setups using big cloud services for inference. Any help, resources, and standard practices would be greatly appreciated.
-
See, for example, TinyML. – Steve Melnikoff, Mar 1 at 11:42
3 Answers
But the deployment constraint says
the model has to be deployed on a low power proprietary wearable device with <50 ms latency
My biggest piece of advice is to get more information about this wearable device. How much RAM does it have, and how much are you allowed to use? How much storage (flash)? What operating system facilities does it provide?
Based on the stated deployment constraint, I would not be surprised if the device uses an embedded OS that does not even support the concept of executable files. If so, you can forget about using Python or anything containerized. You can create and train your model in TensorFlow, but you will need a specialized consumer of the model, most likely written in C or C++.
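For a sense of what such a consumer looks like, here is a minimal sketch using the standard LiteRT/TensorFlow Lite C++ interpreter. The file name, tensor shapes, and single float input/output are assumptions for illustration, not details of your actual model:

```
// Minimal LiteRT (TensorFlow Lite) C++ consumer sketch.
// Assumes a model exported as model.tflite with one float32 input
// and one float32 output; adjust shapes/types to the real model.
#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load the FlatBuffer model produced by the TensorFlow converter.
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;

  // Build an interpreter with the built-in op resolver.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter || interpreter->AllocateTensors() != kTfLiteOk) return 1;

  // Copy features into the input tensor, run inference, read the output.
  float* input = interpreter->typed_input_tensor<float>(0);
  input[0] = 0.42f;  // placeholder feature value
  if (interpreter->Invoke() != kTfLiteOk) return 1;
  float prediction = interpreter->typed_output_tensor<float>(0)[0];
  std::printf("prediction = %f\n", prediction);
  return 0;
}
```

Note that this variant still assumes a file system and heap allocation; on the smallest devices the microcontroller flavour of the library (see the other answer) applies instead.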
-
So embedded containers don't exist? I thought Python could be pre-compiled, or even transpiled into C. – candied_orange, Feb 28 at 9:12
-
@candied_orange Embedded systems cover a very wide range of computing power, all the way from small microcontrollers with tens of kB of RAM and a few hundred kB of ROM to industrial PCs running a full Linux distribution. As the question states "low power wearable device", I am assuming, based on my experience, that it is towards the lower end of that scale. – Bart van Ingen Schenau, Feb 28 at 10:18
-
The Windriver link doesn't say anything about the devices it's running on; it sounds like they're just industrial PCs rather than small embedded devices in the usual sense. – pjc50, Feb 28 at 13:19
-
@BartvanIngenSchenau Thanks for the useful answer. I can train the model in TensorFlow and compress/quantise it into a FlatBuffer format following LiteRT. After that, is there any specific example of the specialised consumer you mention? – Della, Mar 2 at 1:43
-
@Della AFAICT the "specialized consumer" is LiteRT (or a similar library, but I don't know what else it could be). – ojs, Mar 2 at 9:13
I'm making a wild guess that "low power proprietary wearable" means a microcontroller, and probably an ARM-based one. The good news is that there are libraries like LiteRT (formerly known as TensorFlow Lite) that are freely available and not difficult to integrate. Even better, you can probably export the model as a .tflite file, give it to a C++ developer, and have them integrate the library. The challenge is that it is your task to make the model small enough to run on the target device within its time and memory constraints.
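On a true bare-metal microcontroller, the relevant variant is TensorFlow Lite Micro, which runs from a model baked into the firmware image and a fixed tensor arena rather than from files and heap allocation. A rough sketch of what the integration could look like; the names g_model_data, ModelSetup, and ModelPredict are made up for illustration, and the op list and arena size depend entirely on your model:

```
// TensorFlow Lite Micro sketch for a bare-metal target.
// g_model_data is assumed to be the .tflite file converted to a
// C array (e.g. with xxd -i) and linked into the firmware.
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];

constexpr int kArenaSize = 16 * 1024;  // tune to the RAM you can spare
static uint8_t tensor_arena[kArenaSize];
static tflite::MicroInterpreter* interpreter = nullptr;

// One-time setup, e.g. called from the firmware's init hook.
void ModelSetup() {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register only the ops the model actually uses, to save flash.
  static tflite::MicroMutableOpResolver<2> resolver;
  resolver.AddFullyConnected();
  resolver.AddRelu();

  static tflite::MicroInterpreter static_interpreter(
      model, resolver, tensor_arena, kArenaSize);
  interpreter = &static_interpreter;
  interpreter->AllocateTensors();  // carves tensors out of the arena
}

// Called from the device's sampling loop with the latest features.
float ModelPredict(const float* features, int n) {
  TfLiteTensor* input = interpreter->input(0);
  for (int i = 0; i < n; ++i) input->data.f[i] = features[i];
  interpreter->Invoke();
  return interpreter->output(0)->data.f[0];
}
```

The firmware engineer would typically own the setup/loop plumbing; your job is mostly making sure the exported .tflite only uses ops the resolver registers and fits the arena.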
This kind of device wouldn't necessarily update itself. Instead, there is usually a companion app running on a mobile phone or desktop computer that handles downloading new weights and deploying them to the device.
-
Thanks a lot. If you don't mind a small digression: from a typical job-profile perspective, does integrating tflite with the chip (taking the architecture into account) usually fall to embedded systems developers? It sounds very low-level compared to the usual machine learning engineering challenges, and quite specialised. – Della, Mar 2 at 1:45
-
@Della It's one of those in-between tasks where you don't have to concern yourself with the really low-level details of embedded work, but you do have to care about memory more than on normal computers. "Integrating with the chip" is a bit of an exaggeration; it's more like integrating with the rest of the embedded software. I think it's normally more a software developer's job than an ML engineer's, but at a startup with only a handful of employees you might be expected to learn to do it. – ojs, Mar 2 at 9:10
the model has to be deployed on a low power proprietary wearable device with <50 ms latency
Those are performance constraints. They have nothing to do with frameworks or programming languages. What they do constrain is your model's nodes: how many you can have and how deeply they can be layered.
A key thing to find out here is whether the device has anything like a GPU capable of parallel processing, or whether you're stuck with a CPU. Also find out how much memory is left for your code and model once the device is running its operating system.
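One cheap sanity check, before the device is even in the picture, is to time the exported model on a desktop build of the same interpreter. The result is an optimistic lower bound for the real hardware, so if it already blows the 50 ms budget there, the model needs shrinking. A sketch, assuming a model.tflite export; the single-thread setting roughly mimics a one-core target:

```
// Rough latency check on a desktop build of LiteRT: time repeated
// Invoke() calls. The wearable's CPU will be slower, so treat the
// result as an optimistic lower bound for the 50 ms budget.
#include <chrono>
#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter) return 1;
  interpreter->SetNumThreads(1);  // single core, like the target device
  interpreter->AllocateTensors();

  constexpr int kRuns = 100;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kRuns; ++i) interpreter->Invoke();
  auto end = std::chrono::steady_clock::now();

  double ms = std::chrono::duration<double, std::milli>(end - start).count();
  std::printf("mean latency: %.3f ms per inference\n", ms / kRuns);
  return 0;
}
```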
may I assume that the device stays internet connected, hence can just make an HTTP request to a server?
Well sure, but then you're not deploying the model to a wearable device; you're deploying it to a data center. And as pjc50 points out, 50 ms of latency might be ambitious:
Medium            Latency
T1                0-10 ms
Cable internet    5-40 ms
DSL               10-70 ms
Dial-up           100-220 ms
Also, most wearables are not wired, and adding Wi-Fi isn't going to help with the latency.
-
Also, if you're making a remote request, the network will eat most of your 50 ms latency budget. And the request may not arrive at all! – pjc50, Feb 28 at 13:17
-
@pjc50 Well, that's why we have UDP, but yeah, the speed of light isn't getting any faster. – candied_orange, Feb 28 at 15:11
-
I'm curious what devices have a GPU and can still be considered a "low power wearable". – ojs, Feb 28 at 16:04