I am working on an audio application in which every function executed in the "audio loop" must finish in << 1 ms. I am aware that Python is not the right programming language for this task; however, Python has come a long way, and I am confident that with the right tips and tweaks I can get it to work.
Currently I am investigating how to pass data between threads and I found some weird results.
I run the following benchmark program to test the different methods:
import multiprocessing
import threading
import queue
import time

import numpy as np

SIZE = 2048

myarray1 = np.ones(SIZE)
myarray2 = np.ones(SIZE)

def test_multiprocessing(num: int, put: list, get: list):
    # init
    shared_array = multiprocessing.Array('f', SIZE, lock=True)
    for _ in range(num):
        starttime = time.perf_counter()
        # put
        with shared_array.get_lock():
            np.copyto(np.frombuffer(shared_array.get_obj(), dtype=np.float32), myarray1)
        endtime = time.perf_counter()
        put.append(endtime - starttime)
        starttime = time.perf_counter()
        # get
        with shared_array.get_lock():
            np.copyto(myarray2, np.frombuffer(shared_array.get_obj(), dtype=np.float32))
        endtime = time.perf_counter()
        get.append(endtime - starttime)

def test_threading_copy(num: int, put: list, get: list):
    # init
    free = threading.Semaphore(value=1)
    used = threading.Semaphore(value=0)
    transfer = np.empty(SIZE)
    for _ in range(num):
        starttime = time.perf_counter()
        # put
        free.acquire()
        np.copyto(transfer, myarray1)
        used.release()
        endtime = time.perf_counter()
        put.append(endtime - starttime)
        starttime = time.perf_counter()
        # get
        used.acquire()
        np.copyto(myarray2, transfer)
        free.release()
        endtime = time.perf_counter()
        get.append(endtime - starttime)

def test_queue(num: int, put: list, get: list):
    # init
    q = queue.Queue(maxsize=1)
    for _ in range(num):
        starttime = time.perf_counter()
        # put
        q.put(myarray1)
        endtime = time.perf_counter()
        put.append(endtime - starttime)
        starttime = time.perf_counter()
        # get
        myarray2 = q.get()
        endtime = time.perf_counter()
        get.append(endtime - starttime)

if __name__ == "__main__":
    nums = int(1e6)
    for test in [test_multiprocessing, test_threading_copy, test_queue]:
        put = []
        get = []
        test(nums, put, get)
        print("results:")
        print(f"\tput_avg = {sum(put) / len(put)}")
        print(f"\tput_max = {max(put)}")
        print(f"\tget_avg = {sum(get) / len(get)}")
        print(f"\tget_max = {max(get)}")
The results on my machine look something like this (in the order test_multiprocessing, test_threading_copy, test_queue):
results:
	put_avg = 3.930823400122108e-06
	put_max = 0.002819699999918157
	get_avg = 3.895689899812624e-06
	get_max = 0.0016344000000572123
results:
	put_avg = 3.603283000182273e-06
	put_max = 0.007975700000088182
	get_avg = 3.501774700153874e-06
	get_max = 0.010190099999817903
results:
	put_avg = 1.4336647000006905e-06
	put_max = 0.0008001000001058856
	get_avg = 1.2777225000797898e-06
	get_max = 0.00023200000009637733
The average times are perfectly fine for my application, but the maxima are bugging me, as they are above or close to 1 ms. Except for the test_queue example, none of my code allocates new memory.
- Do you have a clue why this is happening?
- How can I fix this, or speed up the code?
- Are there some general Python settings to prevent this behavior?
- How would you debug this?
1 Answer
- Do you have a clue why this is happening?
Many things are running on your system at the same time, and the system shares its available resources (CPU, memory bandwidth, device I/O) among them.
Your script does not have unconditional first priority for system resources. In fact, there are probably some occasional tasks that have higher priority when they run, and plenty of things have the same priority. It is not surprising that every once in a while, one of your transfers has to wait a comparatively long time for the system to do something else. If you need guarantees on how long your script might need to wait to perform one of its transfers, then you need to make appropriate use of the features of a real-time operating system.
- How can I fix this, or speed up the code?
You probably cannot prevent occasional elapsed-time spikes, unless you're prepared to install an RT OS and run your code there. Details of what you would need to do are a bit too broad for an SO answer.
With sufficient system privileges, you may be able to increase the priority of a given running job. That might help some. Details depend on your OS.
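As an illustration of what such a priority change can look like on Linux, the process can try to move itself into the SCHED_FIFO real-time scheduling class. This is a sketch only: it needs root (or the CAP_SYS_NICE capability), the priority value 50 is an arbitrary choice, and the call does not exist on all platforms, so the function simply reports whether it succeeded.

```python
import os

def try_realtime_priority(priority: int = 50) -> bool:
    """Try to move the current process into the SCHED_FIFO real-time class.

    Returns True on success, False if the OS refuses (insufficient
    privileges) or does not expose the API (e.g. Windows).
    """
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except (AttributeError, PermissionError, OSError):
        return False

if __name__ == "__main__":
    print("real-time priority acquired:", try_realtime_priority())
```

Even with SCHED_FIFO, this only reduces scheduling delays; it does not make CPython's own pauses go away, and a misbehaving real-time process can starve the rest of the system.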
The usual general answer to speeding up Python code that is not inherently inefficient is to use native code to run the slow bits.
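In this benchmark, though, the expensive part (np.copyto) already runs in compiled NumPy code, so there is little headroom on the copy itself. Purely for illustration, the same transfer can be expressed as a raw ctypes.memmove over the arrays' buffers; this sketch assumes both arrays are C-contiguous and have identical dtype and size, which the caller must guarantee.

```python
import ctypes
import numpy as np

SIZE = 2048
src = np.ones(SIZE, dtype=np.float32)
dst = np.empty(SIZE, dtype=np.float32)

# Raw memcpy between two preallocated buffers; no per-call Python-level
# allocation takes place.
ctypes.memmove(dst.ctypes.data, src.ctypes.data, src.nbytes)

print(np.array_equal(src, dst))
```

The point is not that memmove is faster than np.copyto (both end up in a C memcpy); it is that the copy is already native, so the latency spikes come from elsewhere.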
- Are there some general Python settings to prevent this behavior?
I don't believe so. The spikiness you observe is not Python-specific.
- How would you debug this?
I wouldn't. What you observe doesn't seem abnormal to me.
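That said, if you did want to rule out one Python-specific contributor, you could pause the cyclic garbage collector around the hot loop and see whether the maxima change. This is a quick experiment, not a recommendation: with gc disabled you must be sure the loop itself creates no reference cycles, or memory will grow.

```python
import gc
import time

gc.disable()  # suspend automatic cyclic collection during the hot loop
try:
    worst = 0.0
    for _ in range(100_000):
        start = time.perf_counter()
        # ... the put/get transfer under test would go here ...
        worst = max(worst, time.perf_counter() - start)
finally:
    gc.enable()

print(f"worst iteration: {worst * 1e6:.1f} µs")
```

If the spikes persist with gc disabled, that points back at the OS scheduler rather than the interpreter.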
2 Comments
sys.setswitchinterval might be worth looking at. Busy-loop polling for information takes a lot of CPU, but is often the fastest way to detect available input.
Are you sure the threading version does what you think it does? Have you tried to trace a bit which thread copies things? My understanding is that the more probable scenario is that they will each copy their own data for themselves, probably alternating (but maybe one will remain blocked for 12 iterations while the other copies data from one array to another in a loop). Sure, since the whole point of threads is that they share their memory and don't really need to copy data to each other, it doesn't really matter which thread "gets" the data that the other has "put".
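For reference, the switch interval mentioned above is read and set like this; 5 ms is the CPython default, and lowering it lets the interpreter consider switching threads more often, trading some throughput for faster hand-offs.

```python
import sys

print(sys.getswitchinterval())   # CPython default: 0.005 (seconds)

# Ask the interpreter to consider a thread switch every 0.5 ms instead.
sys.setswitchinterval(0.0005)
print(sys.getswitchinterval())
```

Note this only changes how often the GIL may change hands; it does not bound how long any single bytecode operation or C call can run.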