
I am working on an audio application where all function execution times in the "audio loop" need to be << 1 ms. I am aware that Python is not the right programming language for this task; however, Python has come a long way, and I am confident that with the right tips and tweaks I can get it to work.

Currently I am investigating how to pass data between threads and I found some weird results.

I run the following benchmark program to test the different methods:

import multiprocessing
import threading
import queue
import numpy as np
import time

SIZE = 2048

myarray1 = np.ones(SIZE)
myarray2 = np.ones(SIZE)

def test_multiprocessing(num: int, put: list, get: list):
    # init
    # 'd' (float64) matches np.ones' default dtype; the original 'f'
    # (float32) silently down-casts on every copy
    shared_array = multiprocessing.Array('d', SIZE, lock=True)
    for _ in range(num):
        starttime = time.perf_counter()
        # put
        with shared_array.get_lock():
            np.copyto(np.frombuffer(shared_array.get_obj(), dtype=np.float64), myarray1)
        endtime = time.perf_counter()
        put.append(endtime - starttime)

        starttime = time.perf_counter()
        # get
        with shared_array.get_lock():
            np.copyto(myarray2, np.frombuffer(shared_array.get_obj(), dtype=np.float64))
        endtime = time.perf_counter()
        get.append(endtime - starttime)

def test_threading_copy(num: int, put: list, get: list):
    # init
    free = threading.Semaphore(value=1)
    used = threading.Semaphore(value=0)
    transfer = np.empty(SIZE)
    for _ in range(num):
        starttime = time.perf_counter()
        # put
        free.acquire()
        np.copyto(transfer, myarray1)
        used.release()
        endtime = time.perf_counter()
        put.append(endtime - starttime)

        starttime = time.perf_counter()
        # get
        used.acquire()
        np.copyto(myarray2, transfer)
        free.release()
        endtime = time.perf_counter()
        get.append(endtime - starttime)

def test_queue(num: int, put: list, get: list):
    # init
    q = queue.Queue(maxsize=1)
    for _ in range(num):
        starttime = time.perf_counter()
        # put
        q.put(myarray1)
        endtime = time.perf_counter()
        put.append(endtime - starttime)

        starttime = time.perf_counter()
        # get
        myarray2 = q.get()
        endtime = time.perf_counter()
        get.append(endtime - starttime)

if __name__ == "__main__":
    num = int(1e6)
    for test in [test_multiprocessing, test_threading_copy, test_queue]:
        put = []; get = []
        test(num, put, get)
        print("results:")
        print(f"\tput_avg = {sum(put) / len(put)}")
        print(f"\tput_max = {max(put)}")
        print(f"\tget_avg = {sum(get) / len(get)}")
        print(f"\tget_max = {max(get)}")

The results on my machine look something like this:

results:
 put_avg = 3.930823400122108e-06
 put_max = 0.002819699999918157
 get_avg = 3.895689899812624e-06
 get_max = 0.0016344000000572123
results:
 put_avg = 3.603283000182273e-06
 put_max = 0.007975700000088182
 get_avg = 3.501774700153874e-06
 get_max = 0.010190099999817903
results:
 put_avg = 1.4336647000006905e-06
 put_max = 0.0008001000001058856
 get_avg = 1.2777225000797898e-06
 get_max = 0.00023200000009637733

The average times are perfectly fine for my application, but the maxima are bugging me, as they are above or close to 1 ms. Except for the test_queue example, none of my code allocates new memory.

  • Do you have a clue why this is happening?
  • How can I fix this, or speed up the code?
  • Are there some general Python settings to prevent this behavior?
  • How would you debug this?
asked Dec 9, 2024 at 19:51
  • IDK, but it can be hard to meet hard real-time requirements when your program runs on an operating system that was not designed for that purpose, and when anything other than your application (an e-mail client that periodically checks for mail, a web page with scripted content, etc.) is running on the system. Commented Dec 9, 2024 at 20:30
  • You will never have guaranteed latency with any non-realtime OS. Thread priority can help a bit (OS-dependent). sys.setswitchinterval might be worth looking at. Busy-loop polling for information takes a lot of CPU, but is often the fastest way to detect available input. Commented Dec 9, 2024 at 20:45
  • Side note: contrary to what you seem to claim, the maxima for your queue alternative are less than 1ms (0.8 ms for put; 0.2 ms for get). You can't safely rely on that, of course, but that version seems to do very well. Commented Dec 9, 2024 at 21:20
  • I am really not sure the threading version does what you think it does. Have you tried to trace which thread copies things? My understanding is that the more probable scenario is that each thread copies its own data for itself, probably alternating (but maybe one will remain blocked for 12 iterations while the other copies data from one array to another in a loop). Since the whole point of threads is that they share their memory and don't really need to copy data to each other, it doesn't really matter which thread "gets" the data that another has "put". Commented Dec 9, 2024 at 21:25
  • But if the idea is to have one thread reading from, say, an audio card (or an internet connection, or a file) while another consumes what is read to play/display it, maybe you should look at simple structures such as a circular buffer, rather than making 3 copies of the data (one in myarray1, one in myarray2, one in transfer) when all threads can already read those 3 arrays. Commented Dec 9, 2024 at 21:29
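The circular buffer suggested in the last comment could look something like the sketch below. This is a hypothetical single-producer/single-consumer design (the class name and slot count are my own choices, not from the question): all blocks are preallocated up front, so the steady-state loop never allocates, and each side blocks on a semaphore only when the ring is full or empty.

```python
import threading
import numpy as np

SIZE = 2048

class RingBuffer:
    """Single-producer / single-consumer ring of preallocated blocks.

    No allocation happens after construction; put() blocks when the ring
    is full, get() blocks when it is empty.  Illustrative sketch only.
    """
    def __init__(self, slots=4, size=SIZE):
        self.buf = np.zeros((slots, size))      # all memory allocated once
        self.slots = slots
        self.w = 0                              # producer's write index
        self.r = 0                              # consumer's read index
        self.free = threading.Semaphore(slots)  # empty slots available
        self.used = threading.Semaphore(0)      # filled slots available

    def put(self, src):
        self.free.acquire()                     # wait for an empty slot
        np.copyto(self.buf[self.w], src)        # copy into preallocated slot
        self.w = (self.w + 1) % self.slots
        self.used.release()                     # hand the slot to the consumer

    def get(self, dst):
        self.used.acquire()                     # wait for a filled slot
        np.copyto(dst, self.buf[self.r])        # copy out of the slot
        self.r = (self.r + 1) % self.slots
        self.free.release()                     # return the slot to the producer

rb = RingBuffer()
src = np.ones(SIZE)
dst = np.empty(SIZE)
rb.put(src)
rb.get(dst)
print(bool((dst == src).all()))
```

Because several slots are in flight at once, the producer can absorb a short consumer stall (and vice versa) without either side missing its deadline, which is exactly the buffering the spiky maxima call for.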

1 Answer

  • Do you have a clue why this is happening?

Many things are running on your system at the same time. The system shares its available resources (CPU, memory bandwidth, device I/O) among them.

Your script does not have unconditional first priority for system resources. In fact, there are probably some occasional tasks that have higher priority when they run, and plenty of things have the same priority. It is not surprising that every once in a while, one of your transfers has to wait a comparatively long time for the system to do something else. If you need guarantees on how long your script might need to wait to perform one of its transfers, then you need to make appropriate use of the features of a real-time operating system.
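One way to convince yourself that the spikes come from the system rather than from the transfer code is to time an empty loop: any large gap between consecutive clock readings is the OS (or the garbage collector, or a GIL hand-off) preempting you, not your own work. A minimal sketch:

```python
import time

def measure_jitter(n=200_000):
    """Record the gap between consecutive perf_counter() readings.

    The loop body does no work, so any large gap is time the process
    spent not running -- i.e. scheduling jitter, not our own code.
    """
    prev = time.perf_counter()
    total = 0.0
    worst = 0.0
    for _ in range(n):
        now = time.perf_counter()
        dt = now - prev
        total += dt
        if dt > worst:
            worst = dt
        prev = now
    return total / n, worst

avg, worst = measure_jitter()
print(f"avg = {avg:.2e} s, worst = {worst:.2e} s")
```

On a typical desktop OS the worst-case gap is orders of magnitude above the average, even though the loop neither allocates nor transfers anything, which matches the put_max/get_max spikes in the benchmark above.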

  • How can I fix this, or speed up the code?

You probably cannot prevent occasional elapsed-time spikes, unless you're prepared to install an RT OS and run your code there. Details of what you would need to do are a bit too broad for an SO answer.

With sufficient system privileges, you may be able to increase the priority of a given running job. That might help some. Details depend on your OS.
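On Linux, for instance, the standard library exposes the scheduler API directly. Moving the process into a real-time scheduling class normally requires root (or the CAP_SYS_NICE capability), and the call does not exist on every platform, so a sketch has to be prepared for refusal (the function name and priority value here are my own illustration):

```python
import os

def try_raise_priority(priority=10):
    """Attempt to move this process into the SCHED_FIFO real-time class.

    Linux-only; normally needs root or CAP_SYS_NICE.  Returns True on
    success, False if the OS refused or the API is unavailable.
    """
    try:
        # 0 = current process; SCHED_FIFO threads preempt all normal ones
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except (AttributeError, PermissionError, OSError):
        return False

print("real-time priority acquired:", try_raise_priority())
```

Even under SCHED_FIFO the kernel can still run higher-priority kernel work and handle interrupts, so this reduces, but does not eliminate, the worst-case latency.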

The usual general answer to speeding up Python code that is not inherently inefficient is to use native code to run the slow bits.

  • Are there some general Python settings to prevent this behavior?

I don't believe so. The spikiness you observe is not Python-specific.
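The one interpreter-level knob worth knowing about (mentioned in a comment above) is sys.setswitchinterval. It only controls how often CPython asks the running thread to yield the GIL to other Python threads; it has no effect on OS scheduling, so it cannot remove preemption spikes, though shrinking it may reduce thread wake-up latency within the process:

```python
import sys

# The GIL switch interval (default 5 ms) is how long a Python thread may
# hold the GIL before the interpreter asks it to yield.  A smaller value
# means other threads get the GIL sooner, at the cost of more switching
# overhead.  This does NOT change OS-level scheduling.
print(sys.getswitchinterval())
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())
```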

  • How would you debug this?

I wouldn't. What you observe doesn't seem abnormal to me.

answered Dec 9, 2024 at 22:26

2 Comments

I totally agree with your reasoning on why this is happening, but I don't think the only option is to install an RT OS. There are many audio applications out there that run fast with low latency even on Windows. Of course, they probably don't use Python ;-) But with Python leaning ever more on C, there has to be something that can be done to improve latency? Running native code would for sure speed things up, but there is also no guarantee the OS does not interrupt the code, right?
@helixfoo, if you want performance guarantees, then in fact yes, the only plausible option is to run on an RT OS. Providing such guarantees is exactly what distinguishes RT OSes from other OSes. Sure, there are realtime audio applications for commodity OSes. And they do not solve the problem you are asking about. Instead, they either work around it -- generally by buffering ahead of playback -- or, if that's not possible, they just live with occasional audio artifacts. It's not a matter of just going faster.
