I am working on an audio application in which every function executed in the "audio loop" must finish in << 1 ms. I am aware that Python is not the right programming language for this task; however, Python has come a long way, and I am confident that with the right tips and tweaks I can get it to work.
Currently I am investigating how to pass data between threads and I found some weird results.
I run the following benchmark program to test the different methods:
import multiprocessing
import threading
import queue
import time

import numpy as np

SIZE = 2048

myarray1 = np.ones(SIZE)
myarray2 = np.ones(SIZE)

def test_multiprocessing(num: int, put: list, get: list):
    # init
    shared_array = multiprocessing.Array('f', SIZE, lock=True)
    for _ in range(num):
        starttime = time.perf_counter()
        # put
        with shared_array.get_lock():
            np.copyto(np.frombuffer(shared_array.get_obj(), dtype=np.float32), myarray1)
        endtime = time.perf_counter()
        put.append(endtime - starttime)
        starttime = time.perf_counter()
        # get
        with shared_array.get_lock():
            np.copyto(myarray2, np.frombuffer(shared_array.get_obj(), dtype=np.float32))
        endtime = time.perf_counter()
        get.append(endtime - starttime)

def test_threading_copy(num: int, put: list, get: list):
    # init
    free = threading.Semaphore(value=1)
    used = threading.Semaphore(value=0)
    transfer = np.empty(SIZE)
    for _ in range(num):
        starttime = time.perf_counter()
        # put
        free.acquire()
        np.copyto(transfer, myarray1)
        used.release()
        endtime = time.perf_counter()
        put.append(endtime - starttime)
        starttime = time.perf_counter()
        # get
        used.acquire()
        np.copyto(myarray2, transfer)
        free.release()
        endtime = time.perf_counter()
        get.append(endtime - starttime)

def test_queue(num: int, put: list, get: list):
    # init
    q = queue.Queue(maxsize=1)
    for _ in range(num):
        starttime = time.perf_counter()
        # put
        q.put(myarray1)
        endtime = time.perf_counter()
        put.append(endtime - starttime)
        starttime = time.perf_counter()
        # get
        myarray2 = q.get()
        endtime = time.perf_counter()
        get.append(endtime - starttime)

if __name__ == "__main__":
    nums = int(1e6)
    for test in [test_multiprocessing, test_threading_copy, test_queue]:
        put = []
        get = []
        test(nums, put, get)
        print("results:")
        print(f"\tput_avg = {sum(put) / len(put)}")
        print(f"\tput_max = {max(put)}")
        print(f"\tget_avg = {sum(get) / len(get)}")
        print(f"\tget_max = {max(get)}")
The results on my machine look something like this (in the order test_multiprocessing, test_threading_copy, test_queue):
results:
	put_avg = 3.930823400122108e-06
	put_max = 0.002819699999918157
	get_avg = 3.895689899812624e-06
	get_max = 0.0016344000000572123
results:
	put_avg = 3.603283000182273e-06
	put_max = 0.007975700000088182
	get_avg = 3.501774700153874e-06
	get_max = 0.010190099999817903
results:
	put_avg = 1.4336647000006905e-06
	put_max = 0.0008001000001058856
	get_avg = 1.2777225000797898e-06
	get_max = 0.00023200000009637733
The average times are perfectly fine for my application, but the maxima are bugging me, as they are above or close to 1 ms. Except for the test_queue example, none of my code allocates new memory.
- Do you have a clue why this is happening?
- How can I fix this, or speed up the code?
- Are there some general Python settings to prevent this behavior?
- How would you debug this?
1 Answer
- Do you have a clue why this is happening?
Many things are running on your system at the same time, and the system shares its available resources (CPU, memory bandwidth, device I/O) among them.
Your script does not have unconditional first priority for system resources. In fact, there are probably some occasional tasks that have higher priority when they run, and plenty of things have the same priority. It is not surprising that every once in a while, one of your transfers has to wait a comparatively long time for the system to do something else. If you need guarantees on how long your script might need to wait to perform one of its transfers, then you need to make appropriate use of the features of a real-time operating system.
- How can I fix this, or speed up the code?
You probably cannot prevent occasional elapsed-time spikes, unless you're prepared to install an RT OS and run your code there. Details of what you would need to do are a bit too broad for an SO answer.
With sufficient system privileges, you may be able to increase the priority of a given running job. That might help some. Details depend on your OS.
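As an illustration of what such a priority change can look like on Linux, the process can try to move itself into the SCHED_FIFO real-time scheduling class. This is a sketch only: it needs root (or the CAP_SYS_NICE capability), the priority value 50 is an arbitrary choice, and the call does not exist on all platforms, so the function simply reports whether it succeeded.

```python
import os

def try_realtime_priority(priority: int = 50) -> bool:
    """Try to move the current process into the SCHED_FIFO real-time class.

    Returns True on success, False if the OS refuses (insufficient
    privileges) or does not expose the API (e.g. Windows).
    """
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except (AttributeError, PermissionError, OSError):
        return False

if __name__ == "__main__":
    print("real-time priority acquired:", try_realtime_priority())
```

Even with SCHED_FIFO, this only reduces scheduling delays; it does not make CPython's own pauses go away, and a misbehaving real-time process can starve the rest of the system.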
The usual general answer to speeding up Python code that is not inherently inefficient is to use native code to run the slow bits.
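In this benchmark, though, the expensive part (np.copyto) already runs in compiled NumPy code, so there is little headroom on the copy itself. Purely for illustration, the same transfer can be expressed as a raw ctypes.memmove over the arrays' buffers; this sketch assumes both arrays are C-contiguous and have identical dtype and size, which the caller must guarantee.

```python
import ctypes
import numpy as np

SIZE = 2048
src = np.ones(SIZE, dtype=np.float32)
dst = np.empty(SIZE, dtype=np.float32)

# Raw memcpy between two preallocated buffers; no per-call Python-level
# allocation takes place.
ctypes.memmove(dst.ctypes.data, src.ctypes.data, src.nbytes)

print(np.array_equal(src, dst))
```

The point is not that memmove is faster than np.copyto (both end up in a C memcpy); it is that the copy is already native, so the latency spikes come from elsewhere.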
- Are there some general Python settings to prevent this behavior?
I don't believe so. The spikiness you observe is not Python-specific.
- How would you debug this?
I wouldn't. What you observe doesn't seem abnormal to me.
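That said, if you did want to rule out one Python-specific contributor, you could pause the cyclic garbage collector around the hot loop and see whether the maxima change. This is a quick experiment, not a recommendation: with gc disabled you must be sure the loop itself creates no reference cycles, or memory will grow.

```python
import gc
import time

gc.disable()  # suspend automatic cyclic collection during the hot loop
try:
    worst = 0.0
    for _ in range(100_000):
        start = time.perf_counter()
        # ... the put/get transfer under test would go here ...
        worst = max(worst, time.perf_counter() - start)
finally:
    gc.enable()

print(f"worst iteration: {worst * 1e6:.1f} µs")
```

If the spikes persist with gc disabled, that points back at the OS scheduler rather than the interpreter.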
2 Comments
sys.setswitchinterval might be worth looking at. Busy-loop polling for information takes a lot of CPU, but is often the fastest way to detect available input.
Are you sure the threading version does what you think it does? Have you tried to trace a bit which thread copies things? My understanding is that the more probable scenario is that they will each copy their own data for themselves, probably alternating (but maybe one will remain blocked for 12 iterations while the other copies data from one array to another in a loop). Sure, since the whole point of threads is that they share their memory and don't really need to copy data to each other, it doesn't really matter which thread "gets" the data that the other has "put".
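For reference, the switch interval mentioned above is read and set like this; 5 ms is the CPython default, and lowering it lets the interpreter consider switching threads more often, trading some throughput for faster hand-offs.

```python
import sys

print(sys.getswitchinterval())   # CPython default: 0.005 (seconds)

# Ask the interpreter to consider a thread switch every 0.5 ms instead.
sys.setswitchinterval(0.0005)
print(sys.getswitchinterval())
```

Note this only changes how often the GIL may change hands; it does not bound how long any single bytecode operation or C call can run.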