I have the code below that I am using to thread certain tasks. Basically, you pass in a function reference, a list of data and the number of threads. It will run the function for each item in the list with the specified number of threads.
I currently have this in a separate py file that I import as needed. Performance has been kind of strange and inconsistent though. What do you guys think?

    import threading
    import logging
    import time
    from queue import Queue

    def thread_proc(func,data,threads):
        if threads < 0:
            return "Thead Count not specified"
        q = Queue()
        for i in range(threads):
            thread = threading.Thread(target=thread_exec,args=(func,q))
            thread.daemon = True
            thread.start()
        for item in data:
            q.put(item)
        logging.debug('*** Main thread waiting')
        s = q.qsize()
        while s > 0:
            logging.debug("Queue Size is:" + str(s))
            s = q.qsize()
            time.sleep(1)
        logging.debug('*** Main thread waiting')
        q.join()
        logging.debug('*** Done')

    def thread_exec(func,q):
        while True:
            d = q.get()
            #logging.debug("Working...")
            try:
                func(d)
            except:
                pass
            q.task_done()

1 Answer
This code is very similar to the example in the docs. As such, it looks fine; I can only suggest coding style improvements.
This is a bit strange:

    def thread_proc(func,data,threads):
        if threads < 0:
            return "Thead Count not specified"

It's not really true that the "thread count is not specified", since the parameter threads is there, and apparently not None. Maybe the message should be "invalid thread count". More importantly, didn't you mean the condition to be threads < 1?
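
One way to act on both points is to raise an exception rather than return a string, so the caller cannot silently ignore the problem. This is only a minimal sketch, assuming that anything below 1 should be rejected; validate_thread_count is a hypothetical helper name, not something from the original code:

```python
def validate_thread_count(threads):
    """Hypothetical helper: reject thread counts that cannot start any worker."""
    # Assumption: at least one worker thread is required, so 0 is invalid too.
    if threads is None or threads < 1:
        raise ValueError("invalid thread count: {!r}".format(threads))
    return threads
```

With this, a call with 0 or a negative number fails immediately instead of returning a message that looks like a normal result.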

    for i in range(threads):
        # ...

When you don't need the loop variable inside the loop, it's customary to call it _. That way I won't be looking inside the loop body for a variable i that's not there:

    for _ in range(threads):
        # ...

And while we talk about naming, the example in the docs used item = q.get(), but you changed it to d = q.get(). Although item may be a bit too generic a name, d is definitely worse.
PEP8
You're not following PEP8. There are several violations.
Put a space after commas in parameter lists. For example, instead of this:

    def thread_proc(func,data,threads):

Write it like this:

    def thread_proc(func, data, threads):

Similarly, here:

    thread = threading.Thread(target=thread_exec,args=(func,q))

Write it like this:

    thread = threading.Thread(target=thread_exec, args=(func, q))

And you should put two blank lines before top-level function definitions.
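
Putting the style points above together, the two functions might look roughly like this. Treat it as a sketch rather than a drop-in replacement: the queue-size polling loop is left out for brevity, and the ValueError for threads < 1 and the logging of failed items are my assumptions about what you want, not behaviour of your original code.

```python
import logging
import threading
from queue import Queue


def thread_exec(func, q):
    """Worker loop: apply func to each item taken from the queue."""
    while True:
        item = q.get()
        try:
            func(item)
        except Exception:
            # Assumption: record the failure instead of silently dropping it.
            logging.exception("func failed for %r", item)
        finally:
            # Always mark the item as done so q.join() can return.
            q.task_done()


def thread_proc(func, data, threads):
    """Run func on every item of data using the given number of threads."""
    if threads < 1:
        raise ValueError("invalid thread count")
    q = Queue()
    for _ in range(threads):
        thread = threading.Thread(target=thread_exec, args=(func, q))
        thread.daemon = True
        thread.start()
    for item in data:
        q.put(item)
    q.join()  # blocks until task_done() has been called for every item
```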
multiprocessing.dummy.Pool, which is just like multiprocessing.pool.Pool but with threads instead of processes.
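
For illustration, the multiprocessing.dummy.Pool idea could be sketched as below. thread_map is a hypothetical name chosen for this example. Note that this is not an exact equivalent of the code in the question: an exception raised by func is re-raised in the caller by map() instead of being swallowed.

```python
from multiprocessing.dummy import Pool  # thread-based pool with the Pool API


def thread_map(func, data, threads):
    """Hypothetical helper: run func on every item of data in a thread pool."""
    with Pool(threads) as pool:
        # map() blocks until every item has been processed and
        # returns the results in the same order as the input.
        return pool.map(func, data)
```

For example, thread_map(lambda x: x * x, [1, 2, 3], 2) returns [1, 4, 9].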