I have a problem, which is similar to this:
import numpy as np
C = np.zeros((100,10))
for i in range(10):
C_sub = get_sub_matrix_C(i, other_args) # shape 10x10
C[i*10:(i+1)*10,:10] = C_sub
So, apparently there is no need to run this as a serial calculation, since each submatrix can be calculated independently. I would like to use the multiprocessing module and create up to 4 processes for the for loop. I read some tutorials about multiprocessing, but wasn't able to figure out how to use this to solve my problem.
Thanks for your help
2 Answers 2
A simple way to parallelize that code would be to use a Pool of processes:
pool = multiprocessing.Pool()
results = pool.starmap(get_sub_matrix_C, ((i, other_args) for i in range(10)))
for i, res in enumerate(results):
C[i*10:(i+1)*10,:10] = res
I've used starmap since the get_sub_matrix_C function has more than one argument (starmap(f, [(x1, ..., xN)]) calls f(x1, ..., xN)).
Note however that serialization/deserialization may take significant time and space, so you may have to use a more low-level solution to avoid that overhead.
It looks like you are running an outdated version of python. You can replace starmap with plain map but then you have to provide a function that takes a single parameter:
def f(args):
return get_sub_matrix_C(*args)
pool = multiprocessing.Pool()
results = pool.map(f, ((i, other_args) for i in range(10)))
for i, res in enumerate(results):
C[i*10:(i+1)*10,:10] = res
3 Comments
map and modify the function to accept a single parameter. I've edited the answer to add this solution too.The following recipe perhaps can do the job. Feel free to ask.
import numpy as np
import multiprocessing
def processParallel():
def own_process(i, other_args, out_queue):
C_sub = get_sub_matrix_C(i, other_args)
out_queue.put(C_sub)
sub_matrices_list = []
out_queue = multiprocessing.Queue()
other_args = 0
for i in range(10):
p = multiprocessing.Process(
target=own_process,
args=(i, other_args, out_queue))
procs.append(p)
p.start()
for i in range(10):
sub_matrices_list.extend(out_queue.get())
for p in procs:
p.join()
return sub_matrices_list
C = np.zeros((100,10))
result = processParallel()
for i in range(10):
C[i*10:(i+1)*10,:10] = result[i]
get_sub_matrixis literally just a few matrix accesses you aren't going to obtain any speedup.