Python multiprocessing and shared numpy array

Question 1

I have a problem, which is similar to this:

import numpy as np

C = np.zeros((100, 10))
for i in range(10):
    C_sub = get_sub_matrix_C(i, other_args)  # shape 10x10
    C[i*10:(i+1)*10, :10] = C_sub

So, apparently there is no need to run this as a serial calculation, since each submatrix can be calculated independently. I would like to use the multiprocessing module and create up to 4 processes for the for loop. I read some tutorials about multiprocessing, but wasn't able to figure out how to use this to solve my problem.

Thanks for your help

Question 2

For multiprocessing to yield a performance improvement, the computations must take significant time. Multiprocessing has to serialize the data, send it to the subprocesses, deserialize it, perform the computations, serialize the result, send it back to the main process, and finally deserialize it. Serialization and deserialization take quite some time, and inter-process communication isn't that fast either. If get_sub_matrix_C is literally just a few matrix accesses, you aren't going to obtain any speedup.
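To get a rough feel for this overhead, one can time a pickle round trip for an array of the size a worker would return. This is only a sketch: roundtrip_seconds is a hypothetical helper, and it measures pickling inside a single process, so it leaves out the pipe transfer that multiprocessing adds on top.

```python
import pickle
import time

import numpy as np


def roundtrip_seconds(arr, repeats=5):
    """Return the best-of-`repeats` time to pickle and unpickle `arr`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        restored = pickle.loads(pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL))
        best = min(best, time.perf_counter() - start)
    # Sanity check: the round trip must preserve the data exactly.
    assert np.array_equal(restored, arr)
    return best


if __name__ == "__main__":
    block = np.random.rand(10, 10)      # one submatrix from the question
    big = np.random.rand(2000, 2000)    # a much larger payload (about 32 MB)
    print("10x10 block:", roundtrip_seconds(block), "s")
    print("2000x2000  :", roundtrip_seconds(big), "s")
```

If the time get_sub_matrix_C needs per call is not clearly larger than numbers like these, a process pool is unlikely to help.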

Question 3

This is just for illustration purposes. In the end my matrix will have dimensions of about 100000 x 20000, but more importantly, get_sub_matrix_C is rather slow and I don't think I can make that function any faster.

Question 4

Does get_sub_matrix_C need to access the whole matrix or just the submatrix? If it needs all of it, serializing one copy of the big matrix for each subprocess will be very time- and memory-consuming.
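For scale, the cost of copying the full matrix can be estimated with simple arithmetic. The sketch below assumes float64 entries (the default dtype of np.zeros) and the 100000 x 20000 shape mentioned in this thread; matrix_bytes is a hypothetical helper.

```python
import numpy as np


def matrix_bytes(rows, cols, dtype=np.float64):
    """Bytes needed to hold a dense rows x cols array of `dtype`."""
    return rows * cols * np.dtype(dtype).itemsize


if __name__ == "__main__":
    total = matrix_bytes(100000, 20000)
    print(total / 1e9, "GB for the full matrix")
    print(4 * total / 1e9, "GB if each of 4 workers received its own copy")
```

That is 16 GB for one copy under these assumptions, which is why it matters whether the workers need the whole matrix or only their own block.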

Question 5

Actually, get_sub_matrix_C doesn't depend on any entries of C. It just gives the submatrix that I want to write in C, where i determines the "position".

Question 6

A simple way to parallelize that code would be to use a Pool of processes:

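import multiprocessing

# On Windows/macOS, run this under `if __name__ == "__main__":`.
with multiprocessing.Pool(processes=4) as pool:
    # starmap unpacks each (i, other_args) tuple into get_sub_matrix_C(i, other_args)
    results = pool.starmap(get_sub_matrix_C, [(i, other_args) for i in range(10)])
for i, res in enumerate(results):
    C[i*10:(i+1)*10, :10] = res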

I've used starmap since the get_sub_matrix_C function has more than one argument (starmap(f, [(x1, ..., xN)]) calls f(x1, ..., xN)).
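The calling convention can be seen without any processes by using itertools.starmap, which unpacks its arguments the same way. In this sketch, scaled is a hypothetical stand-in for a two-argument worker such as get_sub_matrix_C.

```python
from itertools import starmap


def scaled(i, scale):
    """Stand-in for a two-argument worker such as get_sub_matrix_C."""
    return i * scale


args = [(i, 10) for i in range(3)]

# starmap unpacks each tuple into positional arguments: scaled(0, 10), scaled(1, 10), ...
unpacked = list(starmap(scaled, args))

# map passes each tuple as ONE argument, so a wrapper has to unpack it.
wrapped = list(map(lambda t: scaled(*t), args))

print(unpacked)  # [0, 10, 20]
print(wrapped)   # [0, 10, 20]
```

With a real Pool the wrapper has to be a module-level function rather than a lambda, because it must be picklable.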

Note however that serialization/deserialization may take significant time and space, so you may have to use a more low-level solution to avoid that overhead.

It looks like you are running an outdated version of Python (Pool.starmap was added in Python 3.3). You can replace starmap with plain map, but then you have to provide a function that takes a single parameter:

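import multiprocessing

def f(args):
    # map passes a single argument, so unpack the (i, other_args) tuple here
    return get_sub_matrix_C(*args)

pool = multiprocessing.Pool(processes=4)
results = pool.map(f, [(i, other_args) for i in range(10)])
pool.close()
pool.join()
for i, res in enumerate(results):
    C[i*10:(i+1)*10, :10] = res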
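An alternative to writing the wrapper by hand might be functools.partial, which binds the fixed argument so that map only has to supply i. This sketch assumes the second parameter is literally named other_args. The get_sub_matrix_C shown is a hypothetical stand-in, and assemble is a made-up helper.

```python
import multiprocessing
from functools import partial

import numpy as np


def get_sub_matrix_C(i, other_args):
    """Hypothetical stand-in returning a 10x10 block filled with i + other_args."""
    return np.full((10, 10), float(i + other_args))


def assemble(results, block=10):
    """Stack the per-index blocks into the final matrix, in index order."""
    C = np.zeros((block * len(results), block))
    for i, res in enumerate(results):
        C[i * block:(i + 1) * block, :block] = res
    return C


if __name__ == "__main__":
    # Bind the fixed argument by keyword so that map only has to supply i.
    worker = partial(get_sub_matrix_C, other_args=0)
    pool = multiprocessing.Pool(processes=4)
    try:
        results = pool.map(worker, range(10))
    finally:
        pool.close()
        pool.join()
    print(assemble(results).shape)  # (100, 10)
```

pool.map returns results in the order of its inputs, so the blocks can be written back by position.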

Question 7

Thanks for your answer. Unfortunately I can't test it, since I don't have starmap. Probably I'm using an outdated version of multiprocessing? Version: 0.70a1

Question 8

@RoSt You can use map and modify the function to accept a single parameter. I've edited the answer to add this solution too.

Question 9

Thanks for the easy and straightforward solution. It works fine. I would vote you up, but my own reputation is <15, sorry...

Question 10

The following recipe may do the job. Feel free to ask if anything is unclear.

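import numpy as np
import multiprocessing

def own_process(i, other_args, out_queue):
    # Worker: compute one submatrix and send it back tagged with its index,
    # because results arrive on the queue in completion order, not in order of i.
    C_sub = get_sub_matrix_C(i, other_args)
    out_queue.put((i, C_sub))

def processParallel():
    sub_matrices_list = [None] * 10
    out_queue = multiprocessing.Queue()
    other_args = 0
    procs = []
    for i in range(10):
        p = multiprocessing.Process(
            target=own_process,
            args=(i, other_args, out_queue))
        procs.append(p)
        p.start()
    # Drain the queue before joining; joining first can deadlock on large results.
    for _ in range(10):
        i, C_sub = out_queue.get()
        sub_matrices_list[i] = C_sub
    for p in procs:
        p.join()
    return sub_matrices_list

if __name__ == "__main__":
    C = np.zeros((100, 10))
    result = processParallel()
    for i in range(10):
        C[i*10:(i+1)*10, :10] = result[i]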

Question 11

Thanks for your answer. I tried it, but I got confusing results. The same entries were repeated over and over again.

Question 12

I just corrected the bug, sorry. Anyway, the other answer seems more succinct and practical. I will try it myself too! :)

Bakuriu · Accepted Answer · 2016-03-08 12:59:24Z

