Here's the simplest multiprocessing example I've found so far:
import multiprocessing

def calculate(value):
    return value * 10

if __name__ == '__main__':
    pool = multiprocessing.Pool(None)
    tasks = range(10000)
    results = []
    r = pool.map_async(calculate, tasks, callback=results.append)
    r.wait()  # Wait on the results
    print results
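Note that map_async's callback is called once with the entire list of results, so results above ends up as a list containing one list. If you only want the values, r.get() returns them directly; a minimal sketch:

import multiprocessing

def calculate(value):
    return value * 10

if __name__ == '__main__':
    pool = multiprocessing.Pool(None)
    r = pool.map_async(calculate, range(10000))
    print r.get()  # the full list of results, no callback needed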
I have two lists and one index to access the elements in each list. The ith position on the first list is related to the ith position on the second. I didn't use a dict because the lists are ordered.
What I was doing was something like:
for i in xrange(len(first_list)):
    # do something with first_list[i] and second_list[i]
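For reference, the same pairing can be written without the index, since Python 2's zip returns a list of tuples (the sample data here is hypothetical):

first_list = ['a', 'b', 'c']   # hypothetical sample data
second_list = [1, 2, 3]
for first, second in zip(first_list, second_list):
    print first, second        # do something with the pair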
So, using that example, I think I can make a function sort of like this:
# global variables first_list, second_list, i
first_list, second_list, i = None, None, 0

# initialize the lists
...

# have a function that does what the loop body did, and increments i inside it
def function():
    global i   # needed to rebind the module-level counter
    # do stuff with first_list[i] and second_list[i]
    i += 1
But that makes i a shared resource, and I'm not sure that's safe. It also seems to me that my design doesn't lend itself well to this multiprocessing approach, but I'm not sure how to fix it.
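A shape that avoids the shared counter entirely is to pass the index in as the task argument, which is what the working example below does. (With multiprocessing, each worker process gets its own copy of the globals anyway, so an i += 1 in one worker would never be seen by the others.) A minimal sketch, with process_pair as a hypothetical stand-in for the loop body:

import multiprocessing

first_list = [1, 2, 3]       # hypothetical sample data
second_list = [10, 20, 30]

def process_pair(i):
    # do the loop-body work for index i; a placeholder computation here
    return first_list[i] * second_list[i]

if __name__ == '__main__':
    pool = multiprocessing.Pool(None)
    print pool.map(process_pair, range(len(first_list)))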
Here's a working example of what I wanted (edit it to point at an image you actually want to use):
import multiprocessing
import subprocess, shlex

links = ['http://www.example.com/image.jpg'] * 10  # don't use this URL
names = [str(i) + '.jpg' for i in range(10)]

def download(i):
    command = 'wget -O ' + names[i] + ' ' + links[i]
    print command
    args = shlex.split(command)
    return subprocess.call(args, shell=False)

if __name__ == '__main__':
    pool = multiprocessing.Pool(None)
    tasks = range(10)
    r = pool.map_async(download, tasks)
    r.wait()  # Wait on the results
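One knob worth knowing: Pool(None) creates as many worker processes as multiprocessing.cpu_count(), so that's the maximum number of wget calls running at once. Passing an explicit number bounds the concurrency; a sketch assuming you want at most four downloads in flight:

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)  # at most 4 concurrent downloads
    r = pool.map_async(download, range(10))
    r.wait()  # Wait on the results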
1 Answer
First off, it might be beneficial to make one list of tuples, for example:
new_list[i] = (first_list[i], second_list[i])
That way, as you change i, you ensure that you are always operating on the same items from first_list and second_list.
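In Python 2, zip builds that list of pairs in one step:

new_list = zip(first_list, second_list)  # [(first_list[0], second_list[0]), ...]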
Secondly, assuming there are no dependencies between the i and i-1 entries in your lists, you can use your function to operate on one given i value, and hand each i value to a pool worker. Consider:
indices = range(len(new_list))
results = []
r = pool.map_async(your_function, indices, callback=results.append)
r.wait() # Wait on the results
This should give you what you want.
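Putting the pieces together, you can equally map over the pairs themselves instead of the indices; a sketch, with your_function standing in for whatever the loop body does and hypothetical sample data:

import multiprocessing

first_list = [1, 2, 3]       # hypothetical sample data
second_list = [10, 20, 30]
new_list = zip(first_list, second_list)

def your_function(pair):
    first, second = pair
    # do something with first and second; placeholder computation here
    return first * second

if __name__ == '__main__':
    pool = multiprocessing.Pool(None)
    results = pool.map(your_function, new_list)
    print results  # [10, 40, 90]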
4 Comments

- I'm using subprocess.call, but then each thread ends up waiting for it to complete, and it means the multiprocessing approach doesn't work. If I use subprocess.Popen, it just spawns too many processes. Is there a way around that?
- subprocess.call and subprocess.Popen should function the same. Can you edit your question with the exact code?
- subprocess.call waits for the process to finish and subprocess.Popen just spawns multiple processes. At least that's been my experience with wget. docs.python.org/library/subprocess.html I guess they're the same if you do a Popen and wait(), though.
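(For reference, that last point is right: subprocess.call is essentially a Popen followed by wait(). A minimal illustration, reusing the question's placeholder URL:)

import subprocess

# These two are equivalent ways to run wget and block until it exits:
ret1 = subprocess.call(['wget', '-O', '0.jpg', 'http://www.example.com/image.jpg'])

p = subprocess.Popen(['wget', '-O', '0.jpg', 'http://www.example.com/image.jpg'])
ret2 = p.wait()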
- Instead of shlex.split(), just do: args = ['wget', '-O', names[i], links[i]].
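Applied to the example above, download() rewritten that way would look like this; the list form sidesteps quoting problems in filenames or URLs entirely:

def download(i):
    # Build the argument list directly instead of splitting a string.
    args = ['wget', '-O', names[i], links[i]]
    return subprocess.call(args)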