I have a program that, among other things, parses some big files, and I would like to have this done in parallel to save time. The code flow looks something like this:

if __name__ == '__main__':
    obj = program_object()
    obj.do_so_some_stuff(argv)
    obj.field1 = parse_file_one(f1)
    obj.field2 = parse_file_two(f2)
    obj.do_some_more_stuff()

I tried running the file parsing methods in separate processes like this:

# args must be a tuple, so a single argument needs a trailing comma
p_1 = multiprocessing.Process(target=parse_file_one, args=(f1,))
p_2 = multiprocessing.Process(target=parse_file_two, args=(f2,))
p_1.start()
p_2.start()
p_1.join()
p_2.join()

There are two problems here. One is how to have the separate process modify the field. More importantly, forking the process duplicates my whole main! I get an exception regarding argv when

do_so_some_stuff(argv)

is executed a second time. That really is not what I wanted. It happened even when I ran only one of the processes.

How can I get just the file parsing methods to run in parallel with each other, and then continue in the main process as before?

asked Apr 2, 2014 at 11:26
6
  • Have you read this? docs.python.org/3.1/library/threading.html Commented Apr 2, 2014 at 11:32
  • Or this: docs.python.org/2/library/multiprocessing.html Commented Apr 2, 2014 at 11:39
  • Is it possible to optimize the work itself, rather than moving it to separate threads? What does your 'parse_file_one(f1)' do? Normally multiprocessing should be a "last option" after everything else has been tried. Commented Apr 2, 2014 at 12:14
  • Those are two XML parsing methods. Each works with a different XML structure, and then does a lot of different math on the values there. The files are about 100MB each, so this can take significant time when done consecutively. On my system I see that only 1 core is loaded and 3 are almost idle. I thought that running both at the same time on separate cores should help. Commented Apr 2, 2014 at 12:24
  • Are you sure there isn't some different problem with the code that fails? Multiprocessing shouldn't run your main twice. Commented Apr 2, 2014 at 14:46

2 Answers

1

Try putting the parsing methods in a separate module.

answered Apr 2, 2014 at 11:33

1

First, I guess that instead of:

obj = program_object()
program_object.do_so_some_stuff(argv)

you mean:

obj = program_object()
obj.do_so_some_stuff(argv)

Second, try using threading like this:

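#!/usr/bin/python
import threading

if __name__ == '__main__':
    # args must be a tuple, hence the trailing comma after each argument
    t_1 = threading.Thread(target=parse_file_one, args=(f1,))
    t_2 = threading.Thread(target=parse_file_two, args=(f2,))
    t_1.start()
    t_2.start()
    # wait for both parsers to finish before continuing with the main flow
    t_1.join()
    t_2.join()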

But, as pointed out by Wooble, depending on the implementation of your parsing functions, this might not be a solution that executes truly in parallel, because of the GIL.

In that case, look at Python's multiprocessing module, which runs the work in separate processes and so can execute truly in parallel. From its documentation:

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.
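As a rough sketch of that approach (not the asker's actual code), the two parsers can be submitted to a process pool, with each worker returning its result to the parent instead of trying to modify the parent's object. The parser bodies, `parse_both`, and `write_sample` below are hypothetical stand-ins invented for illustration; the real parsers would only need to return something picklable. Everything that starts processes sits under the `if __name__ == "__main__":` guard, which matters on platforms where child processes are started by re-importing the main script (Windows, for instance).

```python
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor


def parse_file_one(path):
    """Hypothetical stand-in parser: count the lines in the file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def parse_file_two(path):
    """Hypothetical stand-in parser: sum one integer per non-empty line."""
    with open(path) as handle:
        return sum(int(line) for line in handle if line.strip())


def parse_both(f1, f2):
    """Run both parsers in separate processes and return their results."""
    with ProcessPoolExecutor(max_workers=2) as pool:
        future_one = pool.submit(parse_file_one, f1)
        future_two = pool.submit(parse_file_two, f2)
        # result() blocks until the worker is done and re-raises its exceptions
        return future_one.result(), future_two.result()


def write_sample(lines):
    """Write sample lines to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("\n".join(lines) + "\n")
        return handle.name


if __name__ == "__main__":
    # Only the parent runs this block; workers just import the functions above.
    f1 = write_sample(["a", "b", "c"])
    f2 = write_sample(["1", "2", "3"])
    try:
        field1, field2 = parse_both(f1, f2)
        print(field1, field2)
    finally:
        os.remove(f1)
        os.remove(f2)
```

In the question's flow, the two returned values would be assigned to obj.field1 and obj.field2 in the parent before calling obj.do_some_more_stuff(). Whether this actually saves time depends on how expensive it is to pickle the parsed results and send them back.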

answered Apr 2, 2014 at 11:36

3 Comments

Because of the GIL, this is unlikely to make things any faster (or to let them happen in parallel). Of course, this depends what the parse_file_* methods actually do.
@first Yes, that was a silly mistake. :) I corrected the post.
I tried your solution. It spawned 2 additional python processes, and they seemed to sit there indefinitely. The original code took ~2 min. After waiting over 30 min for this, I killed them. In any case, only 1 core was being used. I get similar results when using the multiprocessing library directly. Only join with a timeout seems to put an end to the processes. I will keep trying and studying the topic, and will post if I find a working solution.
