Is it possible to handle shared-memory parallel tasks in Python? My task should run in parallel over several cores (the threading module does not fit here; as far as I know, the only facility for that is the multiprocessing module). There are lots of tasks, for which I create a thread pool (in Python's case, a process pool). I then need to initialize these threads (processes) with a lot of data from the main thread (process). The threads (processes) transform this data and return a new, equally large, set of results to the main thread (process). With processes I see a huge overhead: each task must copy its input data into the new process and copy the results back once it finishes. With threads this copying is eliminated, which should give a huge speedup. Can I achieve this speedup in Python?
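For concreteness, the pattern I mean looks roughly like this (the payload is made up for illustration); the overhead is that pool.map pickles each chunk into a worker process and pickles the result back:

    import multiprocessing as mp

    def worker(chunk):
        # 'chunk' is pickled and copied into this process, and the
        # returned list is pickled and copied back to the parent --
        # the two copies the question is about.
        return [x * x for x in chunk]

    if __name__ == "__main__":
        data = [list(range(100000)) for _ in range(8)]  # made-up payload
        with mp.Pool() as pool:
            results = pool.map(worker, data)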
2 Answers
Yes, the multiprocessing module does support placing objects in shared memory. See the documentation.
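For example, here is a minimal sketch using the shared ctypes helpers Value and Array (the payload is illustrative); the worker mutates the shared buffer in place, so nothing is pickled back to the parent:

    import multiprocessing as mp

    def square_inplace(arr):
        # Mutate the shared array in place; no result is pickled back.
        with arr.get_lock():  # Array carries its own lock by default
            for i in range(len(arr)):
                arr[i] = arr[i] * arr[i]

    if __name__ == "__main__":
        # 'd' = C double; the buffer lives in shared memory rather than
        # in either process's private heap.
        shared = mp.Array('d', [1.0, 2.0, 3.0, 4.0])

        p = mp.Process(target=square_inplace, args=(shared,))
        p.start()
        p.join()

        print(list(shared))  # -> [1.0, 4.0, 9.0, 16.0]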
Value and Array work as advertised on Windows (i.e. using true shared memory).

Threads won't help you due to the GIL serializing them. However, you can still use multiprocessing and share the data between processes via the mmap module or an equivalent. This requires the data to be structured so it can be read from a file, so you won't be able to use, say, dictionaries; but file-based storage such as sqlite will work fine.
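A minimal sketch of the mmap approach, assuming a fixed layout of C doubles in a backing file (the file name and sizes are made up); parent and child map the same file, so the payload itself is never pickled between them:

    import mmap
    import multiprocessing as mp
    import struct

    FNAME = "shared.buf"  # hypothetical backing file
    N = 1000              # number of doubles in the buffer
    SIZE = N * 8          # 8 bytes per C double ('d')

    def worker():
        # The child maps the same file, so reads and writes are shared
        # with the parent through the OS page cache -- no pickling.
        with open(FNAME, "r+b") as f:
            buf = mmap.mmap(f.fileno(), SIZE)
            for i in range(N):
                (x,) = struct.unpack_from("d", buf, i * 8)
                struct.pack_into("d", buf, i * 8, x * x)
            buf.close()

    if __name__ == "__main__":
        # The parent creates and fills the backing file up front.
        with open(FNAME, "w+b") as f:
            f.truncate(SIZE)
            buf = mmap.mmap(f.fileno(), SIZE)
            for i in range(N):
                struct.pack_into("d", buf, i * 8, float(i))

            p = mp.Process(target=worker)
            p.start()
            p.join()

            (x,) = struct.unpack_from("d", buf, 3 * 8)
            print(x)  # -> 9.0, squared by the child
            buf.close()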