
I have a Python module, dataProcessor.py, which initialises a large amount of data into memory (approximately 3 GB). I want to use this module in several processes that run simultaneously.

The problem is that the machine does not have enough memory to run everything at the same time, because dataProcessor.py loads the data into memory separately for every process (3 GB each, so 9 GB in total for 3 processes).

I tried using a server–client model to initialise the data only once and serve all processes, but that model is too slow. Is there any way to load the data only once and let the other processes access the methods in dataProcessor.py?

The module I am talking about is spaCy, which is written in Cython. The data can be any Python object and won't change once written. A solution involving a C extension for Python would be acceptable.

Is there any alternative to the server–client or subprocess model that shares memory?

asked Jan 22, 2017 at 13:16
  • Please edit your question to supply more information. Currently it's not clear what you are looking for. (1) What kind of data is this? Does it consist of Python objects, or is it essentially just an array of numbers? (2) Is it read-only once initialized, or will the data change over time? (2a) If one process changes the data, should this affect other processes? (3) Considering your server–client prototype was too slow, what kind of performance do you require? (4) Does the solution have to use pure Python, or would you be comfortable with using C to represent the expensive data? Commented Jan 22, 2017 at 13:53
  • After thinking about this for a while, I don't believe there is any easy solution to share all that state between processes. On Linux, it might be possible to load the module in a parent process and then fork() it for each actual process. As long as the data is not modified, the data of the parent process will be shared as copy-on-write. Otherwise, your server–client model makes sense, and you could try to investigate why it's too slow and how that could be improved. That's probably the most promising approach here. Commented Jan 23, 2017 at 18:14
  • Is it possible to use multiple threads instead of multiple processes? Commented Feb 22, 2017 at 21:32
  • @EarlCrapstone It is easier to implement this using threads as they share memory spaces, but I want to keep the processes independent. Commented Feb 23, 2017 at 8:29
  • @amon: Assuming the contents of the memory is identical across the processes, you can share it between them with mmap. This is the same effect as your fork idea but may be less work. Commented Aug 14, 2024 at 19:01

2 Answers


First, if you can, put the data initialisation into a function, so that it doesn't happen on import. This also helps with testing.

You can use multiprocessing.sharedctypes to create variables shared across multiple processes, assuming you are forking into multiple processes (not creating multiple threads). Note that the shared array must be inherited by the forked workers (e.g. defined at module level before the pool is created) rather than passed as a task argument.

Example:

from ctypes import c_double
from multiprocessing import Pool
from multiprocessing.sharedctypes import Array

# 402653184 doubles * 8 bytes = 3 GB; lock=False is fine for
# data that is written once and then only read.
arr = Array(c_double, 402653184, lock=False)
arr[0] = 1.0
arr[1] = 2.0

def fn(j):
    # The shared array is inherited by the forked workers as a
    # global; passing a synchronized Array as an argument fails.
    print(arr[j])

if __name__ == "__main__":
    with Pool() as p:
        for j in range(3):
            p.apply_async(fn, (j,))
        p.close()
        p.join()
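If the shared data is numeric, it can also be viewed through NumPy without copying. A small sketch, assuming NumPy is installed; `RawArray` is the lock-free variant, which is adequate for data that is written once and then only read:

```python
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

# A small shared buffer for illustration (lock-free, written once).
raw = RawArray(ctypes.c_double, 8)

# View the shared memory as a NumPy array without copying;
# forked workers inheriting `raw` can build the same view.
view = np.frombuffer(raw, dtype=np.float64)
view[:] = np.arange(8, dtype=np.float64)

print(view.sum())  # 28.0
```

Writes through `view` land directly in the shared buffer, so forked processes that wrap the same `RawArray` see them.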
answered Sep 1, 2017 at 12:00

Hide the module behind an API and run it as a server. Instantiate the server that implements this API once, and make all communication to and from this module go through the API and the server. You can use IPC or a REST API directly; it does not really matter.

That said, it's not really the quick and easy solution to the problem, but it should do the job.

answered Jan 23, 2017 at 19:34
  • I have already tried using the server and client model, but the server is the bottleneck, so I was hoping to find some other method to perform the same task. Commented Jan 23, 2017 at 19:41
  • What do you mean by "the server is the bottleneck"? If you have concurrency issues within the library, you should address those; any solution where it is instantiated only once will face these issues regardless of how the communication takes place. Commented Jan 23, 2017 at 19:49
