I'm trying to figure out a way to share memory between python processes. Basically there are objects that exist that multiple python processes need to be able to READ (only read) and use (no mutation). Right now this is implemented using redis + strings + cPickle, but cPickle takes up precious CPU time so I'd like to not have to use that. Most of the python shared memory implementations I've seen on the internets seem to require files and pickles, which is basically what I'm doing already and exactly what I'm trying to avoid.
What I'm wondering is if there'd be a way to write, like... basically an in-memory python object database/server and a corresponding C module to interface with the database?
Basically the C module would ask the server for an address to write an object to, the server would respond with an address, then the module would write the object and notify the server that an object with a given key was written at the specified location. Then when any of the processes wanted to retrieve an object with a given key, they would just ask the db for the memory location for that key, the server would respond with the location, and the module would know how to load that space in memory and transfer the python object back to the python process.
Is that wholly unreasonable or just really damn hard to implement? Am I chasing after something that's impossible? Any suggestions would be welcome. Thank you internet.
4 Answers
From Python 3.8 onwards you can use multiprocessing.shared_memory.SharedMemory.
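A minimal sketch of how that looks (the block name 'example_block' and the array contents here are just illustrative): one process creates a named block and copies a NumPy array into it, and any other process attaches to the same block by name and reads it without any pickling.

from multiprocessing import shared_memory
import numpy as np

# Producer: create a named shared-memory block and copy an array into it.
data = np.arange(10, dtype=np.int64)
shm = shared_memory.SharedMemory(create=True, size=data.nbytes, name='example_block')
buf = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
buf[:] = data[:]

# Consumer (run this part in another process): attach by name and read in place.
other = shared_memory.SharedMemory(name='example_block')
view = np.ndarray((10,), dtype=np.int64, buffer=other.buf)
print(view.sum())
other.close()

# The creating process is responsible for cleanup.
shm.close()
shm.unlink()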
Not unreasonable.
IPC can be done with a memory mapped file. Python has functionality built in:
http://docs.python.org/library/mmap.html
Just mmap the file in both processes and hey-presto you have a shared file. Of course you'll have to poll it in both processes to see what changes. And you'll have to coordinate writes between the two. And decide what format you want to put your data in. But it's a common solution to your problem.
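As a rough sketch (the path /tmp/shared_example.dat and the fixed 1 KB size are arbitrary choices here): one process writes raw bytes into the mapping, another maps the same file and reads them back; how you lay out and interpret those bytes is entirely up to you.

import mmap

PATH = '/tmp/shared_example.dat'   # any file both processes can open
SIZE = 1024

# Writer process: create a fixed-size file and map it.
with open(PATH, 'wb') as f:
    f.write(b'\x00' * SIZE)
with open(PATH, 'r+b') as f:
    mm = mmap.mmap(f.fileno(), SIZE)
    mm[0:5] = b'hello'      # write raw bytes at a known offset
    mm.flush()
    mm.close()

# Reader process: map the same file and read the bytes back.
with open(PATH, 'r+b') as f:
    mm = mmap.mmap(f.fileno(), SIZE)
    print(mm[0:5])          # b'hello'
    mm.close()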
If you don't want pickling, multiprocessing.sharedctypes might fit. It's a bit low-level, though; you get single values or arrays of specified types.
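For example (a sketch; the array size and values are arbitrary), a lock-free shared array of doubles can be handed to a child process and read there without serializing the data itself:

from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray

def reader(shared):
    # The child reads straight out of the shared block.
    print(list(shared))

if __name__ == '__main__':
    arr = RawArray('d', 10)   # 10 doubles, no synchronization lock
    arr[0] = 3.14
    p = Process(target=reader, args=(arr,))
    p.start()
    p.join()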
Another way to distribute data to child processes (one-way) is multiprocessing.Pipe. That can handle Python objects, and it's implemented in C, so I cannot tell you whether it uses pickling or not.
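A small sketch of that pattern (the worker function and the payload are just examples): the parent sends an object down one end of the pipe and the child receives a reconstructed copy on the other.

from multiprocessing import Process, Pipe

def worker(conn):
    # recv() blocks until the parent sends something, then yields the object.
    obj = conn.recv()
    print('child received:', obj)
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send({'key': [1, 2, 3]})
    p.join()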
Python does NOT support shared memory between independent processes out of the box. You can implement your own in C, or use SharedArray if you are working with libsvm, numpy.ndarray, or scipy.sparse.
pip install SharedArray
import re

import numpy as np
import SharedArray


class Sarr(np.ndarray):
    """ndarray view backed by a named SharedArray shared-memory segment."""

    def __new__(cls, name, getData):
        if not callable(getData) and getData is None:
            return None
        cls.orig_name = name
        shm_name = 'shm://' + re.sub(r'[./]', '_', name)
        try:
            # Another process (or an earlier call) already created it: just attach.
            shm = SharedArray.attach(shm_name)
            print('[done] reuse shared memory:', name)
            return shm
        except Exception:
            cls._unlink(shm_name)
        # First caller: build the data and publish it to shared memory.
        data = getData() if callable(getData) else getData
        shm = SharedArray.create(shm_name, data.size)
        shm[:] = data[:]
        print('[done] loaded data to shared memory:', name)
        return shm

    @staticmethod
    def _unlink(name):
        try:
            SharedArray.delete(name[len('shm://'):])
            print('deleted shared memory:', name)
        except Exception:
            pass


def test():
    def generateArray():
        print('generating')
        from time import sleep
        sleep(3)
        return np.ones(1000)

    a = Sarr('test/1', generateArray)
    # b and c use the same memory as a; attaching also works from a new process
    b = Sarr('test/1', generateArray)
    c = Sarr('test/1', generateArray)


if __name__ == '__main__':
    test()
There is also the "Sharing state between processes" section of the multiprocessing module docs. (The first sentence of that section recommends against doing so, of course.)