I'm trying to figure out a way to share memory between python processes. Basically there are objects that exist that multiple python processes need to be able to READ (only read) and use (no mutation). Right now this is implemented using redis + strings + cPickle, but cPickle takes up precious CPU time so I'd like to not have to use that. Most of the python shared memory implementations I've seen on the internets seem to require files and pickles, which is basically what I'm doing already and exactly what I'm trying to avoid.
What I'm wondering is if there'd be a way to write, like... basically an in-memory python object database/server and a corresponding C module to interface with the database?
Basically the C module would ask the server for an address to write an object to, the server would respond with an address, then the module would write the object and notify the server that an object with a given key was written at the specified location. Then when any of the processes wanted to retrieve an object with a given key, they would just ask the db for the memory location for that key, the server would respond with the location, and the module would know how to load that space in memory and transfer the python object back to the python process.
Is that wholly unreasonable or just really damn hard to implement? Am I chasing after something that's impossible? Any suggestions would be welcome. Thank you internet.
4 Answers
From Python 3.8 onwards you can use multiprocessing.shared_memory.SharedMemory.
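A minimal sketch of how that looks (the block name 'example_block' and the array contents here are just illustrative): one process creates a named block and copies a NumPy array into it, and any other process attaches to the same block by name and reads it without any pickling.

from multiprocessing import shared_memory
import numpy as np

# Producer: create a named shared-memory block and copy an array into it.
data = np.arange(10, dtype=np.int64)
shm = shared_memory.SharedMemory(create=True, size=data.nbytes, name='example_block')
buf = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
buf[:] = data[:]

# Consumer (run this part in another process): attach by name and read in place.
other = shared_memory.SharedMemory(name='example_block')
view = np.ndarray((10,), dtype=np.int64, buffer=other.buf)
print(view.sum())
other.close()

# The creating process is responsible for cleanup.
shm.close()
shm.unlink()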
Not unreasonable.
IPC can be done with a memory mapped file. Python has functionality built in:
http://docs.python.org/library/mmap.html
Just mmap the file in both processes and hey-presto you have a shared file. Of course you'll have to poll it in both processes to see what changes. And you'll have to coordinate writes between the two. And decide what format you want to put your data in. But it's a common solution to your problem.
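As a rough sketch (the path /tmp/shared_example.dat and the fixed 1 KB size are arbitrary choices here): one process writes raw bytes into the mapping, another maps the same file and reads them back; how you lay out and interpret those bytes is entirely up to you.

import mmap

PATH = '/tmp/shared_example.dat'   # any file both processes can open
SIZE = 1024

# Writer process: create a fixed-size file and map it.
with open(PATH, 'wb') as f:
    f.write(b'\x00' * SIZE)
with open(PATH, 'r+b') as f:
    mm = mmap.mmap(f.fileno(), SIZE)
    mm[0:5] = b'hello'      # write raw bytes at a known offset
    mm.flush()
    mm.close()

# Reader process: map the same file and read the bytes back.
with open(PATH, 'r+b') as f:
    mm = mmap.mmap(f.fileno(), SIZE)
    print(mm[0:5])          # b'hello'
    mm.close()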
If you don't want pickling, multiprocessing.sharedctypes might fit. It's a bit low-level, though; you get single values or arrays of specified types.
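For example (a sketch; the array size and values are arbitrary), a lock-free shared array of doubles can be handed to a child process and read there without serializing the data itself:

from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray

def reader(shared):
    # The child reads straight out of the shared block.
    print(list(shared))

if __name__ == '__main__':
    arr = RawArray('d', 10)   # 10 doubles, no synchronization lock
    arr[0] = 3.14
    p = Process(target=reader, args=(arr,))
    p.start()
    p.join()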
Another way to distribute data to child processes (one-way) is multiprocessing.Pipe. That can handle Python objects, and it's implemented in C, so I cannot tell you whether it uses pickling or not.
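A small sketch of that pattern (the worker function and the payload are just examples): the parent sends an object down one end of the pipe and the child receives a reconstructed copy on the other.

from multiprocessing import Process, Pipe

def worker(conn):
    # recv() blocks until the parent sends something, then yields the object.
    obj = conn.recv()
    print('child received:', obj)
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send({'key': [1, 2, 3]})
    p.join()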
Python does NOT support shared memory between independent processes out of the box. You can implement your own in C, or use SharedArray if you are working with libsvm, numpy.ndarray, or scipy.sparse.
pip install SharedArray
import re

import numpy as np
import SharedArray


class Sarr(np.ndarray):
    """ndarray view backed by a named SharedArray shared-memory segment."""

    def __new__(cls, name, getData):
        if not callable(getData) and getData is None:
            return None
        cls.orig_name = name
        shm_name = 'shm://' + re.sub(r'[./]', '_', name)
        try:
            # Another process (or an earlier call) already created it: just attach.
            shm = SharedArray.attach(shm_name)
            print('[done] reuse shared memory:', name)
            return shm
        except Exception:
            cls._unlink(shm_name)
        # First caller: build the data and publish it to shared memory.
        data = getData() if callable(getData) else getData
        shm = SharedArray.create(shm_name, data.size)
        shm[:] = data[:]
        print('[done] loaded data to shared memory:', name)
        return shm

    @staticmethod
    def _unlink(name):
        try:
            SharedArray.delete(name[len('shm://'):])
            print('deleted shared memory:', name)
        except Exception:
            pass


def test():
    def generateArray():
        print('generating')
        from time import sleep
        sleep(3)
        return np.ones(1000)

    a = Sarr('test/1', generateArray)
    # b and c use the same memory as a; attaching also works from a new process
    b = Sarr('test/1', generateArray)
    c = Sarr('test/1', generateArray)


if __name__ == '__main__':
    test()
There is also the "Sharing state between processes" section of the multiprocessing module docs. (The first sentence of that section recommends against doing so, of course.)