I am working on a program that uses an external C library to parse data from external sources and a Python library to run an optimisation problem on it. The optimisation is very time consuming, so using several CPUs would be a significant plus.

Basically, I wrapped the C(++) structures with Cython as follows:

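    cdef class CObject(object):
        cdef long p_sthg    # address of the C structure, stored as an integer
        cdef OBJECT* sthg   # the same address, cast back to a typed pointer

        def __cinit__(self, sthg):
            self.p_sthg = sthg
            self.sthg = <OBJECT*> self.p_sthg

        def __reduce__(self):
            # Pickles only the address, not the structure it points to
            return (rebuildObject, (self.p_sthg, ))

        def getManyThings(self):
            ...
            return blahblahblah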

Then I create my resource intensive process:

    p = mp.Process(target=make_process, args=(cobject,))

As you can immediately guess (of course I didn't), even though I manage to unpickle the CObject, the pointer is passed to the new process, but not the C structure it refers to.

I can find some resources explaining how to put Python objects into shared memory, but that would not be sufficient in my case, as I would need to share C objects I barely know about (and other objects that are pointed at by the top CObject) between the Python processes.

In case it matters, the good thing is that I can survive with a read-only access...

Does anyone have any experience in such matters?
My other idea would be to find a way to write the binary representation of the object I need to pass to a file and read it from the other process...

asked Aug 24, 2015 at 15:11

1 Answer

There's no single, general way to do this.

You can put the C object into shared memory by constructing it inside a suitable mmap(2) region (also available via mmap in the Python standard library; use MAP_SHARED|MAP_ANONYMOUS). This requires the entire object to lie within the mmap, and will likely make it impossible for the object to use pointers (but offsets relative to the object are probably OK provided they point within the mmap). If the object has any file descriptors or other handles of any kind, those will almost certainly not work correctly. Note that mmap() is like malloc(); you have to do a corresponding munmap() or you leak the memory.
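As a rough illustration of the offsets-instead-of-pointers idea, here is a minimal Python sketch. The header layout and the names `HEADER`, `write_record`, `read_record` and `demo` are made up for this example, and `demo` assumes a Unix system where `os.fork()` is available. A real C object would be constructed in place by the library rather than packed with `struct`.

```python
import mmap
import os
import struct

# Hypothetical layout: a header holding the payload's offset and length,
# both relative to the start of the region, followed by the payload itself.
HEADER = struct.Struct("<II")


def write_record(buf, payload):
    """Store payload after the header and record its location as an offset."""
    offset = HEADER.size
    HEADER.pack_into(buf, 0, offset, len(payload))
    buf[offset:offset + len(payload)] = payload


def read_record(buf):
    """Follow the offset stored in the header to recover the payload."""
    offset, length = HEADER.unpack_from(buf, 0)
    return bytes(buf[offset:offset + length])


def demo():
    """Build the record in a forked child and read it from the parent."""
    # fileno=-1 requests an anonymous mapping; on Unix it is shared by default.
    buf = mmap.mmap(-1, mmap.PAGESIZE)
    pid = os.fork()  # Unix-only
    if pid == 0:
        write_record(buf, b"built in the child")
        os._exit(0)
    os.waitpid(pid, 0)
    data = read_record(buf)
    buf.close()  # the Python-level counterpart of munmap()
    return data
```

Because the header stores an offset rather than an address, it stays valid even if the region is mapped at a different address in another process.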

You could copy the C object into shared memory (with e.g. memcpy(3)). This is likely less efficient, and requires the object to be reasonably copiable. memcpy does not magically fix up pointers and other references. On the plus side, this does not require you to control the object's construction.
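A small sketch of the copy approach using `ctypes` from Python. The `Sample` structure and both helper functions are hypothetical; the copy is only meaningful here because `Sample` holds plain data and no pointers.

```python
import ctypes
import mmap


class Sample(ctypes.Structure):
    # Hypothetical plain-data layout: no pointers, so a byte copy is enough.
    _fields_ = [("id", ctypes.c_int32), ("value", ctypes.c_double)]


def copy_into_shared(obj):
    """memmove an existing structure into a fresh anonymous shared mapping."""
    size = ctypes.sizeof(obj)
    buf = mmap.mmap(-1, size)
    dest = (ctypes.c_char * size).from_buffer(buf)
    ctypes.memmove(dest, ctypes.byref(obj), size)
    del dest  # release the buffer export so buf can be closed later
    return buf


def load_from_shared(buf):
    """Read the structure back as an independent copy."""
    return Sample.from_buffer_copy(buf)
```

If `Sample` had a pointer field, the address would be copied verbatim and would still refer to memory outside the mapping.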

You can serialize the object to some binary representation and pass it through a pipe(2) (also available via os.pipe() in Python). For simple cases, this is a field-by-field copy, but again, pointers will need care. You will have to (un)swizzle your pointers to make them work correctly after (de)serialization. This is the most easily generalized technique, but requires knowledge of how the object is structured, or a black-box function that does the serialization for you.
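To make the swizzling step concrete, here is a minimal sketch for an acyclic singly linked chain. The `Node` class stands in for a C struct with a value and a next pointer, and the wire format (`COUNT`, `ENTRY`) is invented for this example. Each pointer becomes an index on the way out and is turned back into a reference on the way in.

```python
import os
import struct


class Node:
    """Hypothetical stand-in for a C struct with a value and a next pointer."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


COUNT = struct.Struct("<I")
ENTRY = struct.Struct("<qi")  # value, index of the next node (-1 for NULL)


def serialize(head):
    """Walk the chain and replace each pointer with an index (swizzling)."""
    nodes = []
    node = head
    while node is not None:
        nodes.append(node)
        node = node.next
    out = [COUNT.pack(len(nodes))]
    for i, n in enumerate(nodes):
        out.append(ENTRY.pack(n.value, i + 1 if n.next is not None else -1))
    return b"".join(out)


def deserialize(data):
    """Rebuild the nodes, then turn indices back into references."""
    (count,) = COUNT.unpack_from(data, 0)
    entries = [ENTRY.unpack_from(data, COUNT.size + i * ENTRY.size)
               for i in range(count)]
    nodes = [Node(value) for value, _ in entries]
    for node, (_, nxt) in zip(nodes, entries):
        node.next = nodes[nxt] if nxt >= 0 else None
    return nodes[0] if nodes else None


def roundtrip_through_pipe(head):
    """Send the chain through os.pipe(); assumes it fits in the pipe buffer."""
    r, w = os.pipe()
    with os.fdopen(w, "wb") as writer:
        writer.write(serialize(head))
    with os.fdopen(r, "rb") as reader:
        return deserialize(reader.read())
```

In a real program the writer and reader would live in different processes, and a payload larger than the pipe buffer would need the reader to drain the pipe while the writer is still writing.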

Finally, you can create temporary files in /dev/shm and exchange information that way. These files are backed by RAM and are effectively the same as shared memory, but with perhaps a more familiar file-like interface. However, /dev/shm is a Linux convention; on other Unix systems, you should use shm_open(3) for portability.
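A minimal sketch of the file-based exchange. The helper names and the `cobject-` prefix are invented for this example, and the code falls back to the default temporary directory when /dev/shm is missing or not writable, in which case the file may no longer be RAM-backed.

```python
import os
import tempfile


def shm_dir():
    """Return /dev/shm when usable (Linux), else None for the default temp dir."""
    path = "/dev/shm"
    if os.path.isdir(path) and os.access(path, os.W_OK):
        return path
    return None


def write_blob(data):
    """Write data to a temporary file and return its path for the other process."""
    fd, path = tempfile.mkstemp(prefix="cobject-", dir=shm_dir())
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


def read_blob(path):
    """Read the data back and delete the file so the memory is released."""
    with open(path, "rb") as f:
        data = f.read()
    os.unlink(path)
    return data
```

The producer would call `write_blob()` and hand the returned path to the consumer, for example as an argument to `mp.Process`, and the consumer would call `read_blob()` on it.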

Note that shared memory, in general, tends to be problematic. It requires inter-process synchronization, but the necessary locking primitives are far less developed than in the threading world. I recommend limiting shared memory to immutable objects or inherently lock-free designs (which are quite difficult to get right).

answered Aug 24, 2015 at 15:42

2 Comments

My structure is not that complex, but it links to a std::list of other structures, which are themselves linked to std::list of other structures (I didn't check the level of recursion, but there are no cycles...) I guess Cython has not been designed to trick the allocators (found things with boost, but nothing more basic?) and manually adjust remaining pointers... I may have a workaround specific to my needs (by calling the data fetching from the process and pruning what I need, much safer!) but would still love to see a test case of a hidden std::list<simple> passed through shared memory in Cython...
@xoolive: A list is made up of many pointers, most encapsulated. Those pointers will not work if you try to share it, and the whole thing will blow up. Apparently Boost can do this, but that's a whole extra dependency and looks like it doesn't directly work on STL containers. It might be simpler to recursively walk your list and serialize its elements one at a time.
