We have one program in Java and one in Python, and need to get them talking to each other in a ping-pong manner, each time exchanging an integer array of length 100,000, with each side taking ~0.1 - 1 second to do its work:
- Java does some work and fires an int array of length 100,000 over to ...
- Python, which does some work and fires a new array of length 100,000 back to ...
- Java, which does some work ... etc
Note that
- Each program needs to wait for the other to do its part.
- They will run on the same Linux machine.
- We will do Monte Carlo simulation, so speed is important.
I am more familiar with Java, and understand that shared memory backed by a memory-mapped file is likely to be the fastest approach. This seems relevant for the Java side, but how would I get each program to wait/block until the other has completed its work and updated the shared memory before it starts reading? I've heard of something called a 'semaphore', but can't figure it out.
These are my fallback ideas, but perhaps they are better?
- Unix Domain Sockets with jnr-unixsocket
- Sockets with Speedus
Comments
- Maybe a stupid suggestion, but if they run on the same machine, couldn't you execute the Python program via Jython (which should simplify the communication)? – UnholySheep, Apr 29, 2017 at 16:02
- Well you can just iterate from 0 to 100,000 and read an integer from a socket. – Jacob G., Apr 29, 2017 at 16:21
- I hope you see how much complexity that switch between Java and python creates. I would vote to either use jython or to drop one side and do all computations either with java or python. – GhostCat, Apr 29, 2017 at 16:34
- Yes, interprocess communication is relatively simple but gets more complex when you're required to use more than one language. – Jacob G., Apr 29, 2017 at 16:47
- If you are doing Monte Carlo, your bottleneck should be the amount of CPU you are using. Using loopback should be more than fast enough. You can pass 400KB over loopback in a few milli-seconds. – Peter Lawrey, Apr 29, 2017 at 18:10
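To make the loopback suggestion concrete, here is a minimal Python sketch of the ping-pong over a TCP loopback socket. It is an illustration under assumptions, not code from the thread: the helper names (`send_ints`, `recv_ints`, `worker`, `ping_pong`) are hypothetical, both sides are played by Python threads so the example runs on its own, and the "work" is a placeholder increment. The frame is 100,000 big-endian 32-bit ints (400,000 bytes), which is the byte order Java's `DataOutputStream.writeInt` produces, so a Java peer could read the same frame with `DataInputStream.readInt`. The blocking `recv` is what makes each side wait for the other.

```python
import socket
import struct
import threading

N = 100_000                      # array length from the question
FRAME = struct.Struct(f">{N}i")  # big-endian int32, matching Java's DataOutputStream


def send_ints(sock, values):
    """Send exactly N ints as one fixed-size frame (400,000 bytes)."""
    sock.sendall(FRAME.pack(*values))


def recv_ints(sock):
    """Block until a full frame has arrived, then decode it."""
    buf = bytearray(FRAME.size)
    view = memoryview(buf)
    got = 0
    while got < FRAME.size:
        n = sock.recv_into(view[got:])
        if n == 0:
            raise ConnectionError("peer closed the connection mid-frame")
        got += n
    return list(FRAME.unpack(buf))


def worker(server_sock, rounds):
    """Stand-in for the Python side: receive, do placeholder work, reply."""
    conn, _ = server_sock.accept()
    with conn:
        for _ in range(rounds):
            data = recv_ints(conn)
            send_ints(conn, [x + 1 for x in data])  # placeholder for real work


def ping_pong(rounds=3):
    """Stand-in for the Java side; returns the array after all rounds."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    t = threading.Thread(target=worker, args=(server, rounds))
    t.start()
    data = [0] * N
    with socket.create_connection(server.getsockname()) as client:
        for _ in range(rounds):
            send_ints(client, data)
            data = recv_ints(client)  # blocks until the other side has replied
    t.join()
    server.close()
    return data
```

Because the frame size is fixed and known to both sides, no length prefix or delimiter is needed; the receiver simply reads until it has 400,000 bytes.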
2 Answers
You could try combining Java and Python in the same process using Jep. The latest release added support for sharing memory between Python and Java using NumPy ndarrays and Java direct buffers. This would let you share the data without any copying, which should give the best performance possible.
Go for a fast intermediary data server to assist in communication between them. Redis would do the trick. You'll need two data structures there:
- a list (your list of 100,000 items). We'll call that my_project:list for reference.
- a lock. This can just be a Redis string set to "Python" or "Java."
Then have the following interaction:
- Both Python and Java poll the Redis lock. If it's equal to "Python", it's Python's turn. If "Java," it's Java's turn.
- Whichever program's turn it is goes into work mode and does whatever it needs to my_project:list, then it sets the lock to the other program's turn.
- Repeat indefinitely.
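The turn-taking steps above can be sketched in Python roughly as follows. This is a hedged illustration, not a definitive implementation: the key name `my_project:turn`, the helper names, and the `InMemoryStore` class are all hypothetical. `InMemoryStore` is a small stand-in that mimics only the handful of redis-py style methods used here (`get`, `set`, `delete`, `rpush`, `lrange`) so the sketch runs without a server; with a real server you would pass a `redis.Redis(...)` client instead. Both "programs" are played by threads, and the work is a placeholder addition.

```python
import threading
import time

LIST_KEY = "my_project:list"  # list name used in the answer
LOCK_KEY = "my_project:turn"  # hypothetical key name for the "lock" string


class InMemoryStore:
    """Hypothetical stand-in for a Redis client; values come back as bytes,
    as they do from redis-py with its default settings."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value.encode() if isinstance(value, str) else value

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)

    def rpush(self, key, *values):
        with self._mutex:
            self._data.setdefault(key, []).extend(str(v).encode() for v in values)

    def lrange(self, key, start, end):
        with self._mutex:
            items = self._data.get(key, [])
            return items[start:] if end == -1 else items[start:end + 1]


def wait_for_turn(store, me, poll_interval=0.001):
    """Poll the lock string until it names this program."""
    while store.get(LOCK_KEY) != me.encode():
        time.sleep(poll_interval)


def take_turn(store, me, other, work):
    """Wait for our turn, rewrite the list, then hand the turn over."""
    wait_for_turn(store, me)
    values = [int(v) for v in store.lrange(LIST_KEY, 0, -1)]
    result = work(values)
    store.delete(LIST_KEY)
    store.rpush(LIST_KEY, *result)
    store.set(LOCK_KEY, other)  # hand over only after the list is fully written


def run(rounds=2, n=1000):
    """Play both sides with threads; returns the final list contents."""
    store = InMemoryStore()
    store.rpush(LIST_KEY, *([0] * n))
    store.set(LOCK_KEY, "Java")  # Java goes first, as in the question

    def player(me, other, delta):
        for _ in range(rounds):
            take_turn(store, me, other, lambda xs: [x + delta for x in xs])

    threads = [
        threading.Thread(target=player, args=("Java", "Python", 1)),
        threading.Thread(target=player, args=("Python", "Java", 10)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [int(v) for v in store.lrange(LIST_KEY, 0, -1)]
```

The ordering inside `take_turn` matters: the lock is flipped last, so the other program never sees its turn before the list is complete. Polling trades some latency and CPU for simplicity; the poll interval here is an arbitrary choice.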