
Consider using Pathos #3

Alex-ley started this conversation in Ideas
Oct 16, 2021 · 1 comment · 1 reply

You mentioned that multiprocessing uses pickle, which has some limitations. Have you heard of pathos (https://github.com/uqfoundation/pathos)? It uses dill to serialize Python objects, and dill can serialize inner-scope functions with closures, lambda functions, class methods, and more.

Would rewriting this with pathos be a cool option?
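To make the limitation concrete, here is a minimal sketch using only the standard library. It checks which callables the standard `pickle` module (the serializer `multiprocessing` relies on to send tasks to workers) can handle. `is_picklable` and `make_scaler` are hypothetical helper names for illustration.

```python
import math
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Depending on the Python version and the object, failures show up
        # as PicklingError or AttributeError (e.g. for local objects).
        return False
    return True


def make_scaler(factor):
    # Inner function that closes over `factor` from the outer scope,
    # the pattern discussed in this thread.
    def scale(x):
        return x * factor

    return scale


print(is_picklable(math.sqrt))        # True: module-level function
print(is_picklable(lambda x: x * 2))  # False: lambda
print(is_picklable(make_scaler(3)))   # False: closure / local function
```

dill, which pathos builds on, is designed to handle the last two cases as well; that part is not shown here because it needs a third-party install.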


@Alex-ley Thanks for your input! I actually hadn't heard of pathos, but I'll gladly look into it. Alternatively, it should also be possible to use a custom serializer. However, my biggest issue so far has been the ultra-slow start-up of the process pool, so I was thinking about implementing Ray instead. Ray also uses faster serialization and would solve the pool start-up problem, but it doesn't work well with Jupyter notebooks (which I use a lot).
Anyway, if you want to help implement this, feel free to create a PR.


Oh nice! I haven't had much time to play around with Ray, Dask, Modin, Vaex, or any of the other parallel processing frameworks. I've found that the standard multiprocessing module usually does the trick on an AWS EC2 instance with a large number of cores. There is definitely a sweet spot, though: it doesn't pay to use all 24 CPU threads if the data isn't big enough, but it really pays off with millions of rows of data and a CPU-intensive task per row.
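One way to encode that sweet spot is to require a minimum amount of work per process before adding more processes. The sketch below is a hypothetical heuristic: `choose_workers` and the 10,000-items-per-worker threshold are illustrative assumptions, not measured values, and the right threshold depends on how expensive each row is.

```python
import os


def choose_workers(n_items, min_items_per_worker=10_000, max_workers=None):
    """Hypothetical heuristic: number of worker processes for n_items tasks.

    Every worker gets at least `min_items_per_worker` items so that process
    start-up and pickling overhead are amortized, and the count never
    exceeds the available CPUs (or `max_workers`, if given).
    """
    cap = max_workers or os.cpu_count() or 1
    if n_items < min_items_per_worker:
        # Too little data to be worth spreading out: use a single process.
        return 1
    return min(cap, n_items // min_items_per_worker)


print(choose_workers(50_000, max_workers=24))     # 5
print(choose_workers(5_000_000, max_workers=24))  # 24
```

The result could be passed as `processes=` to `multiprocessing.Pool`. For batching, CPython's `Pool.map` already picks a default `chunksize` of roughly `len(iterable) / (4 * processes)` when none is given.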

The only reason I discovered pathos is that I was recently trying to use an inner function with a closure for a task (I needed the closure to be created on the fly from the outer function's parameters), and I realized for the first time that the pickle module used by the standard multiprocessing library can only serialize module-level (global) functions. To be honest, I was a bit surprised. So I headed to Stack Overflow, where the pathos author was quite active in answering questions and offering his library as a solution. Since that was the fastest way to solve my problem (and easier than rewriting everything with global functions and partials, which also seemed very slow for some reason), I installed it, and I basically only needed to change the import statement for my inner function to work.
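For comparison, the standard-library workaround mentioned above replaces the inner closure with a module-level function plus `functools.partial`. Pickle can serialize the partial because it only references the global function and its bound arguments. A minimal sketch; `scale` and `make_task` are hypothetical names, and no performance claim is implied.

```python
import functools
import multiprocessing
import pickle


def scale(factor, x):
    # Module-level function: pickle stores it as a reference to its
    # importable name, so worker processes can look it up.
    return x * factor


def make_task(factor):
    # Instead of returning an inner closure over `factor`, bind the outer
    # parameter with functools.partial. Pickling the partial records
    # `scale` by name plus the bound argument.
    return functools.partial(scale, factor)


def main():
    task = make_task(3)
    # Pool pickles `task` to ship it to the workers; a manual round-trip
    # shows that this succeeds.
    restored = pickle.loads(pickle.dumps(task))
    print(restored(4))  # 12
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(task, range(5)))  # [0, 3, 6, 9, 12]


if __name__ == "__main__":
    # The guard is needed whenever workers are started with "spawn" (the
    # default on Windows and macOS), because each worker re-imports this module.
    main()
```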

So I can probably open a PR on this repo that switches to pathos, hopefully in just a few lines of code. I'll do that, and you can see what you think.

Interesting that Ray talks about pickle protocol 3/4 vs. 5 (https://docs.ray.io/en/latest/serialization.html). I also read somewhere that the standard multiprocessing module uses an old (C-implemented) version of pickle, which would explain the global-function limitation.
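On the protocol question: in the standard library, every pickle protocol stores a function as a reference to its importable module-level name, so the global-function restriction is not specific to an old protocol. Protocol 5 (PEP 574, added in Python 3.8) mainly adds out-of-band buffers for large data. A quick check across all protocols the running interpreter supports; `failing_protocols` is a hypothetical helper.

```python
import math
import pickle


def failing_protocols(obj):
    """Return the pickle protocols (0..HIGHEST_PROTOCOL) that cannot serialize obj."""
    failed = []
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        try:
            pickle.dumps(obj, protocol=protocol)
        except (pickle.PicklingError, AttributeError, TypeError):
            failed.append(protocol)
    return failed


print(pickle.HIGHEST_PROTOCOL)         # 5 since Python 3.8
print(failing_protocols(math.sqrt))    # []: saved by module and name
print(failing_protocols(lambda x: x))  # every protocol fails
```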
