Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

testing multiprocessing for faster finds! #63

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
vsoch merged 2 commits into master from test/multiprocessing
Mar 27, 2022
Merged

Conversation

@vsoch
Copy link
Collaborator

@vsoch vsoch commented Mar 26, 2022

This might be a terrible idea, but in repos where we have a LOT of files to check, it's getting much slower to do the check. So here I'm going to test using multiprocessing, meaning we can check ~9 files in parallel. I'll try to open a custom action branch so I can test this on a repo I know is rather large (given it passes here of course!).

Signed-off-by: vsoch vsoch@users.noreply.github.com

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Copy link
Collaborator Author

vsoch commented Mar 26, 2022

holy crap @SuperKogito this went from 4-5 minutes to 44 seconds!! Major improvement!! https://github.com/USRSE/usrse.github.io/runs/5701139638?check_suite_focus=true

SuperKogito reacted with heart emoji

@vsoch vsoch requested a review from SuperKogito March 26, 2022 04:44
Copy link
Collaborator Author

vsoch commented Mar 27, 2022

@SuperKogito I know you've been busy or not able to respond in the past - I've tested this (even via the action) on external repos so I'm going to merge for the super speed up, and we can open further issue/PR if any new issues arise! I hope you are doing well!

SuperKogito reacted with thumbs up emoji

@vsoch vsoch merged commit 1f9bd5c into master Mar 27, 2022
@vsoch vsoch deleted the test/multiprocessing branch March 27, 2022 21:37
Copy link
Member

I am really sorry for taking some time, I intended to look at it today but couldn't. Overall the structure is nice (kudos for the new class), and the improvements are great ;) I only wonder if this might touch some requests limit if requests are simultaneous. However since the automatic tests didn't fail, I think it is safe to merge.

Copy link
Collaborator Author

vsoch commented Mar 27, 2022
edited
Loading

Oh yay you are around! I think we should actually be OK because it’s parsed on the level of the file (so checks are unique within a process run), and if we hit a case of shared urls across files it either will work off the bat still or retry. I tested here and on our USRSE repo and saw no issues so I think it’s a huge improvement worth it!

But I thought about this, and if we do need to handle duplication across jobs we could always parse urls first, taking this into account, and then run the multiprocess with no duplications.

That might further optimize actually - ok if O try this out and open another PR? And since I know you are around I will indeed wait for your review this time!

SuperKogito reacted with heart emoji

Copy link
Member

SuperKogito commented Mar 27, 2022
edited
Loading

I agree, I think it is a great improvement 👏 I feel silly how we didn't think of it before 😝
yes filtering the duplicates before multiprocessing should make it even faster - Of course, give it a shot and if it is stable enough let's merge ;)

( just an idea )
I think if you have lengthy checks and looking for further faster processing, it might be worthy to revive #52 sine that will reach the limits of acceleration but asynchronous processing has its drawbacks and I think if we ever pursue that we will need to create different python and action folders (say: urltechie_super_speed :p ) cuz using that will force us to drop a lot of our useful generic features.

Copy link
Collaborator Author

vsoch commented Mar 27, 2022

okay - I'm on it!

Agreed - let me get in a PR for optimizing the urls we check, and then let's rebase #52 and we can time the master branch against the same with async/aiohttp. If it's an improvement that is noticeable, it's definitely worth considering! If it's a trivial change then probably not worth the pain of async 😆

SuperKogito reacted with thumbs up emoji

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@SuperKogito SuperKogito Awaiting requested review from SuperKogito

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /