feat: implement retry mechanism for dataset downloads to handle rate limits #159
samiuc commented on Oct 6, 2025
- Switched to a versioned OmniDocBench dataset to fix test failures caused by format changes in the original dataset (https://huggingface.co/datasets/opendatalab/OmniDocBench/tree/main).
- Added exponential backoff to handle Hugging Face rate-limiting errors (the limit is 1000 requests per 5 minutes).
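A minimal sketch of the exponential-backoff idea described above. The PR's actual implementation is not shown here, so `RateLimitError` and `with_backoff` are hypothetical names. `RateLimitError` stands in for whatever HTTP 429 error the download client raises. The delay doubles after each rate-limited attempt, is capped, and gets a little random jitter so that parallel test workers do not all retry at the same moment.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 (Too Many Requests) error."""


def with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the rate-limit error
            # Delay doubles per attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter spreads out retries from parallel workers.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

A download call would be wrapped as `with_backoff(lambda: fetch_file(...))`, where `fetch_file` is again a placeholder. The `sleep` parameter is injectable so that tests can run the retry logic without actually waiting.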
...limits Signed-off-by: samiuc <sami.ullah.chat@gmail.com>
✅ DCO Check Passed
Thanks @samiuc, all your commits are properly signed off. 🎉
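The DCO check above confirms that every commit carries a `Signed-off-by:` trailer, like the ones on this PR's commits. The following is a simplified sketch of that check, assuming commit messages are available as plain strings. The `is_signed_off` and `unsigned_commits` names are illustrative. Real DCO checks typically also compare the sign-off against the commit author, which this sketch does not do.

```python
import re

# Matches a trailer line like "Signed-off-by: Jane Doe <jane@example.com>".
SIGNOFF = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$", re.MULTILINE)


def is_signed_off(message: str) -> bool:
    """Return True if the commit message contains a Signed-off-by trailer."""
    return SIGNOFF.search(message) is not None


def unsigned_commits(messages: list[str]) -> list[int]:
    """Return the indices of commit messages that lack a sign-off."""
    return [i for i, msg in enumerate(messages) if not is_signed_off(msg)]
```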
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
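The merge-protection rule above appears to test the PR title against a regular expression, with `~=` denoting a regex match. The sketch below runs that same pattern with Python's `re` module. `follows_convention` is a hypothetical helper name. Because the pattern is anchored with `^` and ends at the colon, it checks only the type prefix (with an optional `(scope)` and an optional `!`). It does not validate the description that follows.

```python
import re

# Pattern copied from the merge-protection rule (minus the "title ~=" prefix).
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def follows_convention(title: str) -> bool:
    """Return True if a PR title starts with a Conventional Commits type prefix."""
    return CONVENTIONAL_TITLE.match(title) is not None
```

For example, this PR's title `feat: implement retry mechanism for dataset downloads to handle rate limits` passes. A title such as `feature: retry` fails because `feature` is not one of the listed types.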
@samiuc This looks nice! Just one question: do we need the max_workers thing?
Signed-off-by: samiuc <sami.ullah.chat@gmail.com>
Signed-off-by: samiuc <sami.ullah.chat@gmail.com>
Thanks! The max_workers parameter isn't strictly required; it's optional. It just helps control concurrency when multiple tests run in parallel.
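The PR does not show how `max_workers` is used. It may be forwarded to a download helper; for instance, `huggingface_hub`'s `snapshot_download` accepts a `max_workers` argument. The following sketch assumes it simply caps a thread pool, so that at most `max_workers` downloads are in flight at once. Fewer concurrent requests also make it less likely that parallel test runs exceed the rate limit. `download_all` and `fetch` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def download_all(
    items: Iterable[T], fetch: Callable[[T], R], max_workers: int = 4
) -> List[R]:
    """Run fetch over items with at most max_workers calls in flight at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(fetch, items))
```

Leaving `max_workers` optional with a small default matches the description above: callers can ignore it, or lower it when several test processes share the same rate limit.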
Signed-off-by: samiuc <sami.ullah.chat@gmail.com>