Fix redirect target verification in AsyncUrlSeeder and enhance tests #1622
- Added `verify_redirect_targets` parameter to control redirect verification.
- Modified `_resolve_head()` to verify redirect targets based on the new parameter.
- Implemented tests for both verification modes, ensuring dead redirects are filtered out and legacy behavior is preserved.
ntohidi
commented
Nov 27, 2025
@Ahmed-Tawfik94 Why didn't you implement the recommended solution you provided in the root cause message here? #1603 (comment)
Ahmed-Tawfik94
commented
Nov 28, 2025
> @Ahmed-Tawfik94 Why didn't you implement the recommended solution you provided in the root cause message here? #1603 (comment)
According to my root cause analysis, `_resolve_head` returns redirect targets without confirming that they resolve to 2xx. My suggested solution was to set `follow_redirects=True` and only return the final URL if it is 2xx. However, that might cause issues for users who rely on the current behavior, and it could be a breaking change.
Instead, this PR uses a `verify_redirect_targets` parameter for single-level verification with an opt-out, which makes it more conservative and backward-compatible.
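To make the trade-off concrete, here is a minimal sketch of single-level verification with an opt-out. It is not the code from this PR: `resolve_head_sketch`, `make_fake_head`, and `FAKE_SITE` are hypothetical names, the HEAD request is replaced by an in-memory lookup so the example runs without network access, and `Location` values are assumed to be absolute URLs.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple

# Hypothetical, simplified HEAD result: (status code, Location header or None).
HeadResult = Tuple[int, Optional[str]]
HeadFn = Callable[[str], Awaitable[HeadResult]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


async def resolve_head_sketch(
    url: str,
    head: HeadFn,
    verify_redirect_targets: bool = True,
) -> Optional[str]:
    """Return a usable URL, or None if `url` looks dead."""
    status, location = await head(url)
    if 200 <= status < 300:
        return url
    if status in REDIRECT_CODES and location:
        if not verify_redirect_targets:
            # Opt-out: return the redirect target without checking it.
            return location
        # Single-level verification: one extra HEAD on the target only.
        # Assumption of this sketch: a target that itself redirects
        # is treated as not verified.
        target_status, _ = await head(location)
        return location if 200 <= target_status < 300 else None
    return None


def make_fake_head(table: Dict[str, HeadResult]) -> HeadFn:
    """Build an in-memory stand-in for a network HEAD request."""
    async def head(url: str) -> HeadResult:
        # Unknown URLs behave like a 404.
        return table.get(url, (404, None))
    return head


# Hypothetical site: one live page, one good redirect, one dead redirect.
FAKE_SITE: Dict[str, HeadResult] = {
    "https://example.com/ok": (200, None),
    "https://example.com/moved": (301, "https://example.com/ok"),
    "https://example.com/stale": (301, "https://example.com/gone"),
    "https://example.com/gone": (404, None),
}
```

With the default `verify_redirect_targets=True`, the dead redirect at `/stale` is filtered out; with `verify_redirect_targets=False`, the sketch returns the unverified target, which mirrors the previous behavior described above. Callers pay for at most one extra HEAD request per redirect, which is cheaper than following the whole chain.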
Jozurf
commented
Dec 19, 2025
Hi there! I was waiting for this issue to be merged. Is there a timeline for when that would happen, @Ahmed-Tawfik94 @ntohidi?
Summary
Fixed a critical bug in `AsyncUrlSeeder` where `_resolve_head()` was incorrectly returning redirect targets without verifying they were alive. This could cause dead URLs to be treated as valid during URL discovery. #1603
Key Changes:
- Added `verify_redirect_targets` parameter to `AsyncUrlSeeder.__init__()` (default: `True`)
- Modified `_resolve_head()` to conditionally verify redirect targets based on the parameter
Backward Compatibility: Existing code continues to work with improved behavior by default. Users can set `verify_redirect_targets=False` to restore the previous behavior if needed.
List of files changed and why
- `crawl4ai/async_url_seeder.py` - Core bug fix and parameter addition
- `tests/test_async_url_seeder.py` - Added unit tests for both verification modes
- `test_scripts/test_async_url_seeder_fixes.py` - Comprehensive demo/test suite for all fixes
- `test_scripts/README.md` - Documentation for test scripts
How Has This Been Tested?
Created comprehensive test suite covering:
- `verify_redirect_targets=False`
Run the test suite with:
`python test_scripts/test_async_url_seeder_fixes.py`
Checklist: