Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Fix redirect target verification in AsyncUrlSeeder and enhance tests #1622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Ahmed-Tawfik94 wants to merge 1 commit into develop
base: develop
Choose a base branch
Loading
from fix-async-url-seeder-redirect-verification

Conversation

@Ahmed-Tawfik94
Copy link
Collaborator

@Ahmed-Tawfik94 Ahmed-Tawfik94 commented Nov 18, 2025

Summary

Fixed a critical bug in AsyncUrlSeeder where _resolve_head() was incorrectly returning redirect targets without verifying they were alive. This could cause dead URLs to be treated as valid during URL discovery.
#1603
Key Changes:

  • Added verify_redirect_targets parameter to AsyncUrlSeeder.__init__() (default: True)
  • Modified _resolve_head() to conditionally verify redirect targets based on the parameter
  • Enhanced documentation with parameter descriptions
  • Added comprehensive test suite for regression testing

Backward Compatibility: Existing code continues to work with improved behavior by default. Users can set verify_redirect_targets=False to restore the previous behavior if needed.

List of files changed and why

  • crawl4ai/async_url_seeder.py - Core bug fix and parameter addition
  • tests/test_async_url_seeder.py - Added unit tests for both verification modes
  • test_scripts/test_async_url_seeder_fixes.py - Comprehensive demo/test suite for all fixes
  • test_scripts/README.md - Documentation for test scripts

How Has This Been Tested?

Created comprehensive test suite covering:

  • ✅ Dead redirect targets are filtered out (default behavior)
  • ✅ Legacy behavior preserved when verify_redirect_targets=False
  • ✅ Valid redirects still work correctly
  • ✅ Direct URL validation unchanged
  • ✅ Context manager functionality preserved
  • ✅ All existing AsyncUrlSeeder functionality works
  • ✅ Integration with SeedingConfig.live_check parameter

Run the test suite with: python test_scripts/test_async_url_seeder_fixes.py

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

- Added `verify_redirect_targets` parameter to control redirect verification.
- Modified `_resolve_head()` to verify redirect targets based on the new parameter.
- Implemented tests for both verification modes, ensuring dead redirects are filtered out and legacy behavior is preserved.
Copy link
Collaborator

ntohidi commented Nov 27, 2025

@Ahmed-Tawfik94 Why didn't you implement the recommended solution you provided in the root cause message here? #1603 (comment)

Copy link
Collaborator Author

@Ahmed-Tawfik94 Why didn't you implement the recommended solution you provided in the root cause message here? #1603 (comment)

According to my root cause analysis, _resolve_head returns redirect targets without confirming that they resolve to 2xx. Setting follow_redirects=True and only returning the final URL if it's 2xx was my suggested solution. it might cause an issue for users who are using the functionality like this and might be a breaking change.
while by usingverify_redirect_targets parameter for single-level verification with an opt-out, making it more conservative and backward-compatible.

Copy link

Jozurf commented Dec 19, 2025

Hi there! I was waiting for this issue to be merged. Is there a timeline when that would happen @Ahmed-Tawfik94 @ntohidi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

No reviews

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /