coURLan changelog

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

All Versions: 12

Changelog History

  • v0.6.0 Changes

    • reviewed code base: simplicity and execution speed
    • ⬇️ dropped support for Python 3.5
  • v0.5.0 Changes

    • more complex language heuristics, now using langcodes
    • extended blacklists and whitelists
    • more precise filters and more efficient code
    • 👌 support for Python 3.10
  • v0.4.2 Changes

    • ✨ enhanced cleaning
    • 🛠 fixed language filter
  • v0.4.1 Changes

    • keep trailing slashes to avoid redirection
    • 🛠 fixes: normalization and crawlable URLs
  • v0.4.0 Changes

    • URL manipulation tools added: extract parts, fix relative URLs
    • filters added: language, navigation and crawls
    • more robust link handling and extraction
    • ✂ removed support for Python 3.4
  • v0.3.1 Changes

    • 👌 improved filter precision
  • v0.3.0 Changes

    • ⬇️ reduced dependencies: replaced requests with bare urllib3, and tldextract with tld for Python 3.6 upwards
    • 👍 better path and fragment normalization
  • v0.2.3 Changes

    • Python 3.9 compatibility
    • Simplified imports
    • 🐛 Bug fixes
  • v0.2.2 Changes

    • English and German language filters
    • Function to detect external links
    • 👌 Support for domain blacklisting
  • v0.2.1 Changes

    • Less aggressive strict filters
    • 🛠 CLI bug fixed
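
Several entries above name URL operations without showing them: extracting parts and fixing relative URLs (v0.4.0) and detecting external links (v0.2.2). The sketch below illustrates the general idea with Python's standard urllib.parse module only. It is not coURLan's API: the helper names extract_parts, fix_relative and is_external_link are hypothetical, and the library's own functions may behave differently (for example by comparing registered domains rather than full hostnames).

```python
from urllib.parse import urljoin, urlsplit


def extract_parts(url):
    """Return (scheme, hostname, path) for a URL. Hypothetical helper."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


def fix_relative(base_url, link):
    """Resolve a possibly relative link against the page it was found on."""
    return urljoin(base_url, link)


def is_external_link(url, reference_url):
    """Assumption: a link is external when its hostname differs from the reference."""
    return urlsplit(url).hostname != urlsplit(reference_url).hostname


page = "https://example.org/blog/post/"
print(extract_parts(page))
print(fix_relative(page, "../about/"))
print(is_external_link("https://other.example.com/", page))
```

Note that the trailing slash on the base URL matters when resolving relative links, which is one practical reason to keep trailing slashes as described under v0.4.1.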