spidy Web Crawler latest version

v1.4.0


spidy Web Crawler v1.4.0 Release Notes

Release Date: 2017-10-04
  • ⚡️ Much update!

    • 🐧 Confirmed and added support for OS X and Linux thanks to michellemorales and j-setiawan.
    • 📚 Updated documentation to the current state of things. Still work to be done there.
    • ✂ Removed 'bad file' functionality as it wasn't working as intended and wasn't important anyway. That's what error logs are for.
    • Now resolving <base> tags to grab links that wouldn't have been recognized before. Thanks lxml!
    • ➕ Added an optional (on by default) check for file size. Files larger than 500 MB won't be downloaded, provided the site returns a Content-Length header.
    • ➕ Added Firefox (on Ubuntu) as an option for browser spoofing.
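
    To illustrate the <base> tag change listed above, here is a minimal sketch of how a <base href> affects link resolution. It is not spidy's actual code: spidy relies on lxml, while this sketch swaps in Python's standard library (html.parser and urllib.parse.urljoin) to stay self-contained. The names LinkCollector and extract_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> values and remember the first <base href>, if any."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base":
            # Only the first <base href> in a document is honoured.
            if self.base_href is None:
                self.base_href = href
        elif tag == "a":
            self.hrefs.append(href)


def extract_links(page_url, html):
    """Return absolute link URLs, resolving against <base> when present."""
    collector = LinkCollector()
    collector.feed(html)
    # The <base> href may itself be relative to the page URL.
    if collector.base_href:
        base = urljoin(page_url, collector.base_href)
    else:
        base = page_url
    return [urljoin(base, href) for href in collector.hrefs]


page = (
    '<html><head><base href="https://cdn.example.org/docs/"></head>'
    '<body><a href="guide.html">Guide</a><a href="/about">About</a></body></html>'
)
links = extract_links("https://example.com/index.html", page)
```

    Without the <base> handling, guide.html would be resolved against example.com instead of cdn.example.org, which is the kind of link that would previously have been missed.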

    spidy.zip contains just crawler.py and config/, while the source code archives contain all files.
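
    The file size check described in the notes above could look roughly like the following sketch. It is an illustration under assumptions, not spidy's implementation: the names MAX_FILE_SIZE and should_download are hypothetical, and 500 MB is taken here as 500 * 1000 * 1000 bytes, since the notes do not say which definition of megabyte is used.

```python
# Assumed limit: 500 MB as decimal megabytes (the release notes do not specify).
MAX_FILE_SIZE = 500 * 1000 * 1000


def should_download(headers, max_size=MAX_FILE_SIZE, enabled=True):
    """Decide whether to download, based on the Content-Length header.

    `headers` is a plain mapping of response header names to values.
    """
    if not enabled:
        # The check is optional; when switched off, everything is allowed.
        return True

    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw_length = lowered.get("content-length")
    if raw_length is None:
        # No Content-Length header: the size is unknown, so the check cannot apply.
        return True

    try:
        size = int(raw_length)
    except (TypeError, ValueError):
        # A malformed header is treated like a missing one.
        return True

    return size <= max_size
```

    In practice the headers would come from the response itself, for example from a HEAD request made before fetching the body. As the notes point out, a site that omits Content-Length bypasses the check entirely.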


Previous changes from v1.3

  • 🚀 Final 1.3.0 release. Added error handling back in - no changes needed.

    ⚡️ Optimized all file creation and loading. Everything is now saved with UTF-8 encoding, allowing for non-ASCII characters and emoji in pages.
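
    As a small illustration of why explicit UTF-8 matters here, the sketch below writes and reads text with encoding="utf-8" instead of relying on the platform default, which on some systems cannot represent emoji. The helper names save_text and load_text are hypothetical and not taken from spidy.

```python
import os
import tempfile


def save_text(path, text):
    """Write text with an explicit UTF-8 encoding."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def load_text(path):
    """Read text back using the same explicit encoding."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


sample = "Crawled: café, 東京, 🕷️"
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "page.txt")
    save_text(target, sample)
    round_trip = load_text(target)
```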
