Anthorty/Python

Pixiv Spider

  • Before running, make sure your network can open Pixiv!!!
Python 3.6 or later is required.

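This is a single-threaded Pixiv crawler with no multithreading, so crawling is fairly slow, but it should be fine for personal use. The crawler does not sleep between requests, so please do not crawl repeatedly within a short time, or your IP may be temporarily banned.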
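The crawler sends its requests back to back. If you modify it, you could space requests out with a small throttle like the one below. This is a minimal sketch and is not part of the project. The `Throttle` class and the interval you pass to it are hypothetical choices.

```python
import time


class Throttle:
    """Hypothetical helper: keeps at least `min_interval` seconds between calls to wait()."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous wait(), if any

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            # Sleep only for whatever part of the interval has not yet elapsed
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Create one instance, e.g. `throttle = Throttle(1.0)`, and call `throttle.wait()` right before each HTTP request. The first call returns immediately. Each later call blocks until the interval has passed.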

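After you log in, a cookie file is created in this directory. Do not modify or delete it, or the saved cookie will become invalid. Do not send it to anyone else either, because that would expose your account.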

The crawler uses requests, BeautifulSoup and lxml. If it does not run correctly, check that these third-party libraries are installed.
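A quick way to check the three dependencies before running is sketched below. This helper is not part of the project. Note that BeautifulSoup is imported as `bs4` but installed from PyPI as `beautifulsoup4`.

```python
import importlib.util
from typing import Dict, List

# Import name -> pip package name
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4", "lxml": "lxml"}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of required packages that cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```

If all three are missing, the script prints `pip install requests beautifulsoup4 lxml`.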

Features:

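  1. Choose the date to crawl
  2. Choose which ranking list to crawl
  3. Choose how many pages to crawl
  4. Downloaded images are sorted into folders by ranking type, and each file is named after its Pixiv ID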
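The four features above come down to building a request for a given ranking list, date and page, then choosing where to save each image. The sketch below assumes Pixiv's `ranking.php` endpoint takes `mode`, `date`, `p` and `format=json` query parameters. That is an assumption about the site, not something taken from this project's code. The function names are hypothetical.

```python
from datetime import date
from pathlib import Path
from urllib.parse import urlencode

# Assumed endpoint; the original script may build its URLs differently.
RANKING_URL = "https://www.pixiv.net/ranking.php"


def ranking_url(mode: str, day: date, page: int) -> str:
    """Build the URL for one ranking list (mode, e.g. "daily"), date and page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    query = {"mode": mode, "date": day.strftime("%Y%m%d"), "p": page, "format": "json"}
    return RANKING_URL + "?" + urlencode(query)


def image_path(base: Path, mode: str, illust_id: int, ext: str) -> Path:
    """One folder per ranking list; each file is named after its Pixiv ID."""
    return base / mode / "{}{}".format(illust_id, ext)
```

For example, `ranking_url("daily", date(2020, 1, 1), 2)` gives `https://www.pixiv.net/ranking.php?mode=daily&date=20200101&p=2&format=json`. To crawl N pages, call it for pages 1 to N.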

Limitations:

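  1. No multithreading
  2. No handling of network connection problems
  3. Links that have not been crawled yet are not stored locally (storing them would reduce the load on the server)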
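The first two limitations could be addressed with a small thread pool plus retries with exponential backoff, as sketched below. This is not part of the project. `fetch` is a hypothetical callable you would supply, for example one that calls `session.get(url)` and returns the image bytes. The sketch retries on `OSError`, which covers requests' exceptions because `requests.RequestException` derives from `IOError`.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,)) -> T:
    """Call fn(); after a retryable error wait base_delay * 2**i seconds and try again."""
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: let the caller see the last error
            time.sleep(base_delay * 2 ** i)
    raise ValueError("attempts must be at least 1")


def download_all(urls: Iterable[str], fetch: Callable[[str], T], workers: int = 4,
                 attempts: int = 3, base_delay: float = 1.0) -> Dict[str, T]:
    """Run fetch(url) for every URL in a thread pool, retrying each URL independently."""
    url_list = list(urls)

    def task(url: str) -> T:
        return with_retries(lambda: fetch(url), attempts, base_delay)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with url_list;
        # an error that survives all retries is re-raised here
        return dict(zip(url_list, pool.map(task, url_list)))
```

More workers means more requests per second, which raises the IP-ban risk mentioned earlier. Keep `workers` small.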

About

Learning Python
