mmoonzhu/spider_python

spider_python

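A crawler that scrapes campus recruitment postings from the BYR forum (北邮人论坛) and the SMTH community (水木社区). Just run main.py. The code is very concise and can be extended.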

The program depends on the following third-party Python packages: requests, BeautifulSoup, redis-py

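The crawler first filters the recruitment postings by user-defined keywords, then stores the matches in Redis on the local machine. If the machine has a LAMP environment, the stored entries can be read from Redis straight into a web page. A sample PHP page for the LAMP environment: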

<!DOCTYPE html>
<html>
<head>
<title>Welcome to spider!</title>
<style>
 body {
 width: 35em;
 margin: 0 auto;
 }
 a:visited { color: red; }
</style>
</head>
<body>
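<?php
// Connect to the local Redis instance that the crawler writes to.
$rs_ip = '127.0.0.1';
$rs_port = 6379;
$rs = new Redis();
$rs->connect($rs_ip, $rs_port);
// Read every entry the crawler stored in the 'urls' set.
$ret = $rs->smembers('urls');
foreach ($ret as $href) {
 echo $href . "<br/>";
}
?>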
</body>
</html>

Screenshot of the result:

[image 1]
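
The filter-then-store step described above could be sketched roughly as follows. This is not the code in main.py, only a minimal illustration under stated assumptions: the function names, the `(title, url)` post format and the sample data are hypothetical, and storing each entry as an HTML link is a guess based on the `a:visited` rule in the PHP page. Only the set name `urls` and the Redis address come from the example above.

```python
import html


def matches_keywords(title, keywords):
    """Return True if the title contains any keyword (case-insensitive)."""
    lowered = title.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def store_matching_posts(posts, keywords, store, key="urls"):
    """Add one HTML link per matching post to the set `key`.

    `posts` is an iterable of (title, url) pairs (hypothetical format).
    `store` is any object with a Redis-style sadd(key, value) method.
    Returns the number of entries that were new to the set.
    """
    added = 0
    for title, url in posts:
        if matches_keywords(title, keywords):
            link = '<a href="{0}">{1}</a>'.format(
                html.escape(url, quote=True), html.escape(title)
            )
            # SADD ignores duplicates, so re-crawled posts are stored once.
            added += store.sadd(key, link)
    return added


def run_example():
    """Store two made-up posts in the local Redis (requires redis-py)."""
    import redis  # imported lazily so the helpers above work without it

    client = redis.Redis(host="127.0.0.1", port=6379)
    sample_posts = [  # hypothetical data, not real forum output
        ("Backend engineer campus hiring", "http://example.com/post/1"),
        ("Second-hand bicycle for sale", "http://example.com/post/2"),
    ]
    return store_matching_posts(sample_posts, ["campus", "intern"], client)
```

Calling `run_example()` with a Redis server listening on 127.0.0.1:6379 would add only the first sample post to the `urls` set. Using a Redis set means running the crawler repeatedly does not produce duplicate entries on the web page.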

In addition, you can use crontab or a launchAgent (Mac OS X) to run the crawler as a scheduled task. My launchAgent is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
 <key>Label</key>
 <string>com.lzrak47.spider.plist</string>
 <key>ProgramArguments</key>
 <array>
 <string>/usr/local/bin/python</string>
 <string>/Users/lzrak47/project/python/spider_python/main.py</string>
 </array>
 <key>RunAtLoad</key>
 <true/>
 <key>UserName</key>
 <string>lzrak47</string>
 <key>StartInterval</key>
 <integer>3600</integer>
</dict>
</plist>

Enjoy it.

About

Crawler, crawler.

Languages

  • Python 100.0%
