Message169718
| Field | Value |
|---|---|
| Author | dualbus |
| Recipients | dualbus |
| Date | 2012-09-02 18:36:03 |
| SpamBayes Score | -1.0 |
| Marked as misclassified | Yes |
| Message-id | <1346610964.7.0.836759738208.issue15851@psf.upfronthosting.co.za> |
| In-reply-to | |
Content

I found that http://en.wikipedia.org/robots.txt returns a 403 when the request's user agent is on a specific blacklist.

Since robotparser provides no way to change the default user agent used by its opener, it becomes unusable for that site (and for any site with a similar policy).

The user should be able to set a specific user-agent string, to better identify their bot.

I am attaching a patch that lets the user change the opener used by RobotFileParser, in case some specific behavior is needed.

I am also attaching a simple example of how it solves the issue, at least with Wikipedia.
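For reference, a minimal workaround sketch against the existing API (written with Python 3's module names; the bot identifier and target URL below are hypothetical placeholders, not part of the attached patch): fetch robots.txt yourself with an explicit User-Agent header and pass the lines to RobotFileParser.parse() instead of calling read(), which uses urllib's default agent. The attached patch would make this step unnecessary by letting the opener itself be replaced.

```python
import urllib.request
import urllib.robotparser

ROBOTS_URL = "http://en.wikipedia.org/robots.txt"
# Hypothetical bot identifier; any descriptive agent not on the site's blacklist works.
USER_AGENT = "ExampleBot/1.0 (+http://example.org/bot)"

# Fetch robots.txt with an explicit User-Agent instead of urllib's default one.
request = urllib.request.Request(ROBOTS_URL, headers={"User-Agent": USER_AGENT})
with urllib.request.urlopen(request) as response:
    lines = response.read().decode("utf-8").splitlines()

# Hand the fetched lines to the parser; parse() accepts an iterable of lines.
parser = urllib.robotparser.RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.parse(lines)

print(parser.can_fetch(USER_AGENT, "http://en.wikipedia.org/wiki/Main_Page"))
```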
History
| Date | User | Action | Args |
|---|---|---|---|
| 2012-09-02 18:36:04 | dualbus | set | recipients: + dualbus |
| 2012-09-02 18:36:04 | dualbus | set | messageid: <1346610964.7.0.836759738208.issue15851@psf.upfronthosting.co.za> |
| 2012-09-02 18:36:04 | dualbus | link | issue15851 messages |
| 2012-09-02 18:36:03 | dualbus | create | |