Created on 2016-11-21 01:23 by nagle, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Messages (4) | |||
|---|---|---|---|
| msg281314 - (view) | Author: John Nagle (nagle) | Date: 2016-11-21 01:23 | |
urllib.robotparser.RobotFileParser always uses the default Python user agent. This agent is now blacklisted by many sites, and it's not possible to read the robots.txt file at all. |
|||
| msg281315 - (view) | Author: John Nagle (nagle) | Date: 2016-11-21 01:26 | |
Suggest adding a user_agent optional parameter, as shown here:
    def __init__(self, url='', user_agent=None):
        urllib.robotparser.RobotFileParser.__init__(self, url)  # init parent
        self.user_agent = user_agent  # save user agent

    def read(self):
        """
        Reads the robots.txt URL and feeds it to the parser.
        Overrides parent read function.
        """
        try:
            req = urllib.request.Request(  # request with user agent specified
                self.url,
                data=None)
            if self.user_agent is not None:  # if overriding user agent
                req.add_header("User-Agent", self.user_agent)
            f = urllib.request.urlopen(req)  # open connection
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif 400 <= err.code < 500:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())
|
|||
| msg281316 - (view) | Author: John Nagle (nagle) | Date: 2016-11-21 01:29 | |
(That's from a subclass I wrote. As a change to RobotFileParser, __init__ should start like this.)
    def __init__(self, url='', user_agent=None):
        self.user_agent = user_agent  # save user agent
        ... |
|||
| msg281323 - (view) | Author: Xiang Zhang (xiang.zhang) * (Python committer) | Date: 2016-11-21 05:40 | |
Hi, John. This robotparser issue has already been reported in #15851. I'll close this one as a duplicate; you can continue the discussion in that thread. |
|||
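
For readers who want to try the workaround described in msg281315, the following is a minimal, self-contained sketch of such a subclass. The class name `UserAgentRobotFileParser` and the helper `build_request` are hypothetical (the thread does not name the subclass), and the error handling simply mirrors the snippet posted above. It is an illustration of the idea under those assumptions, not the change that was eventually discussed in #15851.

```python
import urllib.error
import urllib.request
import urllib.robotparser


class UserAgentRobotFileParser(urllib.robotparser.RobotFileParser):
    """RobotFileParser that can fetch robots.txt with a custom User-Agent.

    Hypothetical name; a sketch of the subclass approach from this thread.
    """

    def __init__(self, url="", user_agent=None):
        super().__init__(url)
        self.user_agent = user_agent  # None means: keep urllib's default

    def build_request(self):
        """Build the robots.txt request (hypothetical helper).

        The User-Agent header is only set when one was supplied.
        """
        req = urllib.request.Request(self.url)
        if self.user_agent is not None:
            req.add_header("User-Agent", self.user_agent)
        return req

    def read(self):
        """Fetch the robots.txt URL and feed it to the parser."""
        try:
            with urllib.request.urlopen(self.build_request()) as f:
                raw = f.read()
        except urllib.error.HTTPError as err:
            # Same policy as the snippet in msg281315.
            if err.code in (401, 403):
                self.disallow_all = True
            elif 400 <= err.code < 500:
                self.allow_all = True
        else:
            self.parse(raw.decode("utf-8").splitlines())
```

Typical use would be to construct the parser with a robots.txt URL and a crawler-specific agent string, call `read()`, and then query `can_fetch(agent, url)` as with the stock parser. Splitting request construction into its own method is a design choice made here so the header logic can be checked without network access.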
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:39 | admin | set | github: 72942 |
| 2016-11-21 06:12:12 | ezio.melotti | set | stage: resolved |
| 2016-11-21 05:40:42 | xiang.zhang | set | status: open -> closed superseder: Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default. versions: - Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6 nosy: + xiang.zhang messages: + msg281323 resolution: duplicate |
| 2016-11-21 01:29:40 | nagle | set | messages: + msg281316 |
| 2016-11-21 01:26:36 | nagle | set | messages: + msg281315 |
| 2016-11-21 01:23:46 | nagle | create | |