This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: robotfileparser always uses default Python user-agent
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default. (View: 15851)
Assigned To:
Nosy List: nagle, xiang.zhang
Priority: normal
Keywords:

Created on 2016-11-21 01:23 by nagle, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg281314 - (view) Author: John Nagle (nagle) Date: 2016-11-21 01:23
urllib.robotparser.RobotFileParser always uses the default Python user agent. This agent is now blacklisted by many sites, so on those sites it's not possible to read the robots.txt file at all.
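One possible workaround that needs no change to RobotFileParser is sketched below. It relies on RobotFileParser.read() fetching through urllib.request.urlopen(), which uses whatever opener has been installed globally. The helper names install_user_agent and load_robots, and the "MyCrawler/1.0" string, are hypothetical and not part of the report. Note that installing an opener affects every later urlopen() call in the process, so treat this as an illustration rather than a recommended fix.

```python
import urllib.request
import urllib.robotparser


def install_user_agent(user_agent):
    """Install a global opener whose default headers carry user_agent.

    Hypothetical helper: this replaces the opener's default header list
    (normally a single Python-urllib User-agent entry).
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener


def load_robots(url, user_agent):
    """Fetch and parse robots.txt at url using the given user agent.

    Hypothetical helper: performs a real network request via read().
    """
    install_user_agent(user_agent)
    parser = urllib.robotparser.RobotFileParser(url)
    parser.read()
    return parser
```

A caller would then use something like load_robots("https://example.com/robots.txt", "MyCrawler/1.0").can_fetch("MyCrawler/1.0", some_url), where the URL is a placeholder.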
msg281315 - (view) Author: John Nagle (nagle) Date: 2016-11-21 01:26
Suggest adding an optional user_agent parameter, as shown here:
    def __init__(self, url='', user_agent=None):
        urllib.robotparser.RobotFileParser.__init__(self, url)  # init parent
        self.user_agent = user_agent                            # save user agent

    def read(self):
        """
        Reads the robots.txt URL and feeds it to the parser.
        Overrides parent read function.
        """
        try:
            req = urllib.request.Request(       # request with user agent specified
                self.url,
                data=None)
            if self.user_agent is not None:     # if overriding user agent
                req.add_header("User-Agent", self.user_agent)
            f = urllib.request.urlopen(req)     # open connection
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif err.code >= 400 and err.code < 500:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())
msg281316 - (view) Author: John Nagle (nagle) Date: 2016-11-21 01:29
(That's from a subclass I wrote. As a change to RobotFileParser, __init__ should start like this.)
    def __init__(self, url='', user_agent=None):
        self.user_agent = user_agent    # save user agent
        ...
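For reference, the subclass approach described in the messages above can be put together as a self-contained module along the following lines. This is only a sketch: the class name UserAgentRobotFileParser and the build_request helper are hypothetical names introduced here (the report does not name the subclass), and the error handling simply mirrors the posted read() method.

```python
import urllib.error
import urllib.request
import urllib.robotparser


class UserAgentRobotFileParser(urllib.robotparser.RobotFileParser):
    """RobotFileParser variant that can send a caller-supplied User-Agent.

    Hypothetical class name; the behaviour follows the snippet in the report.
    """

    def __init__(self, url="", user_agent=None):
        super().__init__(url)
        self.user_agent = user_agent  # None means "keep the default agent"

    def build_request(self):
        """Build the robots.txt request (hypothetical helper)."""
        req = urllib.request.Request(self.url)
        if self.user_agent is not None:
            req.add_header("User-Agent", self.user_agent)
        return req

    def read(self):
        """Fetch robots.txt and feed it to the parser."""
        try:
            with urllib.request.urlopen(self.build_request()) as f:
                raw = f.read()
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif 400 <= err.code < 500:
                self.allow_all = True
        else:
            self.parse(raw.decode("utf-8").splitlines())
```

Splitting request construction into its own method is a choice made for this sketch so that the header logic can be checked without touching the network.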
msg281323 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-21 05:40
Hi, John. This robotparser issue has already been reported in #15851. I'll close this as a duplicate, and you can continue the discussion in that thread.
History
Date                 User          Action  Args
2022-04-11 14:58:39  admin         set     github: 72942
2016-11-21 06:12:12  ezio.melotti  set     stage: resolved
2016-11-21 05:40:42  xiang.zhang   set     status: open -> closed
                                           superseder: Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default.
                                           versions: - Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6
                                           nosy: + xiang.zhang
                                           messages: + msg281323
                                           resolution: duplicate
2016-11-21 01:29:40  nagle         set     messages: + msg281316
2016-11-21 01:26:36  nagle         set     messages: + msg281315
2016-11-21 01:23:46  nagle         create
