
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author karlcow
Recipients dualbus, ezio.melotti, karlcow, orsenthil, terry.reedy
Date 2013-03-06.04:26:18
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1362543979.28.0.787877448273.issue15851@psf.upfronthosting.co.za>
In-reply-to
Content
Setting a user agent string should be possible.
My guess is that the default library has been used by an abusive client (by mistake or intent), and the Wikimedia project has decided to blacklist that client by sniffing the User-Agent string.
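For reference, the string such a blacklist would catch is urllib's default User-Agent: `OpenerDirector` initialises `addheaders` to `[('User-agent', 'Python-urllib/<major>.<minor>')]`. The sketch below, assuming Python 3's `urllib.request`, inspects that default and overrides it for a single request without touching the network. `default_user_agent` and `make_request` are hypothetical helper names, and `MyCrawler/0.1` is an illustrative value.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent a fresh opener would send by default."""
    opener = urllib.request.build_opener()
    # OpenerDirector keeps its default headers as a list of (name, value) pairs.
    return dict(opener.addheaders).get("User-agent")


def make_request(url, user_agent):
    """Build (but do not send) a Request that carries a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# "Python-urllib/" followed by the running interpreter's major.minor version.
print(default_user_agent())

req = make_request("http://en.wikipedia.org/robots.txt", "MyCrawler/0.1")
# Request normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

When such a request is sent, the HTTP handler adds the opener's default headers only if the request does not already have them, so the per-request User-Agent takes precedence over `addheaders`.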
The filter seems to match any User-Agent that contains the substring "Python-urllib":
"Python-urllib" in UserAgentString
See below:
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urllib')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 479, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 591, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 517, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 451, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 599, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Pythonurllib/3.3')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>> fobj
<http.client.HTTPResponse object at 0x101275850>
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Pyt-honurllib/3.3')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urllib')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 479, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 591, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 517, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 451, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 599, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urlli')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>> 
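The probes above are consistent with a plain substring test on the header. The sketch below models that inferred rule; it is a guess reconstructed from the observed 403s, not Wikimedia's actual filter, and `looks_blocked` is a hypothetical name. A prefix test (`startswith`) would fit the same four data points equally well.

```python
# Hypothetical model of the server-side rule inferred from the session above:
# reject any User-Agent that contains the literal substring "Python-urllib".
BLOCKED_SUBSTRING = "Python-urllib"


def looks_blocked(user_agent):
    """Return True if the inferred filter would reject this User-Agent."""
    return BLOCKED_SUBSTRING in user_agent


# User-Agent values tried in the session, paired with what was observed.
probes = {
    "Python-urllib": "HTTP 403",
    "Pythonurllib/3.3": "no error",
    "Pyt-honurllib/3.3": "no error",
    "Python-urlli": "no error",
}

for ua, observed in probes.items():
    print(ua, "predicted blocked:", looks_blocked(ua), "observed:", observed)
```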
Being able to change the header might indeed be a good thing.
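If the goal is to honour robots.txt, one workaround is to fetch the file yourself with an explicit User-Agent and feed its lines to `urllib.robotparser.RobotFileParser.parse()`. This matters because `RobotFileParser.read()` treats a 401 or 403 response as "disallow everything", so a blocked default User-Agent makes every URL look off-limits. This is a minimal sketch assuming Python 3's standard library; `fetch_robots` and `parse_robots` are hypothetical helper names, and the sample rules are made up for illustration.

```python
import urllib.request
import urllib.robotparser


def parse_robots(text):
    """Build a RobotFileParser from robots.txt text fetched elsewhere."""
    rp = urllib.robotparser.RobotFileParser()
    # parse() takes an iterable of lines and records the check time,
    # so can_fetch() works without ever calling read().
    rp.parse(text.splitlines())
    return rp


def fetch_robots(url, user_agent):
    """Download robots.txt with an explicit User-Agent, then parse it."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return parse_robots(text)


# Offline demonstration with made-up rules (no network access needed).
rp = parse_robots("User-agent: *\nDisallow: /w/\n")
print(rp.can_fetch("MyCrawler/0.1", "https://en.wikipedia.org/w/index.php"))  # False
print(rp.can_fetch("MyCrawler/0.1", "https://en.wikipedia.org/wiki/Python"))  # True
```

Against the live site this would be a call such as `fetch_robots("http://en.wikipedia.org/robots.txt", "MyCrawler/0.1")`, which needs network access and a User-Agent the server accepts.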
History
Date User Action Args
2013-03-06 04:26:19 karlcow set recipients: + karlcow, terry.reedy, orsenthil, ezio.melotti, dualbus
2013-03-06 04:26:19 karlcow set messageid: <1362543979.28.0.787877448273.issue15851@psf.upfronthosting.co.za>
2013-03-06 04:26:19 karlcow link issue15851 messages
2013-03-06 04:26:18 karlcow create
