This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author tomasgroth
Recipients tomasgroth
Date 2014-08-22.08:58:02
Message-id <1408697883.93.0.665294573507.issue22248@psf.upfronthosting.co.za>
In-reply-to
Content
Running this simple test script produces the traceback shown below.
import urllib.request
page = urllib.request.urlopen('http://legacy.biblegateway.com/versions/?vid=DN1933&action=getVersionInfo#books')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 461, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 493, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 676, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 455, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
    '_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib/python3.4/http/client.py", line 1065, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1093, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python3.4/http/client.py", line 957, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32: ordinal not in range(128)
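The last frame fails while encoding the HTTP request line as ASCII. The step can probably be reproduced without any network access. This is a minimal sketch under two assumptions that the traceback does not state: that the server sends the "å" in the redirect path as UTF-8 bytes, and that http.client decodes header bytes as ISO-8859-1, which would turn the one character into two. The path is taken from the Location header reported in this issue; the variable names are illustrative only.

```python
# Path from the Location header reported in this issue.
path = "/versions/Dette-er-Biblen-på-dansk-1933/"

# Assumption: the server sent UTF-8 bytes and the header was decoded as
# ISO-8859-1, so the single character "å" becomes two characters.
selector = path.encode("utf-8").decode("iso-8859-1")

# Build a request line the same way as "METHOD selector HTTP-version".
request_line = "GET %s HTTP/1.1" % selector

error = None
try:
    request_line.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
```

Under these assumptions the two offending characters sit at offsets 31 and 32 of the request line, which matches the positions in the reported error message.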
Using curl, we can see that the server redirects to a URL containing a non-ASCII character (å):
$ curl -vs "http://legacy.biblegateway.com/versions/?vid=DN1933&action=getVersionInfo#books" >DN1933
* Hostname was NOT found in DNS cache
* Trying 23.23.93.211...
* Connected to legacy.biblegateway.com (23.23.93.211) port 80 (#0)
> GET /versions/?vid=DN1933&action=getVersionInfo HTTP/1.1
> User-Agent: curl/7.35.0
> Host: legacy.biblegateway.com
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
* Server nginx/1.4.7 is not blacklisted
< Server: nginx/1.4.7
< Date: Fri, 22 Aug 2014 08:35:30 GMT
< Content-Type: text/html; charset=UTF-8
< Content-Length: 0
< Connection: keep-alive
< X-Powered-By: PHP/5.5.7
< Set-Cookie: bg_id=1b9a80d5e6d545487cfd153d6df65c4e; path=/; domain=.biblegateway.com
< Set-Cookie: a9gl=0; path=/; domain=.biblegateway.com
< Location: http://legacy.biblegateway.com/versions/Dette-er-Biblen-på-dansk-1933/
< 
* Connection #0 to host legacy.biblegateway.com left intact
When the redirect URL contains only ASCII characters, everything works as expected, for example with this URL: "http://legacy.biblegateway.com/versions/?vid=DNB1930&action=getVersionInfo#books"
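One possible workaround on the caller's side is to percent-encode the redirect target before urllib follows it. The sketch below is not part of the report: `quote_non_ascii` and `QuotingRedirectHandler` are hypothetical names, and it assumes the Location value reaches the handler decoded as ISO-8859-1, so encoding with the same codec recovers the bytes the server sent.

```python
import string
import urllib.parse
import urllib.request


def quote_non_ascii(url):
    """Percent-encode non-ASCII characters, leaving ASCII punctuation alone."""
    # Assumption: the header value was decoded as ISO-8859-1, so encoding
    # with the same codec yields the original bytes from the server.
    return urllib.parse.quote(url, safe=string.punctuation,
                              encoding="iso-8859-1")


class QuotingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that quotes the redirect URL before following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return super().redirect_request(
            req, fp, code, msg, headers, quote_non_ascii(newurl))


# Redirect target as it would look after ISO-8859-1 decoding of UTF-8 bytes.
decoded = "http://legacy.biblegateway.com/versions/Dette-er-Biblen-p\u00c3\u00a5-dansk-1933/"
quoted = quote_non_ascii(decoded)
print(quoted)

# An opener using the handler; nothing is fetched until opener.open() is called.
opener = urllib.request.build_opener(QuotingRedirectHandler)
```

Calling `opener.open(...)` with the URL from the test script would then request the percent-encoded path instead of the raw one. Whether the server accepts that form is not verified here.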
History
Date                 User        Action  Args
2014-08-22 08:58:04  tomasgroth  set     recipients: + tomasgroth
2014-08-22 08:58:03  tomasgroth  set     messageid: <1408697883.93.0.665294573507.issue22248@psf.upfronthosting.co.za>
2014-08-22 08:58:03  tomasgroth  link    issue22248 messages
2014-08-22 08:58:02  tomasgroth  create
