Message255440
| Author |
martin.panter |
| Recipients |
barry, demian.brecht, ezio.melotti, gregory.p.smith, martin.panter, r.david.murray, scharron, serhiy.storchaka |
| Date |
2015-11-26.23:19:02 |
| Message-id |
<1448579943.5.0.317912844783.issue22233@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
For the record, this is a simplified version of the original scenario, showing the low-level HTTP protocol:
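>>> import ssl
>>> from http.client import HTTPS_PORT
>>> from pprint import pprint
>>> from socket import create_connection
>>> request = (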
... b"GET /%C4%85 HTTP/1.1\r\n"
... b"Host: graph.facebook.com\r\n"
... b"\r\n"
... )
>>> s = create_connection(("graph.facebook.com", HTTPS_PORT))
>>> with ssl.wrap_socket(s) as s:
...     s.sendall(request)
...     response = s.recv(3000)
...
50
>>> pprint(response.splitlines(keepends=True))
[b'HTTP/1.1 404 Not Found\r\n',
b'WWW-Authenticate: OAuth "Facebook Platform" "not_found" "(#803) Some of the '
b'aliases you requested do not exist: \xc4\x85"\r\n',
b'Access-Control-Allow-Origin: *\r\n',
b'Content-Type: text/javascript; charset=UTF-8\r\n',
b'X-FB-Trace-ID: H9yxnVcQFuA\r\n',
b'X-FB-Rev: 2063232\r\n',
b'Pragma: no-cache\r\n',
b'Cache-Control: no-store\r\n',
b'Facebook-API-Version: v2.0\r\n',
b'Expires: Sat, 01 Jan 2000 00:00:00 GMT\r\n',
b'X-FB-Debug: 07ouxMl1Z439Ke/YzHSjXx3o9PcpGeZBFS7yrGwTzaaudrZWe5Ef8Z96oSo2dINp'
b'3GR4q78+1oHDX2pUF2ky1A==\r\n',
b'Date: Thu, 26 Nov 2015 23:03:47 GMT\r\n',
b'Connection: keep-alive\r\n',
b'Content-Length: 147\r\n',
b'\r\n',
b'{"error":{"message":"(#803) Some of the aliases you requested do not exist: '
b'\\u0105","type":"OAuthException","code":803,"fbtrace_id":"H9yxnVcQFuA"}}']
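The following is only a sketch of why the WWW-Authenticate line above is troublesome, not a reproduction of what http.client does internally. It assumes the header bytes are decoded as ISO-8859-1, so that byte 0x85 becomes the character U+0085, and then compares the two line-splitting approaches. The names raw, text, by_splitlines and by_universal are made up for illustration:

```python
from io import StringIO

# The WWW-Authenticate header line from the response above.
raw = (
    b'WWW-Authenticate: OAuth "Facebook Platform" "not_found" '
    b'"(#803) Some of the aliases you requested do not exist: \xc4\x85"\r\n'
)

# Assumption: header bytes are decoded as ISO-8859-1, so 0x85 maps to U+0085.
text = raw.decode("iso-8859-1")

# str.splitlines() treats U+0085 as a line boundary and cuts the header in two.
by_splitlines = text.splitlines(keepends=True)

# StringIO with newline="" splits only on \n, \r and \r\n, without translating them.
by_universal = StringIO(text, newline="").readlines()

print(len(by_splitlines))  # 2
print(len(by_universal))   # 1
```

With splitlines() the closing quote and CRLF end up on a separate "line", which a header parser would then have to interpret on its own.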
In my mind, the simplest way forward would be to change the "email" module to split lines using only the "universal newlines" algorithm, which recognises just \r, \n and \r\n as line endings. The /Lib/email/feedparser.py module should use StringIO(s, newline="").readlines() rather than s.splitlines(keepends=True), because str.splitlines() also breaks lines at other characters such as \x85. This would change the behaviour of all email parsing, not only HTTP header parsing. For instance, given the following message:
>>> import email
>>> m = email.message_from_string(
... "WWW-Authenticate: abc\x85<body or header?>\r\n"
... "\r\n"
... )
instead of the current behaviour:
>>> m.items()
[('WWW-Authenticate', 'abc\x85')]
>>> m.get_payload()
'<body or header?>\r\n\r\n'
it would change to:
>>> m.items()
[('WWW-Authenticate', 'abc\x85<body or header?>')]
>>> m.get_payload()
'' |
|