Issue 18044: Email headers do not properly decode to unicode.


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/62244

classification

Title:	Email headers do not properly decode to unicode.
Type:	behavior	Stage:	resolved
Components:	Documentation, email	Versions:	Python 3.3, Python 3.4

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	docs@python	Nosy List:	Tim.Rawlinson, barry, docs@python, ezio.melotti, python-dev, r.david.murray
Priority:	normal	Keywords:

Created on 2013-05-23 11:57 by Tim.Rawlinson, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg189862	Author: Tim Rawlinson (Tim.Rawlinson)	Date: 2013-05-23 11:57
In Python 3.3 decoding of headers to unicode is supposed to be automatic but fails in several cases, including one shown as successful in the documentation:

>>> msg = message_from_string('Subject: =?utf-8?q?=C3=89ric?=\n\n', policy=default)
>>> msg['Subject']
'=?utf-8?q?=C3=89ric?='

>>> msg = message_from_string('To: =?utf-8?q?=C3=89ric <foo@example.com>\n\n', policy=default)
>>> msg['To']
'=?utf-8?q?=C3=89ric?= <foo@example.com>'

Although the following works:

>>> msg = message_from_string('Subject: =?utf-8?q?Eric?=\n\n', policy=default)
>>> msg['Subject']
'Eric'

Though this does not:

>>> msg = message_from_string('To: =?utf-8?q?Eric?= <foo@example.com>\n\n', policy=default)
>>> msg['To']
'=?utf-8?q?Eric?= <foo@example.com>'

And just to prove some things are working as they should:

>>> msg = message_from_string("Subject: =?gb2312?b?1eLKx9bQzsSy4srUo6E=?=\n\n", policy=default)
>>> msg['Subject']
'这是中文测试!'
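The session in this report omits its imports. A self-contained sketch of the same reproduction follows, assuming `message_from_string` comes from the `email` package and `default` from `email.policy`. The helper `header_value` is a hypothetical name introduced here only for illustration.

```python
from email import message_from_string
from email.policy import default


def header_value(raw_header, name):
    """Parse a one-header message and return that header as a plain str.

    `header_value` is a hypothetical helper, not part of the email package.
    """
    msg = message_from_string(raw_header + "\n\n", policy=default)
    return str(msg[name])


# Unstructured header whose q-encoded payload starts with an '=XX' escape.
subject = header_value("Subject: =?utf-8?q?=C3=89ric?=", "Subject")

# Address header containing an encoded word in the display name.
to = header_value("To: =?utf-8?q?Eric?= <foo@example.com>", "To")

print(repr(subject))
print(repr(to))
```

On an interpreter that includes the fix for this issue, the first value prints as 'Éric'. Whether the `To` value is decoded depends on whether the separate address-header fix is also present.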
msg192894	Author: Roundup Robot (python-dev) (Python triager)	Date: 2013-07-11 19:59
New changeset 4acb822f4c43 by R David Murray in branch '3.3':
#18044: Fix parsing of encoded words of the form =?utf8?q?=XX...?=
http://hg.python.org/cpython/rev/4acb822f4c43

New changeset 32c6cfffbddd by R David Murray in branch 'default':
Merge: #18044: Fix parsing of encoded words of the form =?utf8?q?=XX...?=
http://hg.python.org/cpython/rev/32c6cfffbddd
msg192895	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2013-07-11 20:06
This is actually two separate bugs, both a bit embarrassing.

The first one (that I just fixed) is that when parsing an encoded word I was only checking for decimal digits after an '=' (instead of the correct hex digits) when trying to do robust detection of encoded word limits.

The second (the address one) is due to the fact that somewhere between stopping my full-time work on the email project and actually committing the code, I lost track of the fact that I never implemented encoded word parsing for anything other than unstructured headers. The infrastructure is there, I just need to write tests and hook it up. I'm going to open a separate issue for that.
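The first bug can be illustrated with a simplified sketch. This is not the actual code from the email package: `find_payload` and its `is_escape_digit` parameter are hypothetical names, and the logic only mimics the idea of deciding whether a '?=' really ends the encoded word or is the '?' field separator followed by an '=XX' escape.

```python
from string import hexdigits


def find_payload(value, is_escape_digit):
    """Return the text between the leading '=?' and the closing '?='.

    Hypothetical, simplified stand-in for encoded-word limit detection.
    """
    if not value.startswith("=?"):
        raise ValueError("not an encoded word")
    tok, _, rest = value[2:].partition("?=")
    # If '?=' is directly followed by two escape digits, it was really the
    # '?' separator plus the start of an '=XX' escape, so keep scanning.
    if len(rest) > 1 and is_escape_digit(rest[0]) and is_escape_digit(rest[1]):
        more, _, rest = rest.partition("?=")
        tok = tok + "?=" + more
    return tok


def is_hex(ch):
    return ch in hexdigits


word = "=?utf-8?q?=C3=89ric?="
buggy = find_payload(word, str.isdigit)  # decimal-only check
fixed = find_payload(word, is_hex)       # hex-digit check

print(repr(buggy))
print(repr(fixed))
```

With the decimal-only check, 'C' is not recognized as an escape digit, so the word is cut off at 'utf-8?q'. With the hex check, the full 'utf-8?q?=C3=89ric' is recovered. A payload such as 'Eric' contains no early '?=', which is consistent with that case working in the original report.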
msg192899	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2013-07-11 20:11
The issue for the second bug is issue 18431.

History
Date	User	Action	Args
2022-04-11 14:57:45	admin	set	github: 62244
2013-07-11 20:11:20	r.david.murray	set	messages: + msg192899
2013-07-11 20:06:03	r.david.murray	set	status: open -> closed versions: + Python 3.4 messages: + msg192895 resolution: fixed stage: resolved
2013-07-11 19:59:04	python-dev	set	nosy: + python-dev messages: + msg192894
2013-05-26 13:41:13	ezio.melotti	set	nosy: + ezio.melotti
2013-05-23 11:57:36	Tim.Rawlinson	create
