Issue 10574: email.header.decode_header fails if the string contains multiple directives


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/54783

classification

Title:	email.header.decode_header fails if the string contains multiple directives
Type:	behavior	Stage:	resolved
Components:	email, Library (Lib)	Versions:	Python 3.2, Python 3.3, Python 2.7

process

Status:	closed	Resolution:	duplicate
Dependencies:	Superseder:	decode_header does not follow RFC 2047 View: 1079
Assigned To:	Nosy List:	barry, invisibleroads, r.david.murray
Priority:	normal	Keywords:

Created on 2010-11-29 04:58 by invisibleroads, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg122772 - (view)	Author: Roy Hyunjin Han (invisibleroads) *	Date: 2010-11-29 04:58
email.header.decode_header fails for the following message subject: ::

    email.header.decode_header('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?=')

If the directives are removed and the padding problems are fixed, the subject parses correctly. ::

    email.header.decode_header('=?UTF-8?B?%s==?=' % '=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?='.replace('=?UTF-8?B?', '').replace('?', '').replace('=', ''))
msg122773 - (view)	Author: Roy Hyunjin Han (invisibleroads) *	Date: 2010-11-29 05:12
Currently using the following workaround. ::

    import re
    import email.header

    def decodeSafely(x):
        match = re.search(r'(=\?.*?\?B\?)', x)
        if not match:
            return x
        encoding = match.group(1)
        return email.header.decode_header('%s%s==?=' % (encoding, x.replace(encoding, '').replace('?', '').replace('=', '')))
msg122774 - (view)	Author: Roy Hyunjin Han (invisibleroads) *	Date: 2010-11-29 05:59
Improved workaround to handle another degenerate case where the encoded string is in between non-encoded strings. ::

    import re
    import email.header

    pattern_ecre = re.compile(r'((=\?.*?\?[qb]\?).*\?=)', re.VERBOSE | re.IGNORECASE | re.MULTILINE)

    def decodeSafely(x):
        match = pattern_ecre.search(x)
        if not match:
            return x
        string, encoding = match.groups()
        stringBefore, string, stringAfter = x.partition(string)
        return stringBefore + email.header.decode_header('%s%s==?=' % (encoding, string.replace(encoding, '').replace('?', '').replace('=', '')))[0][0] + stringAfter

    print decodeSafely('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?=')
    print decodeSafely('"=?UTF-8?B?QVVUTSBIZWFkcXVhcnRlcnM=?="<info@autm.net>')
msg122776 - (view)	Author: Roy Hyunjin Han (invisibleroads) *	Date: 2010-11-29 06:45
The following code seems to solve the first case just as well. It seems that it is a problem of missing whitespace. ::

    email.header.decode_header('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?='.replace('?==?', '?= =?'))
msg122918 - (view)	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2010-11-30 17:05
Note that none of your examples are valid encoded words, so given that email currently does strict parsing, the fact that it is not attempting to decode those words is technically correct. However, I agree that it would be better for it to do a "best guess" decoding of the invalid encoded words. It should be possible to "fix" this case by simply replacing '?==?' with '?= =?' before decoding (blanks between encoded words are ignored when decoding, per the RFC, which the author of the package producing these invalid headers probably didn't realize). See also #1079 and #8132. I have to think about whether or not all of these can be considered fixes (based on Postel's law) or if tolerant parsing should be considered a feature request. I'll probably combine these into a single issue at some point. Out of curiosity, which email program is it that is producing these invalid headers?
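
Editorial sketch, not part of the original thread: the replacement suggested in msg122918 can be wrapped in a small helper. This is a minimal Python 3 illustration under stated assumptions. The name `decode_tolerant` is hypothetical, the fallback to ASCII for unencoded parts is an assumption, and the helper only addresses the missing-whitespace case discussed above.

```python
import email.header


def decode_tolerant(raw):
    """Decode a header whose adjacent encoded words lack separating whitespace.

    Hypothetical helper for illustration only; it is not part of the
    email package.
    """
    # Insert the blank that should separate adjacent encoded words.
    # Blanks between encoded words are ignored when decoding.
    fixed = raw.replace('?==?', '?= =?')

    parts = []
    for data, charset in email.header.decode_header(fixed):
        # decode_header returns str when the header has no encoded words,
        # and bytes otherwise.
        if isinstance(data, bytes):
            # Assumption: treat parts without a charset as ASCII.
            data = data.decode(charset or 'ascii', errors='replace')
        parts.append(data)
    return ''.join(parts)


subject = ('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?='
           '=?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?='
           '=?UTF-8?B?aGlw?=')
print(decode_tolerant(subject))
```

For the subject reported in this issue, the helper should yield "2011 AUTM CALL for NOMINATIONS of VP for Membership". A plain string replacement is the simplest form of the idea and makes no attempt to handle other kinds of malformed encoded words.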
msg126838 - (view)	Author: Roy Hyunjin Han (invisibleroads) *	Date: 2011-01-22 14:31
2010-11-30 R. David Murray <report@bugs.python.org>:
> Out of curiosity, which email program is it that is producing these invalid headers?

I lost the headers for the original email, so I don't know which email program created the invalid headers. On searching for messages from the same address, it seems most of the messages originate from a marketing company called informz.net, but in rare instances there is a non-standard X-Mailer header:

- ColdFusion 8 Application Server (via JavaMail)
- IBM Lotus Domino Access for MS Outlook (2003) Release 7.0.2 September 26, 2006

Messages sent via informz.net generally parse correctly, so I am guessing it might have been one of the X-Mailers above.
msg162237 - (view)	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2012-06-03 22:11
This is fixed by the fix to issue 1079, but we have decided that fix can't be backported because it is a behavior change that might break existing working programs.
msg164640 - (view)	Author: Roy Hyunjin Han (invisibleroads) *	Date: 2012-07-04 10:11
> This is fixed by the fix to issue 1079, but we have decided that fix can't be backported because it is a behavior change that might break existing working programs.

Thanks for this update.

History
Date	User	Action	Args
2022-04-11 14:57:09	admin	set	github: 54783
2012-07-04 10:11:38	invisibleroads	set	messages: + msg164640
2012-06-03 22:11:21	r.david.murray	set	status: open -> closed superseder: decode_header does not follow RFC 2047 messages: + msg162237 resolution: duplicate stage: resolved
2012-05-16 02:00:06	r.david.murray	set	assignee: r.david.murray -> nosy: + barry components: + email versions: - Python 3.1
2011-03-14 03:41:36	r.david.murray	set	versions: + Python 3.2, Python 3.3, - Python 2.6
2011-01-22 14:31:54	invisibleroads	set	messages: + msg126838
2010-11-30 17:05:53	r.david.murray	set	assignee: r.david.murray messages: + msg122918 nosy: + r.david.murray
2010-11-29 06:45:58	invisibleroads	set	messages: + msg122776
2010-11-29 05:59:32	invisibleroads	set	messages: + msg122774
2010-11-29 05:12:24	invisibleroads	set	messages: + msg122773
2010-11-29 04:58:57	invisibleroads	create
