
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: email.header.decode_header fails if the string contains multiple directives
Type: behavior
Stage: resolved
Components: email, Library (Lib)
Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: decode_header does not follow RFC 2047 (issue 1079)
Assigned To:
Nosy List: barry, invisibleroads, r.david.murray
Priority: normal
Keywords:

Created on 2010-11-29 04:58 by invisibleroads, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg122772 - (view) Author: Roy Hyunjin Han (invisibleroads) * Date: 2010-11-29 04:58
email.header.decode_header fails for the following message subject:
::
 email.header.decode_header('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?=')
If the directives are removed and the padding problems are fixed, the subject parses correctly.
::
 email.header.decode_header('=?UTF-8?B?%s==?=' % '=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?='.replace('=?UTF-8?B?', '').replace('?', '').replace('=', ''))
msg122773 - (view) Author: Roy Hyunjin Han (invisibleroads) * Date: 2010-11-29 05:12
Currently using the following workaround.
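import re
import email.header

def decodeSafely(x):
    # Find the first charset/encoding prefix, e.g. '=?UTF-8?B?'
    match = re.search(r'(=\?.*?\?B\?)', x)
    if not match:
        return x
    encoding = match.group(1)
    # Strip every prefix, '?' and '=', then re-wrap the payload as one encoded word
    payload = x.replace(encoding, '').replace('?', '').replace('=', '')
    return email.header.decode_header('%s%s==?=' % (encoding, payload))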
msg122774 - (view) Author: Roy Hyunjin Han (invisibleroads) * Date: 2010-11-29 05:59
Improved workaround to handle another degenerate case where the encoded string is in between non-encoded strings.
import re
import email.header

# Group 1: the whole run of encoded words; group 2: the first '=?charset?q|b?' prefix
pattern_ecre = re.compile(r'((=\?.*?\?[qb]\?).*\?=)', re.VERBOSE | re.IGNORECASE | re.MULTILINE)

def decodeSafely(x):
    match = pattern_ecre.search(x)
    if not match:
        return x
    string, encoding = match.groups()
    # Keep any non-encoded text before and after the encoded run
    stringBefore, string, stringAfter = x.partition(string)
    payload = string.replace(encoding, '').replace('?', '').replace('=', '')
    return stringBefore + email.header.decode_header('%s%s==?=' % (encoding, payload))[0][0] + stringAfter

# Python 2 syntax (print statement)
print decodeSafely('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?=')
print decodeSafely('"=?UTF-8?B?QVVUTSBIZWFkcXVhcnRlcnM=?="<info@autm.net>')
msg122776 - (view) Author: Roy Hyunjin Han (invisibleroads) * Date: 2010-11-29 06:45
The following code seems to solve the first case just as well. It seems that it is a problem of missing whitespace.
email.header.decode_header('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?='.replace('?==?', '?= =?'))
msg122918 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-11-30 17:05
Note that none of your examples are valid encoded words, so given that email currently does strict parsing, the fact that it is not attempting to decode those words is technically correct. 
However, I agree that it would be better for it to do a "best guess" decoding of the invalid encoded words.
It should be possible to "fix" this case by simply replacing '?==?' with '?= =?' before decoding (blanks between encoded words are ignored when decoding, per the RFC, which the author of the package producing these invalid headers probably didn't realize).
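As a rough illustration of that pre-processing step, here is a minimal sketch written for Python 3. The helper name `decode_lenient` is hypothetical and not part of the email package; it relies only on `email.header.decode_header` and `email.header.make_header`, and it assumes the only defect in the header is the missing whitespace between adjacent encoded words.

```python
import email.header


def decode_lenient(raw):
    """Hypothetical helper: repair missing whitespace, then decode to text."""
    # '?' cannot appear inside an encoded-word payload, so '?==?' should only
    # occur where one encoded word ends ('?=') and the next begins ('=?').
    repaired = raw.replace('?==?', '?= =?')
    parts = email.header.decode_header(repaired)
    # make_header() reassembles the (value, charset) pairs; str() yields text.
    return str(email.header.make_header(parts))


subject = ('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?='
           '=?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?='
           '=?UTF-8?B?aGlw?=')
print(decode_lenient(subject))
# prints: 2011 AUTM CALL for NOMINATIONS of VP for Membership
```

The inserted blanks do not show up in the result because whitespace between two encoded words is dropped during decoding.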
See also #1079 and #8132.
I have to think about whether or not all of these can be considered fixes (based on Postel's law) or if tolerant parsing should be considered a feature request. I'll probably combine these into a single issue at some point.
Out of curiosity, which email program is it that is producing these invalid headers?
msg126838 - (view) Author: Roy Hyunjin Han (invisibleroads) * Date: 2011-01-22 14:31
2010-11-30 R. David Murray <report@bugs.python.org>:
> Out of curiosity, which email program is it that is producing these invalid headers?
I lost the headers for the original email, so I don't know which email
program created the invalid headers.
On searching for messages from the same address, it seems most of the
messages originate from a marketing company called informz.net, but in
rare instances there is a non-standard X-Mailer header:
- ColdFusion 8 Application Server (via JavaMail)
- IBM Lotus Domino Access for MS Outlook (2003) Release 7.0.2 September 26, 2006
Messages sent via informz.net generally parse correctly, so I am
guessing it might have been one of the X-Mailers above.
msg162237 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-03 22:11
This is fixed by the fix to issue 1079, but we have decided that fix can't be backported because it is a behavior change that might break existing working programs.
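For readers on an interpreter that includes the issue 1079 change (Python 3.3 and later, as far as I can tell), the sketch below shows what calling `decode_header` directly on the two headers from this report looks like. The helper name `decode_to_text` is hypothetical, and the fallback to ASCII for parts without a charset is an assumption made for this example only.

```python
import email.header


def decode_to_text(raw):
    """Hypothetical helper: join decode_header() output into one str."""
    pieces = []
    for chunk, charset in email.header.decode_header(raw):
        # When the header contains encoded words, chunks come back as bytes;
        # a header with no encoded words comes back as a single str.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or 'ascii', errors='replace')
        pieces.append(chunk)
    return ''.join(pieces)


# Adjacent encoded words with no whitespace between them (first report).
print(decode_to_text('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?='
                     '=?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?='
                     '=?UTF-8?B?aGlw?='))

# Encoded word embedded between non-encoded text (second report).
print(decode_to_text('"=?UTF-8?B?QVVUTSBIZWFkcXVhcnRlcnM=?="<info@autm.net>'))
```

On Python 2.7 and 3.2 the same call leaves these headers undecoded, so the whitespace workaround from earlier in the thread is still needed there.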
msg164640 - (view) Author: Roy Hyunjin Han (invisibleroads) * Date: 2012-07-04 10:11
> This is fixed by the fix to issue 1079, but we have decided that fix can't be backported because it is a behavior change that might break existing working programs.
Thanks for this update.
History
Date                 User            Action  Args
2022-04-11 14:57:09  admin           set     github: 54783
2012-07-04 10:11:38  invisibleroads  set     messages: + msg164640
2012-06-03 22:11:21  r.david.murray  set     status: open -> closed; superseder: decode_header does not follow RFC 2047; messages: + msg162237; resolution: duplicate; stage: resolved
2012-05-16 02:00:06  r.david.murray  set     assignee: r.david.murray ->; nosy: + barry; components: + email; versions: - Python 3.1
2011-03-14 03:41:36  r.david.murray  set     versions: + Python 3.2, Python 3.3, - Python 2.6
2011-01-22 14:31:54  invisibleroads  set     messages: + msg126838
2010-11-30 17:05:53  r.david.murray  set     assignee: r.david.murray; messages: + msg122918; nosy: + r.david.murray
2010-11-29 06:45:58  invisibleroads  set     messages: + msg122776
2010-11-29 05:59:32  invisibleroads  set     messages: + msg122774
2010-11-29 05:12:24  invisibleroads  set     messages: + msg122773
2010-11-29 04:58:57  invisibleroads  create
