Created on 2010-11-29 04:58 by invisibleroads, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (8) | |||
|---|---|---|---|
| msg122772 - (view) | Author: Roy Hyunjin Han (invisibleroads) * | Date: 2010-11-29 04:58 | |
email.header.decode_header fails for the following message subject:
::
email.header.decode_header('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?=')
If the directives are removed and the padding problems are fixed, the subject parses correctly.
::
email.header.decode_header('=?UTF-8?B?%s==?=' % '=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?='.replace('=?UTF-8?B?', '').replace('?', '').replace('=', ''))
|
|||
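One way to inspect the subject from the report is to split it into its encoded words and base64-decode each payload on its own. The sketch below is illustrative only: the regular expression and the name `split_encoded_words` are assumptions made for this example and are not part of the `email` package. It suggests that each payload is already a valid base64 string whose length is a multiple of four, which points away from padding as the cause.

```python
import base64
import re

# The subject from the report, written as three adjacent string literals.
SUBJECT = ('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?='
           '=?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?='
           '=?UTF-8?B?aGlw?=')

# Hypothetical pattern for this example: charset, encoding letter, payload.
ENCODED_WORD = re.compile(r'=\?([^?]+)\?([bBqQ])\?([^?]*)\?=')


def split_encoded_words(header):
    """Return a (charset, encoding, payload) tuple for each encoded word."""
    return ENCODED_WORD.findall(header)


words = split_encoded_words(SUBJECT)

# Every word in this subject uses the "B" (base64) encoding, so each
# payload can be decoded directly with the base64 module.
decoded = [base64.b64decode(payload).decode(charset)
           for charset, encoding, payload in words]

print(decoded)
print(''.join(decoded))
```

The three words decode independently, so what is missing from the header is only the separator between them.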
| msg122773 - (view) | Author: Roy Hyunjin Han (invisibleroads) * | Date: 2010-11-29 05:12 | |
Currently using the following workaround.
import re
import email.header
def decodeSafely(x):
    match = re.search(r'(=\?.*?\?B\?)', x)
    if not match:
        return x
    encoding = match.group(1)
    return email.header.decode_header('%s%s==?=' % (encoding, x.replace(encoding, '').replace('?', '').replace('=', '')))
|
|||
| msg122774 - (view) | Author: Roy Hyunjin Han (invisibleroads) * | Date: 2010-11-29 05:59 | |
Improved workaround to handle another degenerate case where the encoded string is in between non-encoded strings.
import re
import email.header
pattern_ecre = re.compile(r'((=\?.*?\?[qb]\?).*\?=)', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
def decodeSafely(x):
    match = pattern_ecre.search(x)
    if not match:
        return x
    string, encoding = match.groups()
    stringBefore, string, stringAfter = x.partition(string)
    return stringBefore + email.header.decode_header('%s%s==?=' % (encoding, string.replace(encoding, '').replace('?', '').replace('=', '')))[0][0] + stringAfter
print decodeSafely('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?=')
print decodeSafely('"=?UTF-8?B?QVVUTSBIZWFkcXVhcnRlcnM=?="<info@autm.net>')
|
|||
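The workaround above is written for Python 2, where `decode_header` returns byte strings that can be concatenated with the surrounding text. A rough Python 3 sketch of the same idea follows: find each encoded word wherever it sits, decode it alone, and leave the rest of the text untouched. The name `decode_embedded_words` and the pattern `EMBEDDED_WORD` are hypothetical, introduced only for this example. The sketch assumes each encoded word is self-contained, as RFC 2047 requires, and that its charset is one Python knows.

```python
import email.header
import re

# Hypothetical pattern for this example: one RFC 2047-style encoded word.
EMBEDDED_WORD = re.compile(r'=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=')


def _decode_word(match):
    """Decode a single encoded word and return it as text."""
    value, charset = email.header.decode_header(match.group(0))[0]
    # In Python 3 a decoded word comes back as bytes plus its charset.
    if isinstance(value, bytes):
        return value.decode(charset or 'ascii')
    return value


def decode_embedded_words(text):
    """Replace every encoded word in text with its decoded form."""
    return EMBEDDED_WORD.sub(_decode_word, text)


print(decode_embedded_words(
    '=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?='
    '=?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?='
    '=?UTF-8?B?aGlw?='))
print(decode_embedded_words(
    '"=?UTF-8?B?QVVUTSBIZWFkcXVhcnRlcnM=?="<info@autm.net>'))
```

Because each word is decoded separately, this sketch does not depend on whitespace between words, but it also skips the stricter checks that the `email` package applies to well-formed headers.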
| msg122776 - (view) | Author: Roy Hyunjin Han (invisibleroads) * | Date: 2010-11-29 06:45 | |
The following code seems to solve the first case just as well. It seems that it is a problem of missing whitespace.
email.header.decode_header('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?==?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?==?UTF-8?B?aGlw?='.replace('?==?', '?= =?'))
|
|||
| msg122918 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010-11-30 17:05 | |
Note that none of your examples are valid encoded words, so given that email currently does strict parsing, the fact that it is not attempting to decode those words is technically correct. However, I agree that it would be better for it to do a "best guess" decoding of the invalid encoded words. It should be possible to "fix" this case by simply replacing '?==?' with '?= =?' before decoding (blanks between encoded words are ignored when decoding, per the RFC, which the author of the package producing these invalid headers probably didn't realize). See also #1079 and #8132. I have to think about whether or not all of these can be considered fixes (based on Postel's law) or if tolerant parsing should be considered a feature request. I'll probably combine these into a single issue at some point. Out of curiosity, which email program is it that is producing these invalid headers? |
|||
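The repair described in the message above can be sketched as a small helper: put the missing blank back between adjacent encoded words, then decode as usual. This is a minimal sketch, not the change made in the `email` package. The name `decode_adjacent_words` is hypothetical, and the helper assumes that `'?==?'` appears in the header only where two encoded words touch.

```python
import email.header


def decode_adjacent_words(raw):
    """Insert a blank between touching encoded words, then decode to text."""
    # Assumption: '?==?' marks the end of one encoded word ('?=')
    # immediately followed by the start of the next ('=?').
    repaired = raw.replace('?==?', '?= =?')
    # Blanks between encoded words are dropped during decoding, so the
    # inserted space does not show up in the result.
    parts = email.header.decode_header(repaired)
    # make_header() rebuilds a Header from (value, charset) pairs;
    # str() renders it as text.
    return str(email.header.make_header(parts))


subject = ('=?UTF-8?B?MjAxMSBBVVRNIENBTEwgZm9yIE5PTUlO?='
           '=?UTF-8?B?QVRJT05TIG9mIFZQIGZvciBNZW1iZXJz?='
           '=?UTF-8?B?aGlw?=')
print(decode_adjacent_words(subject))
```

Repairing the input before calling `decode_header` leaves the library's own parsing unchanged, which is why it works as a caller-side workaround.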
| msg126838 - (view) | Author: Roy Hyunjin Han (invisibleroads) * | Date: 2011-01-22 14:31 | |
2010-11-30 R. David Murray <report@bugs.python.org>:
> Out of curiosity, which email program is it that is producing these invalid headers?
I lost the headers for the original email, so I don't know which email program created the invalid headers. On searching for messages from the same address, it seems most of the messages originate from a marketing company called informz.net, but in rare instances there is a non-standard X-Mailer header:
- ColdFusion 8 Application Server (via JavaMail)
- IBM Lotus Domino Access for MS Outlook (2003) Release 7.0.2 September 26, 2006
Messages sent via informz.net generally parse correctly, so I am guessing it might have been one of the X-Mailers above. |
|||
| msg162237 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012-06-03 22:11 | |
This is fixed by the fix to issue 1079, but we have decided that fix can't be backported because it is a behavior change that might break existing working programs. |
|||
| msg164640 - (view) | Author: Roy Hyunjin Han (invisibleroads) * | Date: 2012-07-04 10:11 | |
> This is fixed by the fix to issue 1079, but we have decided that fix can't be backported because it is a behavior change that might break existing working programs.
Thanks for this update. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:09 | admin | set | github: 54783 |
| 2012-07-04 10:11:38 | invisibleroads | set | messages: + msg164640 |
| 2012-06-03 22:11:21 | r.david.murray | set | status: open -> closed superseder: decode_header does not follow RFC 2047 messages: + msg162237 resolution: duplicate stage: resolved |
| 2012-05-16 02:00:06 | r.david.murray | set | assignee: r.david.murray -> nosy: + barry components: + email versions: - Python 3.1 |
| 2011-03-14 03:41:36 | r.david.murray | set | versions: + Python 3.2, Python 3.3, - Python 2.6 |
| 2011-01-22 14:31:54 | invisibleroads | set | messages: + msg126838 |
| 2010-11-30 17:05:53 | r.david.murray | set | assignee: r.david.murray messages: + msg122918 nosy: + r.david.murray |
| 2010-11-29 06:45:58 | invisibleroads | set | messages: + msg122776 |
| 2010-11-29 05:59:32 | invisibleroads | set | messages: + msg122774 |
| 2010-11-29 05:12:24 | invisibleroads | set | messages: + msg122773 |
| 2010-11-29 04:58:57 | invisibleroads | create | |