Created on 2014-05-13 09:08 by jim_minter, last changed 2022-04-11 14:58 by admin.
| Messages (5) | |||
|---|---|---|---|
| msg218419 - (view) | Author: Jim Minter (jim_minter) | Date: 2014-05-13 09:08 | |
Python 3.3.2 (default, Mar 5 2014, 08:21:05)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.header
>>> email.header.decode_header("foo")
[('foo', None)]
>>> email.header.decode_header("foo=?windows-1252?Q?bar?=")
[(b'foo', None), (b'bar', 'windows-1252')]
I may well be wrong, but I believe it's erroneous that in the second example above, b'foo' is returned instead of the expected 'foo'.
|
|||
| msg218442 - (view) | Author: Berker Peksag (berker.peksag) * (Python committer) | Date: 2014-05-13 11:44 | |
> >>> email.header.decode_header("foo")
> [('foo', None)]
email.header.decode_header() implements RFC 2047, and the "foo" header doesn't match the encoded-word syntax described there (see "2. Syntax of encoded-words").
See the code for more information:
* http://hg.python.org/cpython/file/default/Lib/email/header.py#l34
* http://hg.python.org/cpython/file/default/Lib/email/header.py#l81
See the "6.1. Recognition of 'encoded-word's in message headers" section at http://www.rfc-base.org/txt/rfc-2047.txt
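As a rough illustration of the syntax check described above, here is a minimal sketch. The names `ENCODED_WORD` and `find_encoded_words` are hypothetical, and the pattern is a simplified reading of RFC 2047 section 2, not the regular expression that email/header.py actually uses.

```python
import re

# Simplified RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
# (hypothetical pattern for illustration; not copied from email/header.py)
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?\s]+)\?(?P<encoding>[bBqQ])\?(?P<text>[^?\s]*)\?="
)


def find_encoded_words(header):
    """Return a (charset, encoding, text) tuple for each encoded-word found."""
    return [m.group("charset", "encoding", "text")
            for m in ENCODED_WORD.finditer(header)]


# A plain header contains no encoded-word at all.
print(find_encoded_words("foo"))
# The second header from the report contains exactly one.
print(find_encoded_words("foo=?windows-1252?Q?bar?="))
```

Under this reading, "foo" has nothing to decode, while the second header carries one Q-encoded word in windows-1252.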
|
|||
| msg218450 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2014-05-13 13:11 | |
The error is actually the first case returning a string rather than bytes. See issue 6302. |
|||
| msg218451 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2014-05-13 13:15 | |
Hmm. It looks like we decided that we couldn't fix the behavior for backward compatibility reasons. In 3.4 you can use the new email policies to get automatic, correct stringification of headers. |
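A minimal sketch of the policy-based approach mentioned above, assuming Python 3.4 or later and a made-up sample message (the `raw` text is illustrative only). Parsing with `email.policy.default` should make header lookups return already-decoded strings, so `decode_header()` is not needed:

```python
from email import message_from_string
from email.policy import default

# Hypothetical sample message: one RFC 2047 encoded-word followed by plain text.
raw = "Subject: =?utf-8?b?SOKCrGxsbw==?= world\n\n"

# With the default policy, the parser decodes encoded-words for us.
msg = message_from_string(raw, policy=default)
subject = msg["Subject"]

# The header value is a str subclass holding the decoded text.
print(isinstance(subject, str), subject)
```

The trade-off is that this only helps when the whole message is parsed with the newer policy; code that receives a bare header string still has to deal with `decode_header()` or an equivalent.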
|||
| msg344522 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2019-06-04 04:20 | |
If we can't fix the behavior, it should at least be documented. Currently the docs say "This function returns a list of (decoded_string, charset) pairs containing each of the decoded parts of the header." One could assume this means that a Unicode string is returned, but as far as I can tell, "decoded_string" means decoded from the format used by the header, not from bytes; in fact, the example below shows a byte string.
#24797 suggests an alternative solution, but there is no indication of it in the docs except an easy-to-miss note about the new API at the top. Coincidentally, as I was reporting this issue I also found the recently opened #37139. There are also a few other reports: #24797, #37139, #32975, #6302, #4661.
If this method is not actually deprecated, I would document the current behavior (i.e. sometimes it returns bytes, sometimes unicode; bonus points if there's a simple rule to predict which one), explain that it exists for legacy/backward-compatibility reasons, and point to the alternatives.
FWIW here are 3 more samples that show the inconsistency.
>>> from email.header import decode_header
>>> # str + None
>>> h = '\x80SOKCrGxsbw===== <hello@example.com>'; decode_header(h)
[('\x80SOKCrGxsbw===== <hello@example.com>', None)]
>>> # bytes + '', bytes + None
>>> h = '=??b?SOKCrGxsbw=====?= <hello@example.com>'; decode_header(h)
[(b'H\xe2\x82\xacllo', ''), (b' <hello@example.com>', None)]
>>> # bytes + 'utf8', bytes + None
>>> h = '=?utf8?b?SOKCrGxsbw==?= <hello@example.com>'; decode_header(h)
[(b'H\xe2\x82\xacllo', 'utf8'), (b' <hello@example.com>', None)]
|
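For callers who have to live with the mixed return types shown above, one possible workaround is to normalize every part to `str`. This is only a sketch: `decode_header_to_str` is a hypothetical helper, not part of the email package, and falling back to ASCII with replacement for parts that have no usable charset is an assumption that may not suit every header.

```python
from email.header import decode_header


def decode_header_to_str(header, fallback="ascii"):
    """Join the parts from decode_header() into one str (hypothetical helper)."""
    parts = []
    for value, charset in decode_header(header):
        if isinstance(value, bytes):
            # charset may be None or '' for some parts; use the fallback then.
            # Assumption: undecodable bytes are replaced rather than raising.
            value = value.decode(charset or fallback, errors="replace")
        parts.append(value)
    return "".join(parts)


# Works whether decode_header() hands back str or bytes.
print(decode_header_to_str("foo"))
print(decode_header_to_str("=?utf8?b?SOKCrGxsbw==?= <hello@example.com>"))
```

A charset name that Python does not recognize would still raise `LookupError` here; handling that case is left out to keep the sketch short.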
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:03 | admin | set | github: 65691 |
| 2019-06-07 18:44:38 | terry.reedy | set | versions: + Python 3.7, Python 3.8, Python 3.9, - Python 3.3 |
| 2019-06-04 04:20:03 | ezio.melotti | set | status: closed -> open type: behavior -> enhancement assignee: docs@python components: + Documentation nosy: + ezio.melotti, louis.abraham@yahoo.fr, docs@python messages: + msg344522 resolution: duplicate -> stage: resolved -> needs patch |
| 2015-08-05 15:49:44 | r.david.murray | link | issue24797 superseder |
| 2014-05-13 13:15:12 | r.david.murray | set | messages: + msg218451 |
| 2014-05-13 13:11:38 | r.david.murray | set | status: open -> closed messages: + msg218450 dependencies: + Add decode_header_as_string method to email.utils resolution: duplicate stage: resolved |
| 2014-05-13 11:44:48 | berker.peksag | set | nosy: + barry, berker.peksag, r.david.murray messages: + msg218442 components: + email |
| 2014-05-13 09:08:05 | jim_minter | create | |