
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Add decode_header_as_string method to email.utils
Type: enhancement Stage: test needed
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: rejected
Dependencies: 4661 Superseder:
Assigned To: r.david.murray Nosy List: barry, pitrou, r.david.murray
Priority: normal Keywords: patch

Created on 2009-06-18 01:36 by r.david.murray, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded
decode_header.patch r.david.murray, 2010-10-03 04:18
decode_header.patch pitrou, 2010-10-03 14:38
Messages (13)
msg89488 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2009-06-18 01:36
decode_header only accepts str as input. If the input contains no
encoded words, the output is str (i.e. the input string) and None. If it
does contain encoded words, the output is pairs of bytes plus str
charset identifiers (or None). This makes it difficult to use this
function to decode headers, since one has to test for the special case
of str output when there are no encoded words.
I think decode_header should take bytes as input, and output (bytes,
str) pairs consistently.
In any case, the documentation is wrong since it says it returns
strings. The example is also wrong, since the actual output is:
[(b'p\xf6stal', 'iso-8859-1')]
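The inconsistency described in this message can be reproduced with a short script. This is only an illustrative sketch: it assumes a Python 3 interpreter in which decode_header still behaves as reported here, and chunk_types is a hypothetical helper, not part of the email package.

```python
from email.header import decode_header

# No encoded words: the single chunk comes back as str, with charset None.
plain = decode_header("Hello world")
print(plain)

# One RFC 2047 encoded word: the chunk comes back as bytes plus a charset name.
encoded = decode_header("=?iso-8859-1?q?p=F6stal?=")
print(encoded)


def chunk_types(header):
    """Hypothetical helper: report the Python type of each decoded chunk."""
    return [type(chunk).__name__ for chunk, _charset in decode_header(header)]


# A caller has to branch on these types before it can decode anything.
print(chunk_types("Hello world"), chunk_types("=?iso-8859-1?q?p=F6stal?="))
```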
msg109900 - Author: Mark Lawrence (BreamoreBoy) Date: 2010-07-10 17:42
Anyone got any comments to make on this? Should 2.7 also be included?
msg109926 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-07-10 20:04
No, this is a 3.x-only problem. And my main comment is that decode_header ought to go away as an API :)
But I'll try to fix the inconsistent data types problem before 3.2.
msg117906 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-10-03 04:18
Here is a patch that makes the output consistently (bytes, string) pairs. This is definitely a potential backward compatibility issue. In general, though, code which compensates for the old behavior should work fine with the new behavior: it was always possible to get a (bytes, None) tuple back as a result, so most code should be handling that case correctly already.
IMO this change is nevertheless worthwhile, especially since, if the patch in issue 4661 is accepted, decode_header can be enhanced so that it will provide a way to obtain the bytes version of a header containing (RFC invalid) non-ASCII bytes.
Note that this breaks one of the tests in nntplib, so backward compatibility really is an issue, unfortunately. I think nntplib's use case can be satisfied via the issue 4661 patch coupled with the decode_header bytes-recovery enhancement.
msg117910 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-10-03 13:04
> I think nntplib's use case can be satisfied via the issue 4661 patch
> coupled with the decode_header bytes-recovery enhancement.
I don't really understand how that could.
nntplib needs to "decode" (in the decode_header sense) headers containing unescaped as well as escaped non-ASCII chars. Encoding with "ascii" clearly breaks the first use case.
Since the decode_header() API is so silly to begin with, I'd suggest providing another higher-level API instead. One which takes an str and returns another str (not bytes!) instead of that list of tuples.
If you really want to "fix" the current decode_header(), I'd recommend adding an optional "encoding" parameter so that nntplib can give utf-8 instead of ascii. nntplib and other consumers will still have to decode back in order to get an str, which makes the whole encoding thing a bit useless.
Oh, and instead of None, it would be nicer to give the actual encoding (e.g. "ascii"). It's no fun trying to guess.
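The two suggestions above (an optional "encoding" parameter, and reporting a real charset name instead of None) could be sketched as a wrapper around the existing function. This is not either of the attached patches; decode_header_with_encoding and its encoding parameter are hypothetical names used only for illustration.

```python
from email.header import decode_header


def decode_header_with_encoding(header, encoding="ascii"):
    """Hypothetical wrapper: always return (bytes, charset) pairs.

    Chunks that decode_header reports without a charset are labelled with
    the caller-supplied fallback encoding instead of None.
    """
    pairs = []
    for value, charset in decode_header(header):
        if isinstance(value, str):
            # No encoded words were found, so the header came back as str;
            # encode it ourselves to keep the output type consistent.
            value = value.encode(encoding)
        # Assumption: a bytes chunk with no charset is plain ASCII, so
        # labelling it with the fallback encoding is harmless.
        pairs.append((value, charset or encoding))
    return pairs
```

A consumer such as nntplib could then pass encoding="utf-8" for headers containing unescaped non-ASCII characters, although, as noted above, it would still have to decode the bytes again to get a str.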
msg117912 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-10-03 14:38
Here is a proposal for decode_header_as_string().
msg117913 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-10-03 14:58
Yes, that was a late night post and as I was falling asleep I realized that I was wrong.
Certainly decode_header_as_string is a function most people using the email package will want and will re-implement in one form or another, so I think it is a good idea to add it. I will take a look at the patch later. And with such a function added we can leave decode_header alone for backward compatibility.
msg117956 - Author: Barry A. Warsaw (barry) (Python committer) Date: 2010-10-04 15:10
In email6, can we at least make tuple returning methods return namedtuples instead?
msg117957 - Author: Barry A. Warsaw (barry) (Python committer) Date: 2010-10-04 15:18
I agree that it makes sense to have consistent types in the output. As for whether to add a new method or fix the existing one, I'm a bit torn, but I'd probably opt for fixing the existing function rather than adding a new one, just because I think there are few Python 3 applications out there that are counting on the old behavior. (I could be wrong though ;).
msg117958 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-10-04 15:24
> I agree that it makes sense to have consistent types in the output.
> As for whether to add a new method or fix the existing one, I'm a bit
> torn, but I'd probably opt for fixing the existing function rather
> than adding a new one, just because I think there are few Python 3
> applications out there that are counting on the old behavior. (I
> could be wrong though ;).
The point of a new method is to return the header as a human-readable
string, rather than a list of tuples. It has added value besides leaving
the old method alone ;)
msg117959 - Author: Barry A. Warsaw (barry) (Python committer) Date: 2010-10-04 15:27
>The point of a new method is to return the header as a human-readable
>string, rather than a list of tuples. It has added value besides
>leaving the old method alone ;)
+1 then! :)
msg123889 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-12-13 18:36
Drat, missed this one when I was reviewing my issues for feature requests because I didn't change the type :(
msg160791 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2012-05-16 01:18
All of this is going to be fixed a different (better :) way in an upcoming patch that will add a new header parsing/folding policy to the email package.
For the record, the way to spell the "decode a header and return a string" function using the existing API is:
 str(make_header(decode_header(<someheader>)))
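Wrapped in a function, that spelling looks like the sketch below. The name decode_header_as_string is borrowed from this issue's title and is only a local helper here; no such function is assumed to exist in email.utils.

```python
from email.header import decode_header, make_header


def decode_header_as_string(header):
    """Local helper: decode any RFC 2047 encoded words and return a str."""
    # decode_header splits the header into (chunk, charset) pairs;
    # make_header rebuilds a Header object, and str() renders it as text.
    return str(make_header(decode_header(header)))


print(decode_header_as_string("=?iso-8859-1?q?p=F6stal?="))
```

Because make_header accepts both the str and the bytes chunks that decode_header produces, the caller does not need to special-case headers without encoded words.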
History
Date User Action Args
2022-04-11 14:56:50 admin set github: 50551
2014-05-13 13:11:37 r.david.murray link issue21492 dependencies
2012-05-16 01:18:13 r.david.murray set status: open -> closed
    resolution: rejected
    messages: + msg160791
2010-12-27 17:04:58 r.david.murray unlink issue1685453 dependencies
2010-12-13 18:37:41 r.david.murray set type: behavior -> enhancement
    title: email.header.decode_header data types are inconsistent and incorrectly documented -> Add decode_header_as_string method to email.utils
2010-12-13 18:36:50 r.david.murray set nosy: - BreamoreBoy
2010-12-13 18:36:30 r.david.murray set messages: + msg123889
    versions: + Python 3.3, - Python 3.2
2010-10-04 15:27:37 barry set messages: + msg117959
2010-10-04 15:24:49 pitrou set messages: + msg117958
2010-10-04 15:18:39 barry set messages: + msg117957
2010-10-04 15:10:41 barry set messages: + msg117956
2010-10-03 14:58:07 r.david.murray set messages: + msg117913
2010-10-03 14:38:21 pitrou set files: + decode_header.patch
    messages: + msg117912
2010-10-03 13:04:28 pitrou set nosy: + barry
    messages: + msg117910
2010-10-03 04:18:33 r.david.murray set versions: - Python 3.1
2010-10-03 04:18:11 r.david.murray set files: + decode_header.patch
    nosy: + pitrou
    messages: + msg117906
    dependencies: + email.parser: impossible to read messages encoded in a different encoding
    keywords: + patch
2010-07-10 20:04:33 r.david.murray set assignee: r.david.murray
    messages: + msg109926
2010-07-10 17:42:04 BreamoreBoy set nosy: + BreamoreBoy
    messages: + msg109900
2009-06-18 01:37:46 r.david.murray link issue1685453 dependencies
2009-06-18 01:36:14 r.david.murray create
