
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: email package and Unicode strings handling
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.6

process
Status: closed
Resolution: duplicate
Dependencies:
Superseder:
Assigned To: r.david.murray
Nosy List: ajaksu2, barry, bgamari, manlioperillo, r.david.murray
Priority: normal
Keywords:

Created on 2006-09-10 16:04 by manlioperillo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg29793 - Author: Manlio Perillo (manlioperillo) Date: 2006-09-10 16:04
Support for Unicode strings in the email package (notably the MIMEText and Header classes) is not uniform.
The behaviour of Header with Unicode strings is documented, but the interface is not good.
This code works, but it should not:
>>> h = Header.Header(u"àèìòù", charset="us-ascii")
>>> m = Message.Message()
>>> m["Subject"] = h
>>> print m.as_string()
Allowing this to work can cause confusion: I am saying that the charset is us-ascii, not utf-8.
With MIMEText I obtain:
>>> m = MIMEText.MIMEText(u"àèìòù", _charset="us-ascii")
>>> print m.as_string()
[ exception ]
I think that the correct behaviour (for all functions accepting strings) is:
- Do not accept plain str strings (8-bit) unless they are plain ASCII (7-bit).
- The specified charset should not be treated as a hint, but as the charset I want to be used.
Regards, Manlio Perillo
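The two rules proposed above can be illustrated with a minimal sketch. It is written for Python 3, where `str` is always Unicode, so the 8-bit `str` case from Python 2 does not arise. The helper name `encode_strict` is hypothetical and is not part of the email package; it only shows what "the charset is a requirement, not a hint" could look like.

```python
def encode_strict(text, charset="us-ascii"):
    """Hypothetical helper: encode text in exactly the declared charset."""
    if not isinstance(text, str):
        raise TypeError("expected a Unicode string, got %s" % type(text).__name__)
    # str.encode() raises UnicodeEncodeError when the text does not fit
    # the charset, instead of silently switching to another encoding.
    return text.encode(charset)


print(encode_strict("hello"))  # plain ASCII fits us-ascii

try:
    encode_strict("àèìòù", "us-ascii")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```

With this approach the Header example from this message would fail loudly rather than produce output in a charset the caller did not ask for.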
msg29794 - Author: Manlio Perillo (manlioperillo) Date: 2006-09-10 17:35
The last example is not right.
Here is the correct one:
>>> m = MIMEText.MIMEText(u"àèìòù", _charset="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
    self.set_payload(_text, _charset)
  File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
    self.set_charset(charset)
  File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
    return email.base64MIME.body_encode(s)
  File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
    enc = b2a_base64(s[i:i + max_unencoded])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
So it seems that email.Message does not handle Unicode strings.
The code works if I set the charset to latin-1.
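The traceback above ends in `b2a_base64` being handed a Unicode string, which Python 2 tries to convert implicitly with the ASCII codec. A minimal sketch of the encode-first approach follows, written for Python 3. The helper name `base64_body` is hypothetical and is not how the email package is implemented; it only shows the text being turned into bytes in the declared charset before base64 is applied.

```python
import binascii


def base64_body(text, charset="utf-8"):
    """Hypothetical helper: encode to bytes first, then base64-encode."""
    raw = text.encode(charset)  # explicit, no implicit ASCII conversion
    return binascii.b2a_base64(raw).decode("ascii")


encoded = base64_body("àèìòù")
print(encoded)

# Round trip: decoding the base64 and the charset recovers the original text.
print(binascii.a2b_base64(encoded).decode("utf-8"))
```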
msg84471 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-03-30 03:08
Confirmed on trunk.
msg106873 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-02 02:18
It took me a while to figure out why latin-1 works. It turns out to be an accident: latin-1 uses quoted-printable encoding, and the email quoprimime module accidentally manages to quote unicode characters in the latin-1 range.
The Header example, as noted by the OP, is working as documented. This confusing interface isn't going to get fixed in the current email package. The equivalent email6 API will be cleaner.
The MIMEText portion is a duplicate of issue 1368247.
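The difference between latin-1 and utf-8 described in msg106873 can be inspected through the charset registry of the email package. The sketch below is written for Python 3 and uses the lowercase module name `email.charset` rather than the `email.Charset` spelling seen in the Python 2.4 traceback. It only reads the registered body encodings and does not reproduce the original failure.

```python
from email.charset import BASE64, QP, Charset

# Look up the body encoding registered for each charset discussed in this issue.
for name in ("us-ascii", "latin-1", "utf-8"):
    cs = Charset(name)
    print(name, "->", cs.input_charset, cs.body_encoding)

latin1 = Charset("latin-1")
utf8 = Charset("utf-8")

# latin-1 bodies are quoted-printable, utf-8 bodies are base64,
# which is why the two charsets take different code paths.
print(latin1.body_encoding == QP)
print(utf8.body_encoding == BASE64)
```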
History
Date | User | Action | Args
2022-04-11 14:56:20 | admin | set | github: 43960
2010-12-27 17:04:58 | r.david.murray | unlink | issue1685453 dependencies
2010-06-02 02:18:53 | r.david.murray | set | status: open -> closed; resolution: duplicate; messages: + msg106873; stage: test needed -> resolved
2010-05-05 13:41:25 | barry | set | assignee: barry -> r.david.murray; nosy: + r.david.murray
2009-05-01 16:00:24 | bgamari | set | nosy: + bgamari
2009-03-30 22:56:23 | ajaksu2 | link | issue1685453 dependencies
2009-03-30 03:08:15 | ajaksu2 | set | versions: + Python 2.6; nosy: + ajaksu2; messages: + msg84471; type: behavior; stage: test needed
2006-09-10 16:04:26 | manlioperillo | create |
