This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Classification
Title: unicode in email.MIMEText and email/Charset.py
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.1, Python 3.2

Process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: r.david.murray
Nosy List: barry, bgamari, gdamjan, l0nwlf, loewis, maxua, orsenthil, r.david.murray, vstinner
Priority: normal
Keywords: patch

Created on 2005-11-28 14:15 by gdamjan, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
Charset.patch | gdamjan, 2005-11-28 14:15 |
mimetext-unicode.patch | maxua, 2008-12-02 13:10 | unicode mimetext support
mimetext_unicode_input.patch | r.david.murray, 2010-06-01 13:56 |
mimetext_unicode_input.patch | r.david.murray, 2010-06-01 14:05 |
Messages (13)
msg49137 - Author: Damjan Georgievski (gdamjan) Date: 2005-11-28 14:15
This is the test case that fails in python 2.4.1:
from email.MIMEText import MIMEText
msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
msg.set_charset('utf-8')
msg.as_string()
And attached is a patch to correct it.
msg49138 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-03-05 13:13
Your proposed patch doesn't seem to work in Python 2.5, or the trunk (i.e. it won't prevent an exception from occurring). Can you please revise it?
msg76737 - Author: (maxua) Date: 2008-12-02 13:10
How about this version?
msg76740 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-12-02 13:53
It was proposed to rewrite MIMEText in Python 3.1 (and 2.7?) to use 
unicode characters in the internals and reconvert to bytes to send it 
to a socket (or a file).
msg76741 - Author: Damjan Georgievski (gdamjan) Date: 2008-12-02 13:56
The patch by maxua works fine with 2.6 too and solves the problem.
I'd suggest it be applied to the 2.6 branch, even if email is rewritten
for 2.7/3.x.
msg87253 - Author: Ben Gamari (bgamari) Date: 2009-05-05 16:52
What is the status of this?
msg87292 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-05-05 21:54
It looks to me like MIMEText doesn't actually support unicode input.
One way to get the example to work is to do this:
 MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430'.encode('utf-8'), 'plain', 'utf-8')
The above call produces valid output from as_string:
'Content-Type: text/plain; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: base64\n\n0LrQuNGA0LjQu9C40YbQsA==\n'
How you'd get it to use 8bit, I have no idea. Still, I'm inclined to
close this as invalid unless Barry tells me my analysis is wrong.
(Cf. http://mg.pov.lt/blog/unicode-emails-in-python for a good example
of handling unicode using the email package, which I found after
figuring out the above.)
Clearly, the documentation of this could be better, but I suspect the
developers would rather spend their time fixing the email module in py3.
 A doc patch would certainly be accepted. (Maybe someone could ask the
above blogger if we could borrow his example for the docs.)
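For readers on Python 3, the workaround above can be sketched roughly as follows. This is an illustration only, not part of the original thread: it assumes Python 3, where MIMEText accepts a str directly, so the explicit .encode('utf-8') needed in Python 2 is dropped. The variable names are made up for the example.

```python
from email.mime.text import MIMEText

# The same Cyrillic test string used throughout this issue.
text = '\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430'

# Passing the charset to the constructor (rather than calling
# set_charset() afterwards) lets the package pick the body encoding.
msg = MIMEText(text, 'plain', 'utf-8')
rendered = msg.as_string()

print(rendered)

# Decoding the payload recovers the original text.
round_trip = msg.get_payload(decode=True).decode('utf-8')
```

The rendered message should carry a utf-8 Content-Type and a base64 Content-Transfer-Encoding, comparable to the output quoted above.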
msg103008 - Author: Shashwat Anand (l0nwlf) Date: 2010-04-13 03:33
After applying maxua's patch we do not get the unicode error, but as David stated, the support is not there. Here is the test.
>>> import email
>>> msg = email.MIMEText.MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
>>> msg.set_charset('utf8')
>>> msg.as_string()
'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type: text/plain; charset="utf8"\n\n\xd0\xba\xd0\xb8\xd1\x80\xd0\xb8\xd0\xbb\xd0\xb8\xd1\x86\xd0\xb0'
This does not seem to be a viable general solution to the problem.
I guess this issue should be closed, and the emphasis should now be on development of 'email 6.0'. By the way, I mailed Marius, the author of the blog post http://mg.pov.lt/blog/unicode-emails-in-python , to ask whether I can borrow his example for the doc patch.
msg104416 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-28 07:29
Hi David,
The attached patch for this issue:
+ if isinstance(payload, unicode):
+     payload = payload.encode(msg.get_charset().output_charset or 'us-ascii')
looks fine enough to me. Are you worried about the /or 'us-ascii'/ part of this patch? 
IMHO, the patch may prevent the straightforward Exception for which the issue was raised.
But on a larger scale, it is advisable to document MIMEText usage with encoding as you mentioned.
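The logic of the two quoted patch lines can be sketched in isolation as below. This is a hedged illustration, not the attached patch itself: it is written for Python 3 (where `unicode` is `str`), and `encode_payload` is a hypothetical helper name invented for the example. Only `email.charset.Charset` and its `output_charset` attribute come from the standard library.

```python
from email.charset import Charset


def encode_payload(payload, charset=None):
    """Hypothetical helper: encode text with the charset's output codec.

    Falls back to 'us-ascii' when no charset is given, mirroring the
    "or 'us-ascii'" part of the quoted patch lines.
    """
    if isinstance(payload, str):
        output = charset.output_charset if charset is not None else None
        payload = payload.encode(output or 'us-ascii')
    return payload


text = '\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430'

# With an explicit utf-8 charset, the text encodes cleanly.
encoded = encode_payload(text, Charset('utf-8'))

# Without a charset, the us-ascii fallback cannot represent Cyrillic.
try:
    encode_payload(text)
    fallback_failed = False
except UnicodeEncodeError:
    fallback_failed = True
```

The fallback branch shows why the us-ascii default only helps for pure-ASCII text: non-ASCII input still fails unless a charset is supplied.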
msg106835 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-01 13:48
I don't think either of the previous patches is correct. I found a note in Issue1685453 that Barry would like for this to work, and after poking around in the code for a bit I think it can be done without breaking anything.
Attached is a patch that adds unicode support to MIMEText, including unit tests and docs updates. Note that it is necessary to specify a charset if you have non-ASCII text in your unicode string, since the default charset is us-ascii. The unit tests confirm this behavior.
Now the question is, is this a bug-fix or an enhancement? I *think* it is safe to apply and backport, since I think the only behavior it changes is to make unicode input work, whereas before it would give a traceback. But I've been wrong before :(
msg106837 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-01 14:05
Ah, it's not 100% true that it doesn't change "working" behavior. Before the patch, the first example in this ticket doesn't raise an error until the as_string call. After this patch, the error is raised as soon as MIMEText is called without the charset parameter. Since without the patch the code still fails eventually, I think this is an acceptable behavior change for a bug fix, but it does make me a little nervous :)
Updated patch with doc change to Message.set_charset attached.
msg106924 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-02 22:26
Committed to 2.7 in r81658 and 2.6 in r81659. I'm leaving this open for the moment because while py3 doesn't have this problem, the tests should still pass and they don't.
msg106930 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-03 02:06
OK, py3k now passes all but one of the tests, and I've disabled that one pending email6 since fixing it would break backward compatibility within the 3.x series. The fix is different, doing the encoding to output_charset just before calling base64mime. This makes me think that quopri in py3k probably isn't working right, but that's a different issue. I did not forward port the doc changes as they are inappropriate for py3k.
py3k patched in r81660, 3.1 in r81661.
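The py3k approach described above (encode to the output charset, then hand the bytes to base64mime) can be approximated with the public helpers in Python 3. This is a sketch under that assumption and does not reproduce the committed change; the variable names are illustrative.

```python
from email.base64mime import body_encode
from email.charset import Charset

text = '\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430'
charset = Charset('utf-8')

# Step 1: encode the text to bytes using the charset's output codec.
raw = text.encode(charset.output_charset)

# Step 2: base64-encode the bytes; body_encode returns a str.
body = body_encode(raw)

print(body)
```

Doing the text-to-bytes step first matters because the base64 encoder operates on bytes, not on text.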
History
Date | User | Action | Args
2022-04-11 14:56:14 | admin | set | github: 42634
2010-12-27 17:04:58 | r.david.murray | unlink | issue1685453 dependencies
2010-06-03 02:06:49 | r.david.murray | set | status: open -> closed; messages: + msg106930
2010-06-02 22:26:27 | r.david.murray | set | stage: patch review -> resolved; messages: + msg106924; versions: - Python 2.6, Python 2.7
2010-06-01 14:05:43 | r.david.murray | set | files: + mimetext_unicode_input.patch; messages: + msg106837
2010-06-01 13:56:50 | r.david.murray | set | files: + mimetext_unicode_input.patch
2010-06-01 13:56:36 | r.david.murray | set | files: - mimetext_unicode_input.patch
2010-06-01 13:48:46 | r.david.murray | set | files: + mimetext_unicode_input.patch; versions: + Python 3.2, - Python 3.0; messages: + msg106835; resolution: not a bug -> fixed; stage: resolved -> patch review
2010-05-05 13:33:01 | barry | set | assignee: barry -> r.david.murray
2010-04-28 07:29:08 | orsenthil | set | nosy: + orsenthil; messages: + msg104416
2010-04-13 03:33:46 | l0nwlf | set | status: pending -> open; nosy: + l0nwlf; messages: + msg103008
2009-05-05 21:54:33 | r.david.murray | set | status: open -> pending; type: behavior; versions: + Python 2.6, Python 3.0, Python 3.1, Python 2.7; nosy: + r.david.murray; messages: + msg87292; resolution: not a bug; stage: resolved
2009-05-05 16:52:43 | bgamari | set | nosy: + bgamari; messages: + msg87253
2009-03-30 22:56:23 | ajaksu2 | link | issue1685453 dependencies
2008-12-02 13:56:57 | gdamjan | set | messages: + msg76741
2008-12-02 13:53:18 | vstinner | set | messages: + msg76740
2008-12-02 13:52:17 | vstinner | set | nosy: + vstinner
2008-12-02 13:10:54 | maxua | set | files: + mimetext-unicode.patch; nosy: + maxua; messages: + msg76737
2005-11-28 14:15:40 | gdamjan | create |