
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

Classification
Title: MIMEText should default to utf8 charset if input text contains non-ASCII
Type: enhancement
Stage: resolved
Components: email
Versions: Python 3.3

Process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: r.david.murray
Nosy List: barry, jeffknupp, mitya57, python-dev, r.david.murray
Priority: normal
Keywords: easy, patch

Created on 2012-03-21 17:34 by r.david.murray, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name       Uploaded
mailutf8.patch  jeffknupp, 2012-03-22 18:43
mimeutf8.patch  jeffknupp, 2012-03-22 19:37
Messages (10)
msg156501 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-03-21 17:34
The MIMEText class of the email package in Python 3 requires that a character set be specified in order for the resulting email to be valid. If no character set is specified, it currently assumes ascii but puts a unicode payload in the message. Because someone might actually be making use of this bug, I'm not going to backport a fix, but I do want to make utf8 the default if there are non-ascii characters in the input, in a similar fashion to how it is done for headers (well, now that *that* bug has been fixed).
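To illustrate the behavior requested here, a minimal sketch, assuming a Python version that already includes the fix from this issue (3.3 or later). When no charset is passed, MIMEText is expected to pick the charset from the text itself. The sample strings are made up for illustration.

```python
from email.mime.text import MIMEText

# No _charset given: pure-ASCII text keeps the us-ascii default.
ascii_msg = MIMEText("plain ascii body")

# No _charset given: non-ASCII text is labelled utf-8 rather than
# producing a message that claims us-ascii but carries a unicode payload.
unicode_msg = MIMEText("na\u00efve caf\u00e9")

print(ascii_msg.get_content_charset())
print(unicode_msg.get_content_charset())
```

On versions without the fix, the second message would have to be created with an explicit `_charset="utf-8"` to be valid.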
msg156603 - Author: Jeff Knupp (jeffknupp) * Date: 2012-03-22 18:42
Patch to default character set to utf8 if non-ascii characters are found in the input string. Please let me know if this is what you had in mind.
msg156605 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-03-22 18:57
Pretty close. I'd do the check for us-ascii first, and only do the encode test/switch to utf-8 if that's the charset. The reason is that if a charset has been specified, we don't waste time doing an unnecessary encoding (and the ascii codec is very fast, which you can't say about all the codecs).
Now, what would be *really* nice is to also try latin-1 before falling back to utf-8, but I wouldn't want to make that the default behavior for performance reasons. I'm planning to add support for that at some point, but I haven't decided exactly how (policy setting? New optional setting in the alias structure?)
There seem to be unrelated changes to torture_test in your patch?
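The ordering suggested in this message (check for us-ascii first, and only then try the encode) can be sketched as below. This is not the code from the patch: `choose_charset` is a hypothetical helper name used only for illustration.

```python
def choose_charset(text, charset="us-ascii"):
    """Hypothetical helper: return the charset to label `text` with."""
    # An explicitly chosen charset is returned untouched, so no trial
    # encoding (possibly with a slow codec) is performed for it.
    if charset != "us-ascii":
        return charset
    try:
        # Cheap probe with the fast ascii codec.
        text.encode("us-ascii")
    except UnicodeEncodeError:
        # Non-ASCII input: fall back to utf-8.
        return "utf-8"
    return charset
```

With this ordering, a caller who passes, say, `charset="iso-8859-1"` never pays for a trial encode; only the default path is probed.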
msg156609 - Author: Jeff Knupp (jeffknupp) * Date: 2012-03-22 19:37
Understood. Please take a look at the updated patch. Also, removed the unrelated diffs that somehow were hanging around in my hg mq.
msg156612 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-03-22 20:13
That looks good.
msg156630 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-23 02:18
New changeset bfd1ba2bbaf8 by R David Murray in branch 'default':
#14380: Have MIMEText defaults to utf-8 when passed non-ASCII unicode
http://hg.python.org/cpython/rev/bfd1ba2bbaf8 
msg156631 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-03-23 02:19
See also issue 7304. It was niggling at the back of my brain, and I finally managed to dig it up. Fixing that is much more complex than fixing this (because set_charset is a *very* strange method), so I committed this patch in case we don't manage to fix 7304 before 3.3 is ready, but with a comment about removing it when 7304 is fixed.
Thanks, Jeff.
msg156632 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-23 02:41
New changeset 9ceac471bd8c by R David Murray in branch 'default':
#14380: Make actual default match docs, fix __init__ order.
http://hg.python.org/cpython/rev/9ceac471bd8c 
msg162131 - Author: Dmitry Shachnev (mitya57) * Date: 2012-06-02 10:11
Maybe it'll be better to use 'latin-1' charset for latin-1 texts?
Something like this:
    if _charset == 'us-ascii':
        try:
            _text.encode(_charset)
        except UnicodeEncodeError:
            try:
                _text.encode('latin-1')
            except UnicodeEncodeError:
                _charset = 'utf-8'
            else:
                _charset = 'latin-1'
This will make messages in most latin languages use quoted-printable encoding.
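As a self-contained variant of the snippet above, here is a hedged sketch that wraps the same fallback chain in a function and checks which transfer encoding the email package then selects. `choose_charset_with_latin1` is a hypothetical name and not part of the email package, and the sample text is made up.

```python
from email.mime.text import MIMEText


def choose_charset_with_latin1(text, charset="us-ascii"):
    """Hypothetical helper: try us-ascii, then latin-1, then utf-8."""
    # Leave an explicitly chosen charset alone.
    if charset != "us-ascii":
        return charset
    for candidate in ("us-ascii", "latin-1"):
        try:
            text.encode(candidate)
        except UnicodeEncodeError:
            continue  # this codec cannot represent the text; try the next
        return candidate
    # Neither narrow codec fits, so fall back to utf-8.
    return "utf-8"


latin_text = "caf\u00e9"
msg = MIMEText(latin_text, _charset=choose_charset_with_latin1(latin_text))
cte = msg["Content-Transfer-Encoding"]
print(msg.get_content_charset(), cte)
```

The extra latin-1 probe costs a second encode attempt for every non-ASCII message, which is the performance concern raised earlier in the thread.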
msg162139 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-02 14:59
I do plan to add something like that at some point. You could open a new issue for it if you like, and propose a formal patch.
History
Date                 User            Action  Args
2022-04-11 14:57:28  admin           set     github: 58588
2012-06-02 14:59:49  r.david.murray  set     messages: + msg162139
2012-06-02 10:11:00  mitya57         set     nosy: + barry
                                             messages: + msg162131
                                             components: + email, - Library (Lib)
2012-03-23 02:41:20  python-dev      set     messages: + msg156632
2012-03-23 02:19:29  r.david.murray  set     status: open -> closed
                                             resolution: fixed
                                             messages: + msg156631
                                             stage: commit review -> resolved
2012-03-23 02:18:03  python-dev      set     nosy: + python-dev
                                             messages: + msg156630
2012-03-22 20:13:43  r.david.murray  set     messages: + msg156612
                                             stage: test needed -> commit review
2012-03-22 19:37:24  jeffknupp       set     files: + mimeutf8.patch
                                             messages: + msg156609
2012-03-22 18:57:05  r.david.murray  set     messages: + msg156605
2012-03-22 18:43:15  jeffknupp       set     files: + mailutf8.patch
                                             keywords: + patch
2012-03-22 18:42:50  jeffknupp       set     nosy: + jeffknupp
                                             messages: + msg156603
2012-03-21 17:34:47  r.david.murray  create
