Created on 2008-12-02 13:17 by maxua, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| charset-utf8-alias.patch | maxua, 2008-12-02 13:17 | |
| email_accept_codec_aliases.patch | r.david.murray, 2010-06-03 16:31 | |
| Messages (13) | |||
|---|---|---|---|
| msg76738 | Author: (maxua) | Date: 2008-12-02 13:17 | |
When using the MIME email package you can specify "utf8" as the encoding. It
is accepted, but the resulting message is not rendered correctly by some MUAs.
E.g. Mac OS X Mail.app doesn't display it properly, while Google Gmail does.
This is confusing, since Python itself happily understands both utf8 and utf-8.
The patch adds "utf8" as an alias for the "utf-8" encoding, which means the user
won't need to think twice.
Test case:
from email.MIMEText import MIMEText
msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
msg.set_charset('utf8')
print msg.as_string()
|
|||
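For illustration, the Python side of the mismatch described in msg76738 can be checked directly: the codec machinery treats both spellings as the same encoding, even though a mail client may only recognise the registered name "utf-8". This is a minimal Python 3 sketch (the original test case is Python 2), and the variable names are illustrative only.

```python
import codecs

# The word used in the original test case.
text = "\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430"

# Both spellings resolve to the same codec inside Python...
same_codec = codecs.lookup("utf8").name == codecs.lookup("utf-8").name

# ...and therefore produce identical bytes.
same_bytes = text.encode("utf8") == text.encode("utf-8")

print(same_codec, same_bytes)
```

Nothing here involves the email package; it only shows why a user would not expect the two names to behave differently.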
| msg85329 | Author: Tony Nelson (tony_nelson) | Date: 2009-04-03 20:35 | |
This seems entirely reasonable, helpful, and in accord with the mapping of ascii to us-ascii. I recommend accepting this patch or a slightly fancier one that would also do "utf_8". There are probably other encoding names with the same issue of being accepted by Python but not understood by other email clients. This issue also affects 2.6.1 and 2.7 trunk. I haven't checked 3.x. |
|||
| msg87254 | Author: Ben Gamari (bgamari) | Date: 2009-05-05 16:55 | |
Has this patch been merged yet? |
|||
| msg102511 | Author: Shashwat Anand (l0nwlf) | Date: 2010-04-07 02:28 | |
I tested it on Python 2.5, 2.6, 2.7 trunk and 3.2, varying msg.set_charset(x) with x = 'utf8' and 'utf-8'.
Here are the results. Apparently Python 2.x has a problem with the test case while 3.2 passes, but I guess that is unrelated to this issue.
07:35:40 l0nwlf-MBP:~ $ python2.5
Python 2.5.4 (r254:67916, Jul 7 2009, 23:51:24)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.MIMEText import MIMEText
>>> msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
>>> msg.set_charset('utf8')
>>> print msg.as_string()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/message.py", line 131, in as_string
g.flatten(self, unixfrom=unixfrom)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 84, in flatten
self._write(msg)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 109, in _write
self._dispatch(msg)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 135, in _dispatch
meth(msg)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 178, in _handle_text
self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
>>> msg.set_charset('utf-8')
>>> print msg.as_string()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/message.py", line 131, in as_string
g.flatten(self, unixfrom=unixfrom)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 84, in flatten
self._write(msg)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 109, in _write
self._dispatch(msg)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 135, in _dispatch
meth(msg)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/email/generator.py", line 178, in _handle_text
self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
07:36:17 l0nwlf-MBP:~ $ python2.6
Python 2.6.5 (r265:79063, Apr 6 2010, 21:34:21)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.MIMEText import MIMEText
>>> msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
>>> msg.set_charset('utf8')
>>> print msg.as_string()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/message.py", line 135, in as_string
g.flatten(self, unixfrom=unixfrom)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 84, in flatten
self._write(msg)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 109, in _write
self._dispatch(msg)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 135, in _dispatch
meth(msg)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 178, in _handle_text
self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
>>> msg.set_charset('utf-8')
>>> print msg.as_string()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/message.py", line 135, in as_string
g.flatten(self, unixfrom=unixfrom)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 84, in flatten
self._write(msg)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 109, in _write
self._dispatch(msg)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 135, in _dispatch
meth(msg)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/email/generator.py", line 178, in _handle_text
self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
07:36:37 l0nwlf-MBP:~ $ python2.7
Python 2.7a4+ (trunk:78750, Mar 7 2010, 08:09:00)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.MIMEText import MIMEText
>>> msg = MIMEText(u'\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
>>> msg.set_charset('utf8')
>>> print msg.as_string()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/email/message.py", line 135, in as_string
g.flatten(self, unixfrom=unixfrom)
File "/usr/local/lib/python2.7/email/generator.py", line 83, in flatten
self._write(msg)
File "/usr/local/lib/python2.7/email/generator.py", line 108, in _write
self._dispatch(msg)
File "/usr/local/lib/python2.7/email/generator.py", line 134, in _dispatch
meth(msg)
File "/usr/local/lib/python2.7/email/generator.py", line 180, in _handle_text
self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
>>> msg.set_charset('utf-8')
>>> print msg.as_string()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/email/message.py", line 135, in as_string
g.flatten(self, unixfrom=unixfrom)
File "/usr/local/lib/python2.7/email/generator.py", line 83, in flatten
self._write(msg)
File "/usr/local/lib/python2.7/email/generator.py", line 108, in _write
self._dispatch(msg)
File "/usr/local/lib/python2.7/email/generator.py", line 134, in _dispatch
meth(msg)
File "/usr/local/lib/python2.7/email/generator.py", line 180, in _handle_text
self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
07:37:06 l0nwlf-MBP:~ $ python3.2
Python 3.2a0 (py3k:79532, Apr 1 2010, 01:48:52)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.mime.text import MIMEText
>>> msg = MIMEText('\u043a\u0438\u0440\u0438\u043b\u0438\u0446\u0430')
>>> msg.set_charset('utf8')
>>> print (msg.as_string())
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf8"
кирилица
>>> msg.set_charset('utf-8')
>>> print (msg.as_string())
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"
кирилица
Aliasing 'utf8' (and maybe 'utf_8') to 'utf-8' seems reasonable IMO; however, the test case fails on Python 2.x.
|
|||
| msg103007 | Author: Shashwat Anand (l0nwlf) | Date: 2010-04-13 03:33 | |
MIMEText doesn't support unicode input. This is why the OP's test case failed. For reference: http://bugs.python.org/issue1368247 |
|||
| msg106964 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010-06-03 16:31 | |
For various reasons the email module has a table of character sets. What might be most effective would be for the email module to look a character set name up in the codecs module and find out the canonical name of the character set, and then look that up in its table (i.e. remove the aliases table from email completely, and instead depend on codecs to resolve the canonical name). Unfortunately the codecs module does not recognize all of the aliases used by email, nor is there necessarily any guarantee that the two modules will agree on the proper canonical name.
The attached patch instead uses the codecs module as a fallback if the charset name does not appear in the email package's ALIASES or CHARSETS tables. It therefore makes both utf8 and utf_8 work, as well as all the other variants the codecs module accepts. The unit test just tests 'utf8', since if that one works all the others should too.
I'm tentatively reclassifying this as a bug rather than a feature request, since I think it is a reasonable expectation that email would support at least the same set of encoding names that the rest of Python does. |
|||
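The fallback described in msg106964 could look roughly like the following. This is a simplified Python 3 sketch under stated assumptions, not the attached patch: `EMAIL_ALIASES`, `EMAIL_CHARSETS` and `resolve_charset` are hypothetical stand-ins for the email package's own tables and logic, and only `codecs.lookup` is a real standard-library call.

```python
import codecs

# Hypothetical miniature versions of the email package's tables.
EMAIL_ALIASES = {"ascii": "us-ascii", "latin_1": "iso-8859-1"}
EMAIL_CHARSETS = {"us-ascii", "iso-8859-1", "utf-8"}


def resolve_charset(name):
    """Return the charset name to emit for a user-supplied name."""
    key = name.lower()
    # 1. The email tables win whenever they know the name.
    if key in EMAIL_ALIASES:
        return EMAIL_ALIASES[key]
    if key in EMAIL_CHARSETS:
        return key
    # 2. Otherwise fall back to the codec registry's canonical name.
    try:
        return codecs.lookup(key).name
    except LookupError:
        # 3. Unknown everywhere: pass the name through unchanged.
        return key


print(resolve_charset("utf8"), resolve_charset("utf_8"))
```

Because the email tables are consulted first, names they already handle keep their existing output, and the codec registry only fills in spellings such as utf8 and utf_8 that the tables miss.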
| msg106965 | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2010-06-03 16:36 | |
Idea: Import the aliases mapping from codecs and extend it with email-specific aliases.
Alternate idea: Add email's names to codecs.
Side note: "charset" stands for "character encoding", not "character set". See <http://www.w3.org/International/questions/qa-what-is-encoding#what> |
|||
| msg107018 | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2010-06-04 09:14 | |
R. David Murray wrote:
> For various reasons the email module has a table of character sets. What might be most effective would be for the email module to look a character set name up in the codecs module and find out the canonical name of the character set, and then look that up in its table (i.e. remove the aliases table from email completely, and instead depend on codecs to resolve the canonical name). Unfortunately the codecs module does not recognize all of the aliases used by email, nor is there necessarily any guarantee that the two modules will agree on the proper canonical name.
I think that the encodings package should be the only source of valid aliases and encoding names. After all, you wouldn't be able to process email content using names or aliases not appearing in the encodings package tables.
If there are aliases missing, then we can add them there. If the email package needs different canonical names, it can apply its own map on the canonical names returned by the encodings package. |
|||
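The arrangement suggested in msg107018 might be sketched as below: the codec registry resolves every alias, and the email side only keeps a small map from Python's canonical names to the names it wants to emit. This is a Python 3 sketch under stated assumptions; `EMAIL_CANONICAL` and `email_charset_name` are hypothetical names, and the map contents are examples rather than a complete table.

```python
import codecs

# Hypothetical map from the codec registry's canonical names to the
# names an email header should carry. Entries are examples only.
EMAIL_CANONICAL = {
    "ascii": "us-ascii",
    "iso8859-1": "iso-8859-1",
}


def email_charset_name(name):
    """Resolve any alias via codecs, then apply the email-specific map."""
    canonical = codecs.lookup(name).name  # raises LookupError if unknown
    return EMAIL_CANONICAL.get(canonical, canonical)


for alias in ("utf8", "utf_8", "us-ascii", "latin-1"):
    print(alias, "->", email_charset_name(alias))
```

The map is needed because the two sides do not always agree on spelling: `codecs.lookup("latin-1").name` is `"iso8859-1"`, while mail headers conventionally use `"iso-8859-1"`.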
| msg107071 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010-06-04 15:45 | |
Mark, any objection to my putting this patch in now, and then we'll fix the aliases implementation in 3.2? |
|||
| msg107082 | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2010-06-04 17:36 | |
R. David Murray wrote:
> Mark, any objection to my putting this patch in now, and then we'll fix the aliases implementation in 3.2?
No. Please open a new issue targeting Python 3.2 for this.
Thanks,
-- Marc-Andre Lemburg |
|||
| msg107091 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010-06-04 19:54 | |
Patch committed to trunk in r81705. Leaving issue open pending porting to the other branches, but I've also opened issue 8898 to further change things so that codecs becomes the sole authority for aliases in 3.2. |
|||
| msg118042 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2010-10-05 22:53 | |
David, can this issue be closed? |
|||
| msg118044 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010-10-05 23:17 | |
Yes. Benjamin merged this to py3k in r82292. If someone wants to explain to me how to cherry pick the changeset into 3.1 I'd be happy to do it, otherwise I think I'm done with this one :) |
|||
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:41 | admin | set | github: 48737 |
| 2011-04-03 17:58:05 | l0nwlf | set | nosy: - l0nwlf |
| 2010-12-27 17:04:58 | r.david.murray | unlink | issue1685453 dependencies |
| 2010-10-05 23:17:10 | r.david.murray | set | status: open -> closed; messages: + msg118044; stage: commit review -> resolved |
| 2010-10-05 22:53:41 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg118042 |
| 2010-06-04 19:54:30 | r.david.murray | set | resolution: fixed; messages: + msg107091; stage: patch review -> commit review |
| 2010-06-04 17:36:46 | lemburg | set | messages: + msg107082 |
| 2010-06-04 15:45:43 | r.david.murray | set | messages: + msg107071; versions: + Python 3.1, Python 3.2, - Python 2.5 |
| 2010-06-04 09:14:51 | lemburg | set | nosy: + lemburg; messages: + msg107018 |
| 2010-06-03 16:36:33 | eric.araujo | set | nosy: + eric.araujo; messages: + msg106965 |
| 2010-06-03 16:32:28 | r.david.murray | set | assignee: r.david.murray; type: enhancement -> behavior; stage: patch review |
| 2010-06-03 16:31:58 | r.david.murray | set | files: + email_accept_codec_aliases.patch |
| 2010-06-03 16:31:21 | r.david.murray | set | nosy: + r.david.murray; messages: + msg106964 |
| 2010-04-13 03:33:42 | l0nwlf | set | messages: + msg103007 |
| 2010-04-07 02:28:07 | l0nwlf | set | type: enhancement; messages: + msg102511; nosy: + l0nwlf |
| 2009-05-05 16:55:28 | bgamari | set | nosy: + bgamari; messages: + msg87254 |
| 2009-04-03 20:36:41 | tony_nelson | set | versions: + Python 2.6, Python 2.7 |
| 2009-04-03 20:35:33 | tony_nelson | set | nosy: + barry, tony_nelson; messages: + msg85329 |
| 2009-03-30 22:56:23 | ajaksu2 | link | issue1685453 dependencies |
| 2009-01-29 23:37:26 | vstinner | set | nosy: - vstinner |
| 2008-12-02 13:54:22 | vstinner | set | nosy: + vstinner |
| 2008-12-02 13:17:11 | maxua | create | |