Message 155656 - Python tracker

In-reply-to
Author	r.david.murray
Recipients	aikinci, r.david.murray
Date	2012-03-13.19:55:32
Message-id	<1331668534.02.0.642658515735.issue14291@psf.upfronthosting.co.za>

Content

In Python2, this works:
 >>> from email.mime.text import MIMEText
 >>> m = MIMEText('abc')
 >>> str(m)
 'From nobody Tue Mar 13 15:44:59 2012\nContent-Type: text/plain; charset="us-ascii"\nMIME-Version: 1.0\nContent-Transfer-Encoding: 7bit\n\nabc'
 >>> m['Subject'] = u'É test'
 >>> str(m)
 'From nobody Tue Mar 13 15:48:11 2012\nContent-Type: text/plain; charset="us-ascii"\nMIME-Version: 1.0\nContent-Transfer-Encoding: 7bit\nSubject: =?utf-8?q?=C3=89_test?=\n\nabc'
That is, unicode strings automatically get turned into encoded words.
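For reference, an encoded word of this form can also be built explicitly with email.header.Header, which is available in Python3 as well. The following is only a minimal sketch, assuming utf-8 as the charset (the charset choice and the variable names are illustrative, not taken from the report):

```python
from email.header import Header, decode_header, make_header

# Build the header value with an explicit charset instead of relying
# on automatic conversion of a unicode string.
subject = Header('É test', charset='utf-8')

# encode() returns an ASCII-only str containing an RFC 2047 encoded word.
encoded = subject.encode()
print(encoded)

# Decoding the encoded word should give back the original text.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The encoded value contains only ASCII characters, so it is safe to place in a header as-is.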
In Python3 this no longer works:
 >>> from email.mime.text import MIMEText
 >>> m = MIMEText('abc')
 >>> str(m)
 'Content-Type: text/plain; charset="us-ascii"\nMIME-Version: 1.0\nContent-Transfer-Encoding: 7bit\n\nabc'
 >>> m['Subject'] = u'É test'
 >>> str(m)
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/home/rdmurray/python/p33/Lib/email/message.py", line 154, in __str__
     return self.as_string()
   File "/home/rdmurray/python/p33/Lib/email/message.py", line 168, in as_string
     g.flatten(self, unixfrom=unixfrom)
   File "/home/rdmurray/python/p33/Lib/email/generator.py", line 99, in flatten
     self._write(msg)
   File "/home/rdmurray/python/p33/Lib/email/generator.py", line 152, in _write
     self._write_headers(msg)
   File "/home/rdmurray/python/p33/Lib/email/generator.py", line 186, in _write_headers
     header_name=h)
   File "/home/rdmurray/python/p33/Lib/email/header.py", line 205, in __init__
     self.append(s, charset, errors)
   File "/home/rdmurray/python/p33/Lib/email/header.py", line 286, in append
     s.encode(output_charset, errors)
 UnicodeEncodeError: 'ascii' codec can't encode character '\xc9' in position 0: ordinal not in range(128)
Presumably the problem is that the Python2 code tests for 'string' and, if
the value isn't a string, handles it by CTE encoding it. In Python3 everything
is a string. Probably what should happen is that the encoding error should
be caught and the CTE encoding done at that point, based on the model of how Python2 handled unicode strings.
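The idea of catching the encoding error and encoding at that point can be illustrated at the caller's level. This is only a sketch of the approach, not the change that would be made inside email.header itself; encode_header_value is a hypothetical helper name, and utf-8 as the fallback charset is an assumption:

```python
from email.header import Header
from email.mime.text import MIMEText


def encode_header_value(value, header_name=None):
    """Hypothetical helper: return value unchanged if it is pure ASCII,
    otherwise fall back to an RFC 2047 encoded word (utf-8 assumed)."""
    try:
        value.encode('ascii')
    except UnicodeEncodeError:
        # The ASCII attempt failed, so do the encoding at this point.
        return Header(value, charset='utf-8', header_name=header_name).encode()
    return value


m = MIMEText('abc')
m['Subject'] = encode_header_value('É test', header_name='Subject')
flat = str(m)  # the header value is ASCII-only now, so flattening succeeds
print(flat)
```

ASCII-only values pass through untouched, so existing callers would see no difference for plain headers.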

History
Date	User	Action	Args
2012-03-13 19:55:34	r.david.murray	set	recipients: + r.david.murray, aikinci
2012-03-13 19:55:34	r.david.murray	set	messageid: <1331668534.02.0.642658515735.issue14291@psf.upfronthosting.co.za>
2012-03-13 19:55:33	r.david.murray	link	issue14291 messages
2012-03-13 19:55:32	r.david.murray	create