Message 232850 - Python tracker



Author	martin.panter
Recipients	doerwalter, lemburg, loewis, martin.panter, ncoghlan, vstinner
Date	2014-12-18.02:25:22
Message-id	<1418869522.73.0.330533347223.issue20132@psf.upfronthosting.co.za>

Content

The "unicode-escape" and "utf-7" cases affect the more widely used TextIOWrapper interface:
>>> from io import BytesIO, TextIOWrapper
>>> TextIOWrapper(BytesIO(br"\u2013" * 2000), "unicode-escape").read(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/encodings/unicode_escape.py", line 26, in decode
    return codecs.unicode_escape_decode(input, self.errors)[0]
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 8190-8191: truncated \uXXXX escape
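
The reported position 8190-8191 is consistent with the input being handed to the decoder in 8192-byte chunks, so that the first chunk ends partway through an escape. A rough sketch of that arithmetic follows. The 8192-byte chunk size is an assumption about the reader's internal chunking, and the names ESCAPE, CHUNK_SIZE and split_point are only illustrative.

```python
ESCAPE = br"\u2013"   # 6 bytes: backslash, 'u', four hex digits
CHUNK_SIZE = 8192     # assumed size of the first chunk handed to the decoder


def split_point(escape=ESCAPE, chunk_size=CHUNK_SIZE, count=2000):
    """Return (whole escapes in the first chunk, offset of the cut escape,
    dangling bytes left at the end of the first chunk)."""
    data = escape * count
    first_chunk = data[:chunk_size]
    whole = len(first_chunk) // len(escape)   # escapes that fit completely
    offset = whole * len(escape)              # where the cut escape starts
    return whole, offset, first_chunk[offset:]


whole, offset, tail = split_point()
print(whole, offset, tail)
```

Under that assumption the first chunk holds 1365 whole escapes and ends with the two bytes of a cut-off escape (the backslash and the "u") at offsets 8190 and 8191, which matches the traceback.
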
>>> w = TextIOWrapper(BytesIO(), "utf-7")
>>> w.writelines("\xA9\xA9") # Write one character at a time
>>> w.detach().getvalue()
b'+AKk-+AKk-'
>>> r = TextIOWrapper(BytesIO(b"+" + b"AAAAAAAA" * 100000 + b"-"), "utf-7")
>>> r.read(1) # Short delay as all 800 kB are decoded to read one character
'\x00'
>>> r.buffer.tell()
800002
For UTF-7 decoding to work optimally, I think the decoder would need to buffer only a few bytes.
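
One way to observe how much input a decoder holds back is to feed it through the codecs incremental decoder interface one small chunk at a time and note when the first character comes out. This is only a sketch: the helper name first_output_offset and the one-byte chunk size are my own choices, and the result depends on the Python version in use.

```python
import codecs


def first_output_offset(data, encoding, chunk_size=1):
    """Feed data to an incremental decoder chunk by chunk.

    Returns (number of bytes fed when the first character appeared,
    complete decoded text). The offset is None if nothing was produced
    before the final flush.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    first = None
    pieces = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        text = decoder.decode(chunk)
        if text and first is None:
            first = start + len(chunk)
        pieces.append(text)
    # Flush whatever the decoder is still holding.
    pieces.append(decoder.decode(b"", final=True))
    return first, "".join(pieces)


# Same shape of input as in the session above, but much shorter.
data = b"+" + b"AAAAAAAA" * 50 + b"-"
first, text = first_output_offset(data, "utf-7")
print(len(data), first, len(text))
```

If the decoder withholds the whole base64 run, as the session above suggests, the reported offset will equal the full length of the input. A decoder that buffered only a few bytes would report a much smaller offset.
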

History
Date	User	Action	Args
2014-12-18 02:25:22	martin.panter	set	recipients: + martin.panter, lemburg, loewis, doerwalter, ncoghlan, vstinner
2014-12-18 02:25:22	martin.panter	set	messageid: <1418869522.73.0.330533347223.issue20132@psf.upfronthosting.co.za>
2014-12-18 02:25:22	martin.panter	link	issue20132 messages
2014-12-18 02:25:22	martin.panter	create
