Message 234731 - Python tracker



Author	serhiy.storchaka
Recipients	Arfrever, python-dev, serhiy.storchaka, vstinner
Date	2015-01-26 10:26:41
Message-id	<129300265.9jKHJZWiYC@raxxla>
In-reply-to	<CAMpsgwYgAVXE=H=yELMJAjjgnyZ5-7tXaOkcBQLoUWoyoj9_uQ@mail.gmail.com>

Content
I think the changeset that made the decoders use _PyUnicodeWriter (issue16311) is responsible for the regression. For example, consider b'\x80abc'.decode('utf-8', 'backslashreplace'). The writer reserves a string buffer of size 4 (every byte produces at most 1 character). The first byte is invalid and is replaced by the 4-character string '\\x80'. The writer increases min_length but doesn't resize the buffer, because the buffer is still large enough to hold the replacement string. The subsequent writes of the ASCII characters then overflow the buffer.
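
The accounting described above can be sketched with a toy model. This is not CPython's C code and not the actual fix: ToyWriter, write_replacement, write_char, simulate and the resize_to_min_length flag are hypothetical names invented for illustration, and the model only tracks sizes instead of storing characters.

```python
class ToyWriter:
    """Hypothetical size-only model of a preallocating writer.

    It is NOT _PyUnicodeWriter; it only mirrors the bookkeeping
    described in the message (capacity, min_length, write position).
    """

    def __init__(self, capacity):
        self.capacity = capacity    # characters the buffer can hold
        self.min_length = capacity  # expected final length of the output
        self.pos = 0                # characters written so far

    def write_replacement(self, text, consumed, resize_to_min_length):
        # The replacement turns `consumed` input bytes into len(text) chars,
        # so the expected final length grows by the difference.
        self.min_length += len(text) - consumed
        if resize_to_min_length:
            # Assumed safe behaviour: make room for the whole expected output.
            needed = self.min_length
        else:
            # Behaviour described in the message: only check that the
            # replacement string itself fits.
            needed = self.pos + len(text)
        if needed > self.capacity:
            self.capacity = needed
        self.pos += len(text)

    def write_char(self):
        # In C this write would be unchecked; the model raises instead.
        if self.pos >= self.capacity:
            raise OverflowError("write past the end of the buffer")
        self.pos += 1


def simulate(resize_to_min_length):
    data = b'\x80abc'
    writer = ToyWriter(len(data))  # 4: at most one character per byte
    writer.write_replacement('\\x80', 1, resize_to_min_length)
    for _ in data[1:]:             # the ASCII bytes 'a', 'b', 'c'
        writer.write_char()
    return writer.pos
```

In this model, simulate(False) raises OverflowError on the first ASCII character, because the 4-character replacement exactly fills the 4-character buffer. simulate(True) grows the buffer to min_length first and completes.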


History
Date	User	Action	Args
2015-01-26 10:26:42	serhiy.storchaka	set	recipients: + serhiy.storchaka, vstinner, Arfrever, python-dev
2015-01-26 10:26:42	serhiy.storchaka	link	issue23321 messages
2015-01-26 10:26:41	serhiy.storchaka	create
