Message 323844 - Python tracker

Author	izbyshev
Recipients	belopolsky, izbyshev, p-ganssle, serhiy.storchaka
Date	2018-08-21 21:38:29
Message-id	<1534887509.78.0.56676864532.issue34454@psf.upfronthosting.co.za>

Content

Several fromisoformat() functions in Modules/_datetimemodule.c do not check the return value of PyUnicode_AsUTF8AndSize(). When the call fails, they dereference the resulting NULL pointer, e.g.:
>>> from datetime import date
>>> date.fromisoformat('\ud800')
Segmentation fault (core dumped)
This is similar to msg123474. The missing NULL check was reported by the Svace static analyzer.
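A regression check for this crash could be sketched as follows. Because an affected interpreter dies with a segmentation fault, the call runs in a child process so the crash cannot take down the caller. The names CHILD_SNIPPET and probe_fromisoformat are hypothetical and not part of any PR. The expectation that a fixed interpreter rejects the string with a ValueError (or a subclass such as UnicodeEncodeError) is an assumption.

```python
import subprocess
import sys

# Hypothetical snippet executed in a child interpreter. The raw string keeps
# '\ud800' as an ASCII escape sequence, so the child parses the surrogate itself.
CHILD_SNIPPET = r"""
from datetime import date
try:
    date.fromisoformat('\ud800')
except ValueError:
    # UnicodeEncodeError is a subclass of ValueError, so both are caught here.
    print('rejected')
else:
    print('accepted')
"""


def probe_fromisoformat():
    """Run the snippet in a subprocess and return (returncode, stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SNIPPET],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


returncode, output = probe_fromisoformat()
# On POSIX, a negative return code means the child was killed by a signal,
# which is what a segmentation fault would look like.
print(returncode, output)
```

On an interpreter with the NULL check in place, the child is expected to exit with status 0. On an affected build, the return code should instead indicate abnormal termination.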
While preparing tests for this issue, I discovered a deeper problem. The C datetime implementation uses PyUnicode_AsUTF8AndSize() in several places, so some functions reject strings containing surrogate code points (U+D800 to U+DFFF), which cannot be encoded in UTF-8. The pure-Python datetime implementation has no such restriction. For example:
>>> import sys
>>> sys.modules['_datetime'] = None # block C implementation
>>> from datetime import time
>>> time().strftime('\ud800')
'\ud800'
>>> del sys.modules['datetime']
>>> del sys.modules['_datetime']
>>> from datetime import time
>>> time().strftime('\ud800')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
My PR (coming soon) doesn't address this difference but focuses on fixing the immediate problem instead. Suggestions are appreciated.
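The UTF-8 restriction behind this difference can be reproduced from pure Python. This is a minimal sketch, assuming that strict str.encode('utf-8') is a fair stand-in for what PyUnicode_AsUTF8AndSize() does internally. The helper utf8_or_error is hypothetical, and the 'surrogatepass' error handler appears only for contrast, not as a proposed fix.

```python
def utf8_or_error(text):
    """Return the strict UTF-8 encoding of text, or the exception name on failure."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return type(exc).__name__


lone_surrogate = "\ud800"

# Strict UTF-8 refuses lone surrogates, matching the traceback above.
strict_result = utf8_or_error(lone_surrogate)

# The 'surrogatepass' error handler encodes the surrogate anyway.
lenient_result = lone_surrogate.encode("utf-8", errors="surrogatepass")

print(strict_result)
print(lenient_result)
```

Ordinary strings such as '2018-08-21' encode without error, so only inputs containing surrogate code points reach the failing path.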

History
Date	User	Action	Args
2018-08-21 21:38:29	izbyshev	set	recipients: + izbyshev, belopolsky, serhiy.storchaka, p-ganssle
2018-08-21 21:38:29	izbyshev	set	messageid: <1534887509.78.0.56676864532.issue34454@psf.upfronthosting.co.za>
2018-08-21 21:38:29	izbyshev	link	issue34454 messages
2018-08-21 21:38:29	izbyshev	create
