This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2018-08-26 21:06 by izbyshev, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 8948 | merged | izbyshev, 2018-08-26 21:09 |
| PR 11535 | merged | miss-islington, 2019-01-12 17:22 |
| Messages (5) | |||
|---|---|---|---|
| msg324136 | Author: Alexey Izbyshev (izbyshev) * (Python triager) | Date: 2018-08-26 21:06 | |
If a format string contains code points outside the ASCII range, time.strftime() can behave in four different ways depending on the platform, the current locale and the code points:
* raise a UnicodeEncodeError
* return an empty string
* for surrogates in the \uDC80-\uDCFF range, replace them with different code points in the output (potentially mangling nearby parts of the output as well)
* round-trip them correctly
Some examples:
* Linux (glibc 2.27):
Python 3.6.4 (default, Jan 03 2018, 13:52:55) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import time, locale
>>> locale.getlocale()
('en_US', 'UTF-8')
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
'\ud800'
>>> time.strftime('\udcff')
'\udcff'
>>> locale.setlocale(locale.LC_CTYPE, 'C')
'C'
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
'\ud800'
>>> time.strftime('\udcff')
'\udcff'
* macOS 10.13.6 and FreeBSD 11.1:
Python 3.7.0 (default, Jul 23 2018, 20:22:55)
[Clang 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import time, locale
>>> locale.getlocale()
('en_US', 'UTF-8')
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
''
>>> time.strftime('\udcff')
''
>>> locale.setlocale(locale.LC_CTYPE, 'C')
'C'
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
''
>>> time.strftime('\ud800')
''
>>> time.strftime('\udcff')
''
* Windows 8.1:
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] on win32
>>> import time, locale
>>> locale.getlocale()
(None, None)
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\u044f' in position 0: encoding error
>>> time.strftime('\ud800')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\ud800' in position 0: encoding error
>>> time.strftime('\udcff')
'ÿ' # '\xff'
>>> locale.setlocale(locale.LC_CTYPE, '')
'Russian_Russia.1251'
>>> time.strftime('\x80')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\x80' in position 0: encoding error
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\ud800' in position 0: encoding error
>>> time.strftime('\udcff')
'я' # '\u044f'
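The sessions above can be reproduced on another platform with a small probe. This is only a sketch: `classify_strftime` is a hypothetical helper name, not part of the `time` module, and it sorts the result into the four outcomes listed at the top of this message. It assumes the format string contains no `%` directives, so that a correct result is simply the input itself.

```python
import time

def classify_strftime(fmt):
    """Classify how time.strftime() treats a directive-free format string.

    Hypothetical helper for illustration; returns one of
    'error', 'empty', 'mangled' or 'round-trip'.
    """
    try:
        out = time.strftime(fmt)
    except UnicodeEncodeError:
        return "error"
    if out == fmt:
        return "round-trip"
    if out == "":
        return "empty"
    return "mangled"

# The same four samples used in the sessions above.
# ascii() keeps lone surrogates printable on any terminal encoding.
for sample in ("\x80", "\u044f", "\ud800", "\udcff"):
    print(ascii(sample), classify_strftime(sample))
```

The printed classification depends on the platform, the C library and the current LC_CTYPE locale, so no expected output is given here.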
The reasons for these differences are the following:
* Reliance on either wcsftime() or strftime() from the C library depending on the platform.
* For strftime(), the input is encoded into the charset of the current locale with the 'surrogateescape' error handler, and the output is decoded back in the same way.
* glibc and macOS/FreeBSD handle code points that are unrepresentable in the charset of the current locale differently.
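The encode/decode step in the second point can be approximated in pure Python. The sketch below is an emulation under assumptions, not what CPython does internally: it uses Python's own `cp1251` and `latin-1` codecs as stand-ins for the locale charset (the real code path goes through the C library's locale conversion, which is why the tracebacks above mention the 'locale' codec), and `emulate_strftime_path` is a hypothetical name.

```python
def emulate_strftime_path(fmt, charset):
    """Encode and decode fmt the way the strftime() path is described above.

    Hypothetical illustration: Python codecs stand in for the locale charset.
    """
    # Lone surrogates in \udc80-\udcff become the raw bytes 0x80-0xff.
    raw = fmt.encode(charset, errors="surrogateescape")
    # Bytes that are valid in the charset decode to ordinary characters,
    # so an escaped byte may come back as a different code point.
    return raw.decode(charset, errors="surrogateescape")

print(ascii(emulate_strftime_path("\udcff", "cp1251")))   # '\u044f'
print(ascii(emulate_strftime_path("\udcff", "latin-1")))  # '\xff'

try:
    emulate_strftime_path("\x80", "cp1251")
except UnicodeEncodeError as exc:
    # U+0080 has no cp1251 encoding and is not an escapable surrogate.
    print("UnicodeEncodeError:", exc.reason)
```

This matches the Windows session: '\udcff' turns into byte 0xff, which decodes to 'я' under code page 1251 and to 'ÿ' when each byte maps straight to the code point of the same value.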
I suggest at least documenting that the format string, despite being a 'str', requires special care if it contains non-ASCII code points.
The 'datetime' module docs warn about the locale-dependent output, but only with regard to particular format specifiers [1].
I'll submit a draft PR. Suggestions are welcome.
[1] https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-behavior
|
|||
| msg333527 | Author: Tal Einat (taleinat) * (Python committer) | Date: 2019-01-12 17:21 | |
New changeset 1cffd0eed313011c0c2bb071c8affeb4a7ed05c7 by Tal Einat (Alexey Izbyshev) in branch 'master': bpo-34512: Document platform-specific strftime() behavior for non-ASCII format strings (GH-8948) https://github.com/python/cpython/commit/1cffd0eed313011c0c2bb071c8affeb4a7ed05c7 |
|||
| msg333528 | Author: miss-islington (miss-islington) | Date: 2019-01-12 17:27 | |
New changeset 678c5c07521caca809b1356d954975e6234c49ae by Miss Islington (bot) in branch '3.7': bpo-34512: Document platform-specific strftime() behavior for non-ASCII format strings (GH-8948) https://github.com/python/cpython/commit/678c5c07521caca809b1356d954975e6234c49ae |
|||
| msg333529 | Author: miss-islington (miss-islington) | Date: 2019-01-12 17:28 | |
New changeset 77b80c956f39df34722bd8646cf5b83d149832c4 by Miss Islington (bot) in branch '2.7': bpo-34512: Document platform-specific strftime() behavior for non-ASCII format strings (GH-8948) https://github.com/python/cpython/commit/77b80c956f39df34722bd8646cf5b83d149832c4 |
|||
| msg333598 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2019-01-14 10:20 | |
A solution to make time.strftime() more portable would be to split the format string and format each "%xxx" substring separately, without passing the text between the "%xxx" substrings to strftime(). There is a similar discussion about a trailing "%": bpo-35066. |
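A minimal sketch of that idea, written in user code rather than as a patch to CPython. `portable_strftime` and `_DIRECTIVE` are hypothetical names, and the directive pattern is a deliberate simplification: it only recognizes `%` followed by a single ASCII letter or `%`, so platform-specific modifiers such as `%-d` or `%Ey` are not handled, and a trailing `%` is left as literal text.

```python
import re
import time

# Simplified assumption: a directive is '%' plus one ASCII letter, or '%%'.
_DIRECTIVE = re.compile(r"%[A-Za-z%]")

def portable_strftime(fmt, t=None):
    """Format only the '%x' directives; copy all other text verbatim.

    Hypothetical helper: literal text never reaches the C library, so
    non-ASCII code points in it cannot be mangled or rejected.
    """
    if t is None:
        t = time.localtime()
    parts = []
    pos = 0
    for match in _DIRECTIVE.finditer(fmt):
        parts.append(fmt[pos:match.start()])            # literal text, untouched
        parts.append(time.strftime(match.group(), t))   # ASCII-only directive
        pos = match.end()
    parts.append(fmt[pos:])                              # trailing literal text
    return "".join(parts)

epoch = time.gmtime(0)
print(ascii(portable_strftime("\u044f %Y \udcff", epoch)))  # '\u044f 1970 \udcff'
```

The output of locale-dependent directives such as `%B` can still contain non-ASCII text. This sketch only protects the literal parts of the format string.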
|||
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:59:05 | admin | set | github: 78693 |
| 2019-01-14 10:20:29 | vstinner | set | nosy: + vstinner; messages: + msg333598 |
| 2019-01-12 17:48:25 | taleinat | set | status: open -> closed; stage: patch review -> resolved; resolution: fixed; versions: - Python 3.5, Python 3.6 |
| 2019-01-12 17:47:19 | taleinat | set | pull_requests: - pull_request11139 |
| 2019-01-12 17:47:05 | taleinat | set | pull_requests: - pull_request11137 |
| 2019-01-12 17:46:50 | taleinat | set | pull_requests: - pull_request11135 |
| 2019-01-12 17:28:08 | miss-islington | set | messages: + msg333529 |
| 2019-01-12 17:27:32 | miss-islington | set | nosy: + miss-islington; messages: + msg333528 |
| 2019-01-12 17:23:19 | miss-islington | set | pull_requests: + pull_request11139 |
| 2019-01-12 17:23:05 | miss-islington | set | pull_requests: + pull_request11137 |
| 2019-01-12 17:22:52 | miss-islington | set | pull_requests: + pull_request11136 |
| 2019-01-12 17:22:39 | miss-islington | set | pull_requests: + pull_request11135 |
| 2019-01-12 17:22:25 | miss-islington | set | pull_requests: + pull_request11134 |
| 2019-01-12 17:21:57 | taleinat | set | messages: + msg333527 |
| 2018-09-27 06:55:41 | xtreak | set | nosy: + xtreak |
| 2018-09-27 06:47:55 | taleinat | set | versions: + Python 2.7, Python 3.5 |
| 2018-08-26 21:09:52 | izbyshev | set | keywords: + patch; stage: patch review; pull_requests: + pull_request8424 |
| 2018-08-26 21:06:22 | izbyshev | create | |