This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2018年08月23日 17:40 by izbyshev, last changed 2022年04月11日 14:59 by admin.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 8983 | closed | p-ganssle, 2018年08月28日 23:50 | |
| Messages (8) | |||
|---|---|---|---|
| msg323963 - (view) | Author: Alexey Izbyshev (izbyshev) * (Python triager) | Date: 2018年08月23日 17:40 | |
The C datetime implementation uses PyUnicode_AsUTF8AndSize() in wrap_strftime() and rejects strings containing surrogate code points (0xD800 - 0xDFFF) since they can't be encoded in UTF-8. On the other hand, the pure-Python datetime implementation doesn't have this restriction:
>>> import sys
>>> sys.modules['_datetime'] = None # block C implementation
>>> from datetime import time
>>> time().strftime('\ud800')
'\ud800'
>>> del sys.modules['datetime']
>>> del sys.modules['_datetime']
>>> from datetime import time
>>> time().strftime('\ud800')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
|
|||
| msg324329 - (view) | Author: Paul Ganssle (p-ganssle) * (Python committer) | Date: 2018年08月29日 16:04 | |
I'm finding it very difficult to reconcile these things. I'm not entirely sure, but we may need to take a performance hit in normal strftime if we want to make this work with surrogate characters, which really does not appeal to me (though we can certainly improve to some degree). One major question here: Is anyone (@vstinner, @belopolsky?) know why time's strftime opportunistically uses wcsftime instead of strftime? It makes the code *way* more complicated and difficult to read / maintain - are there platforms that provide wcstrftime and not strftime? Also, related, it seems according to https://bugs.python.org/issue10653#msg243660 that there may have been a regression in issue #10653, which may be related here. Either way, some note should probably be made in the code to clarify exactly *why* these choices were made in the code, in case the situation has changed. |
|||
| msg324336 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2018年08月29日 17:15 | |
Because caring of surrogates, I would prefer to first fix time.strftime() and time.strptime() for legit time zone names. Currently, Python uses the C function strfime() because the string produced by wcsftime() cannot be parsed later. We should use the native Windows API to get the time zone name properly encoded. I don't recall the details, sorry. There is maybe an open issue for that one. |
|||
| msg324337 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2018年08月29日 17:17 | |
> Because caring of surrogates, Before* |
|||
| msg324356 - (view) | Author: Alexey Izbyshev (izbyshev) * (Python triager) | Date: 2018年08月29日 23:05 | |
@p-ganssle > I'm finding it very difficult to reconcile these things. Paul, this issue was only about reconciling C and Python implementations of datetime module -- not fixing time.strftime(), which both of them delegate to. While the latter is certainly more important, it's also much more difficult, and IMO needs another issue report (maybe merged with #34512). As for the former, it seems that you've already fixed it in your PR. |
|||
| msg324397 - (view) | Author: Paul Ganssle (p-ganssle) * (Python committer) | Date: 2018年08月30日 18:08 | |
@izbyshev That's totally fair and I wouldn't want to make it a condition of merging the existing fixes - I've already made great progress in fixing the time.strftime part as well. The main reason it relates here is that I generally find the tests to be among the hardest part about writing a good PR, and if we can't make assertions about the behavior of strftime outputs, I think it makes it hard to prevent regressions. I figured if I can solve the problem all the way down the stack in one go, I might as well. That said, Victor makes an *extremely* good point that this is an outsized effort for the bug it's fixing. No one really *needs* support for unpaired surrogates in their strftime as far as I can tell. The main reason I'm still working on it is that I'm curious to see if it's even possible to fix. |
|||
| msg324401 - (view) | Author: Alexey Izbyshev (izbyshev) * (Python triager) | Date: 2018年08月30日 23:10 | |
> if we can't make assertions about the behavior of strftime outputs, I think it makes it hard to prevent regressions. Ironically, some of the changes we may have to make to fix time.strftime() inconsistencies may appear like regressions to some users. For example, regarding dropping wcsftime() on all platforms, 'man strftime' on macOS contains the following warning in BUGS section: "The strftime() function does not correctly handle multibyte characters in the format argument.". (I'm not sure what "incorrect handling" means here). That's not to discourage you, just to point out that we should be extra careful at this point when there might be users relying on every variety of the current behavior. And that's also why I don't want to mix the fix for C vs. Python issue (low risk) with changes in time.strftime() (high risk). Also, while I certainly agree with "the outsized effort" point, it seems that your PR goes far beyond supporting surrogates because falling back to escaping allows one to bypass checks in wcsftime()/strftime() and round-trip any code point regardless of the current locale. |
|||
| msg324630 - (view) | Author: Paul Ganssle (p-ganssle) * (Python committer) | Date: 2018年09月05日 12:29 | |
I've left a mostly finished PR on #8983, but I've decided it's not really worth continuing to work on. Anyone can feel free to take it and run with it if they want to implement the fix for this. As Alexey mentions, that PR indeed already fixes this bug, I mainly blocked on getting consistent behavior across all platforms. Currently that PR has consistent behavior across all platforms if you disable wcsftime *except* Ubuntu 14.0, which seems to have some kind of funky bug in wcstombs, though for the life of me I don't know how to consistently trigger it. I suspect that using strftime instead of wcsftime on all platforms would lead to simpler code that works most consistently, but I am guessing wcsftime is faster. Another option is to use wsftime but fall back to strftime on MacOS in the event that the error condition (blank output from non-blank input) is detected. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:59:05 | admin | set | github: 78662 |
| 2018年09月05日 15:38:31 | rhettinger | set | nosy:
+ brett.cannon |
| 2018年09月05日 12:29:35 | p-ganssle | set | messages: + msg324630 |
| 2018年08月30日 23:10:49 | izbyshev | set | messages: + msg324401 |
| 2018年08月30日 18:08:40 | p-ganssle | set | messages: + msg324397 |
| 2018年08月29日 23:05:12 | izbyshev | set | messages: + msg324356 |
| 2018年08月29日 17:17:19 | vstinner | set | messages: + msg324337 |
| 2018年08月29日 17:15:17 | vstinner | set | messages: + msg324336 |
| 2018年08月29日 16:04:00 | p-ganssle | set | messages: + msg324329 |
| 2018年08月29日 00:01:51 | vstinner | set | nosy:
+ vstinner |
| 2018年08月28日 23:50:34 | p-ganssle | set | keywords:
+ patch stage: patch review pull_requests: + pull_request8457 |
| 2018年08月23日 18:36:03 | p-ganssle | set | nosy:
+ p-ganssle |
| 2018年08月23日 17:40:57 | izbyshev | create | |