Created on 2017-02-15 17:29 by jaysinh.shukla, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded by | Date |
| test_re_locale_flag.patch | serhiy.storchaka | 2017-02-15 22:57 |
| loc.py | vstinner | 2019-03-05 12:34 |
| loc.log | vstinner | 2019-03-05 12:34 |
| _testcapi.patch | vstinner | 2019-03-05 12:34 |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 149 | merged | ncoghlan, 2017-02-18 09:19 |
| PR 153 | merged | ncoghlan, 2017-02-18 10:44 |
| PR 154 | merged | ncoghlan, 2017-02-18 10:45 |
| PR 422 | | benjamin.peterson, 2017-03-03 07:51 |
| PR 554 | merged | benjamin.peterson, 2017-03-08 06:07 |
| PR 555 | merged | benjamin.peterson, 2017-03-08 06:51 |
| PR 556 | merged | benjamin.peterson, 2017-03-08 06:51 |
| PR 2686 | closed | Naman-Bhalla, 2017-07-12 20:07 |
| PR 12099 | merged | vstinner, 2019-02-28 17:34 |
| PR 12108 | closed | miss-islington, 2019-02-28 23:08 |
| PR 12178 | merged | vstinner, 2019-03-05 12:44 |
| Messages (37) | |||
|---|---|---|---|
| msg287867 - (view) | Author: Jaysinh shukla (jaysinh.shukla) * | Date: 2017-02-15 17:29 | |
Description: A test case is failing while running `./python -m test -v test_re`.
Traceback:
$>./python -m test -v test_re
== CPython 3.7.0a0 (default, Feb 15 2017, 22:28:32) [GCC 5.4.0 20160609]
== Linux-4.4.0-62-generic-x86_64-with-debian-stretch-sid little-endian
== hash algorithm: siphash24 64bit
== cwd: /home/bigj/Jaysinh/cpython_git/cpython/build/test_python_613
== encodings: locale=UTF-8, FS=utf-8
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
Run tests sequentially
0:00:00 [1/1] test_re
test_re_benchmarks (test.test_re.ExternalTests) re_tests benchmarks ... ok
test_re_tests (test.test_re.ExternalTests) re_tests test suite ... ok
test_overlap_table (test.test_re.ImplementationTest) ... ok
test_bytes (test.test_re.PatternReprTests) ... ok
test_inline_flags (test.test_re.PatternReprTests) ... ok
test_locale (test.test_re.PatternReprTests) ... ok
test_long_pattern (test.test_re.PatternReprTests) ... ok
test_multiple_flags (test.test_re.PatternReprTests) ... ok
test_quotes (test.test_re.PatternReprTests) ... ok
test_single_flag (test.test_re.PatternReprTests) ... ok
test_unicode_flag (test.test_re.PatternReprTests) ... ok
test_unknown_flags (test.test_re.PatternReprTests) ... ok
test_without_flags (test.test_re.PatternReprTests) ... ok
test_anyall (test.test_re.ReTests) ... ok
test_ascii_and_unicode_flag (test.test_re.ReTests) ... ok
test_backref_group_name_in_exception (test.test_re.ReTests) ... ok
test_basic_re_sub (test.test_re.ReTests) ... ok
test_big_codesize (test.test_re.ReTests) ... ok
test_bigcharset (test.test_re.ReTests) ... ok
test_bug_113254 (test.test_re.ReTests) ... ok
test_bug_114660 (test.test_re.ReTests) ... ok
test_bug_117612 (test.test_re.ReTests) ... ok
test_bug_1661 (test.test_re.ReTests) ... ok
test_bug_16688 (test.test_re.ReTests) ... ok
test_bug_20998 (test.test_re.ReTests) ... ok
test_bug_2537 (test.test_re.ReTests) ... ok
test_bug_29444 (test.test_re.ReTests) ... ok
test_bug_3629 (test.test_re.ReTests) ... ok
test_bug_418626 (test.test_re.ReTests) ... ok
test_bug_448951 (test.test_re.ReTests) ... ok
test_bug_449000 (test.test_re.ReTests) ... ok
test_bug_449964 (test.test_re.ReTests) ... ok
test_bug_462270 (test.test_re.ReTests) ... ok
test_bug_527371 (test.test_re.ReTests) ... ok
test_bug_581080 (test.test_re.ReTests) ... ok
test_bug_612074 (test.test_re.ReTests) ... ok
test_bug_6509 (test.test_re.ReTests) ... ok
test_bug_6561 (test.test_re.ReTests) ... ok
test_bug_725106 (test.test_re.ReTests) ... ok
test_bug_725149 (test.test_re.ReTests) ... ok
test_bug_764548 (test.test_re.ReTests) ... ok
test_bug_817234 (test.test_re.ReTests) ... ok
test_bug_926075 (test.test_re.ReTests) ... ok
test_bug_931848 (test.test_re.ReTests) ... ok
test_bytes_str_mixing (test.test_re.ReTests) ... ok
test_category (test.test_re.ReTests) ... ok
test_character_set_errors (test.test_re.ReTests) ... ok
test_compile (test.test_re.ReTests) ... ok
test_constants (test.test_re.ReTests) ... ok
test_dealloc (test.test_re.ReTests) ... ok
test_debug_flag (test.test_re.ReTests) ... ok
test_dollar_matches_twice (test.test_re.ReTests) $ matches the end of string, and just before the terminating ... ok
test_empty_array (test.test_re.ReTests) ... ok
test_enum (test.test_re.ReTests) ... ok
test_error (test.test_re.ReTests) ... ok
test_expand (test.test_re.ReTests) ... ok
test_finditer (test.test_re.ReTests) ... ok
test_flags (test.test_re.ReTests) ... ok
test_getattr (test.test_re.ReTests) ... ok
test_getlower (test.test_re.ReTests) ... ok
test_group (test.test_re.ReTests) ... ok
test_group_name_in_exception (test.test_re.ReTests) ... ok
test_groupdict (test.test_re.ReTests) ... ok
test_ignore_case (test.test_re.ReTests) ... ok
test_ignore_case_range (test.test_re.ReTests) ... ok
test_ignore_case_set (test.test_re.ReTests) ... ok
test_inline_flags (test.test_re.ReTests) ... ok
test_issue17998 (test.test_re.ReTests) ... ok
test_keep_buffer (test.test_re.ReTests) ... ok
test_keyword_parameters (test.test_re.ReTests) ... ok
test_large_search (test.test_re.ReTests) ... ok
test_large_subn (test.test_re.ReTests) ... ok
test_locale_caching (test.test_re.ReTests) ... skipped 'test needs en_US.iso88591 locale'
test_locale_flag (test.test_re.ReTests) ... FAIL
test_lookahead (test.test_re.ReTests) ... ok
test_lookbehind (test.test_re.ReTests) ... ok
test_match_getitem (test.test_re.ReTests) ... ok
test_match_repr (test.test_re.ReTests) ... ok
test_misc_errors (test.test_re.ReTests) ... ok
test_multiple_repeat (test.test_re.ReTests) ... ok
test_not_literal (test.test_re.ReTests) ... ok
test_nothing_to_repeat (test.test_re.ReTests) ... ok
test_other_escapes (test.test_re.ReTests) ... ok
test_pattern_compare (test.test_re.ReTests) ... ok
test_pattern_compare_bytes (test.test_re.ReTests) ... ok
test_pickling (test.test_re.ReTests) ... ok
test_qualified_re_split (test.test_re.ReTests) ... ok
test_qualified_re_sub (test.test_re.ReTests) ... ok
test_re_escape (test.test_re.ReTests) ... ok
test_re_escape_byte (test.test_re.ReTests) ... ok
test_re_escape_non_ascii (test.test_re.ReTests) ... ok
test_re_escape_non_ascii_bytes (test.test_re.ReTests) ... ok
test_re_findall (test.test_re.ReTests) ... ok
test_re_fullmatch (test.test_re.ReTests) ... ok
test_re_groupref (test.test_re.ReTests) ... ok
test_re_groupref_exists (test.test_re.ReTests) ... ok
test_re_groupref_overflow (test.test_re.ReTests) ... ok
test_re_match (test.test_re.ReTests) ... ok
test_re_split (test.test_re.ReTests) ... ok
test_re_subn (test.test_re.ReTests) ... ok
test_repeat_minmax (test.test_re.ReTests) ... ok
test_repeat_minmax_overflow (test.test_re.ReTests) ... ok
test_repeat_minmax_overflow_maxrepeat (test.test_re.ReTests) ... ok
test_scanner (test.test_re.ReTests) ... ok
test_scoped_flags (test.test_re.ReTests) ... ok
test_search_coverage (test.test_re.ReTests) ... ok
test_search_dot_unicode (test.test_re.ReTests) ... ok
test_search_star_plus (test.test_re.ReTests) ... ok
test_special_escapes (test.test_re.ReTests) ... ok
test_sre_byte_class_literals (test.test_re.ReTests) ... ok
test_sre_byte_literals (test.test_re.ReTests) ... ok
test_sre_character_class_literals (test.test_re.ReTests) ... ok
test_sre_character_literals (test.test_re.ReTests) ... ok
test_stack_overflow (test.test_re.ReTests) ... ok
test_string_boundaries (test.test_re.ReTests) ... ok
test_sub_template_numeric_escape (test.test_re.ReTests) ... ok
test_symbolic_groups (test.test_re.ReTests) ... ok
test_symbolic_refs (test.test_re.ReTests) ... ok
test_unlimited_zero_width_repeat (test.test_re.ReTests) ... ok
test_weakref (test.test_re.ReTests) ... ok
======================================================================
FAIL: test_locale_flag (test.test_re.ReTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bigj/Jaysinh/cpython_git/cpython/Lib/test/test_re.py", line 1422, in test_locale_flag
    self.assertTrue(pat.match(bletter))
AssertionError: None is not true
----------------------------------------------------------------------
Ran 120 tests in 2.079s
FAILED (failures=1, skipped=1)
test test_re failed
test_re failed
1 test failed: test_re
Total duration: 2 sec
Tests result: FAILURE
Locale value:
$>locale
LANG=en_IN
LANGUAGE=en_IN:en
LC_CTYPE="en_IN"
LC_NUMERIC="en_IN"
LC_TIME="en_IN"
LC_COLLATE="en_IN"
LC_MONETARY="en_IN"
LC_MESSAGES="en_IN"
LC_PAPER="en_IN"
LC_NAME="en_IN"
LC_ADDRESS="en_IN"
LC_TELEPHONE="en_IN"
LC_MEASUREMENT="en_IN"
LC_IDENTIFICATION="en_IN"
LC_ALL=
Operating system: Ubuntu 16.04 LTS (64 bit) |
|||
| msg287879 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2017-02-15 18:56 | |
I'm just wondering whether the problem is just due to the locale's encoding being UTF-8. The locale support in re really only works with encodings that use 1 byte/character. |
|||
| msg287880 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-02-15 19:03 | |
Locale encoding is ISO8859-1. This test is skipped on non 8-bit locale. This is a problem with tests, not with the re module. I don't have a solution. |
|||
| msg287882 - (view) | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2017-02-15 19:25 | |
The report says "== encodings: locale=UTF-8, FS=utf-8". It says that "test_locale_caching" was skipped, but also that "test_locale_flag" failed. |
|||
| msg287893 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-02-15 22:57 | |
Good point. The test used locale.getlocale(), which returned ('en_IN', 'ISO8859-1').
The following patch makes the test use locale.getpreferredencoding(False), the same encoding that was reported in the header of the test report.
|
|||
| msg287894 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017-02-15 22:58 | |
> Following patch ... Seriously? Not a GitHub pull request? ;-) (old habit?) |
|||
| msg287933 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-02-16 12:05 | |
> Seriously? Not a GitHub pull request? ;-) (old habit?)
I'm not experienced with git, and the devguide still doesn't look ready. |
|||
| msg288056 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2017-02-18 05:04 | |
I have a few folks hitting this at the PyCon Pune sprints, so I'm going to apply Serhiy's patch :) |
|||
| msg288065 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2017-02-18 08:54 | |
Looking into this at the PyCon Pune sprints, the problem appears to arise from the following difference in behaviour when the unqualified `en_IN` locale is set:
$ LANG=en_IN.UTF-8 python3 -c "import locale; print(locale.getlocale(locale.LC_CTYPE), locale.getpreferredencoding(False), sep='\n')"
('en_IN', 'UTF-8')
UTF-8
$ LANG=en_IN python3 -c "import locale; print(locale.getlocale(locale.LC_CTYPE), locale.getpreferredencoding(False), sep='\n')"
('en_IN', 'ISO8859-1')
UTF-8
re.LOCALE is presumably picking up the "UTF-8" rather than the "ISO8859-1", and hence the test is failing.
|
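The comparison above can also be collected from inside one Python process. The following is only a minimal sketch: the helper name `ctype_encoding_report` is hypothetical, and the values it returns depend entirely on the platform and the active locale, so no particular output is implied.

```python
import locale


def ctype_encoding_report():
    """Collect the different answers Python can give for the LC_CTYPE encoding.

    Hypothetical helper for illustration; not part of CPython or its tests.
    """
    report = {}
    # getlocale() derives the encoding from the locale *name*,
    # using Python's own alias table.
    try:
        report["getlocale"] = locale.getlocale(locale.LC_CTYPE)
    except ValueError:
        # Raised when Python cannot parse the current locale name.
        report["getlocale"] = None
    # getpreferredencoding(False) does not call setlocale() itself.
    report["preferred"] = locale.getpreferredencoding(False)
    # nl_langinfo() is not available on every platform (e.g. Windows).
    if hasattr(locale, "nl_langinfo"):
        report["codeset"] = locale.nl_langinfo(locale.CODESET)
    # What the alias table makes of the bare locale name from this report.
    report["normalize(en_IN)"] = locale.normalize("en_IN")
    return report


if __name__ == "__main__":
    for key, value in ctype_encoding_report().items():
        print(f"{key}: {value!r}")
```

Running it once with `LANG=en_IN` and once with `LANG=en_IN.UTF-8` should show whether the name-based and library-based answers disagree on a given system.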
|||
| msg288068 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-02-18 09:25 | |
Yes, please push it Nick. |
|||
| msg288069 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2017-02-18 09:31 | |
New changeset ace5c0fdd9b962e6e886c29dbcea72c53f051dc4 by GitHub in branch 'master': bpo-29571: Use correct locale encoding in test_re (#149) https://github.com/python/cpython/commit/ace5c0fdd9b962e6e886c29dbcea72c53f051dc4 |
|||
| msg288101 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2017-02-19 04:33 | |
New changeset 0683d6889bd4430599d22e12e201b8e9c45be5a2 by GitHub in branch '3.6': [3.6] bpo-29571: Use correct locale encoding in test_re (#149) (#153) https://github.com/python/cpython/commit/0683d6889bd4430599d22e12e201b8e9c45be5a2 |
|||
| msg288102 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2017-02-19 04:33 | |
New changeset 760f596b6a4b5514afe35e521621f484aef35413 by GitHub in branch '3.5': [3.5] bpo-29571: Use correct locale encoding in test_re (#149) (#154) https://github.com/python/cpython/commit/760f596b6a4b5514afe35e521621f484aef35413 |
|||
| msg289054 - (view) | Author: Zachary Ware (zach.ware) * (Python committer) | Date: 2017-03-06 01:34 | |
This seems to have broken test_re on Windows, see https://ci.appveyor.com/project/python/cpython/build/3.7.0a0.1 I found this change to be the culprit via git bisect, unfortunately we didn't have any working CI on Windows (buildbots were otherwise broken) at the time this was merged. |
|||
| msg289071 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2017-03-06 07:52 | |
Yep, I think we should merge https://github.com/python/cpython/pull/422 and revert ncoghlan's change. |
|||
| msg289072 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-03-06 07:54 | |
I'm not sure this will help on Windows. |
|||
| msg289073 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-03-06 07:55 | |
And I don't understand why my fix doesn't work on Windows. |
|||
| msg289074 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2017-03-06 07:55 | |
But the test was never broken on Windows.
On Sun, Mar 5, 2017, at 23:54, Serhiy Storchaka wrote:
> I'm not sure this will help on Windows. |
|||
| msg289075 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2017-03-06 07:55 | |
getpreferredencoding() takes a completely different path on windows (returns a codepage) and isn't related to the C locale. |
|||
| msg289076 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2017-03-06 07:58 | |
I'm with Serhiy on this one: if the "re" module isn't using locale.getpreferredencoding(), then there's something odd going on. It just sounds like the disconnect on Windows is the opposite of the one we hit on Linux without Benjamin's patch, perhaps due to the UTF-8 mode changes - it wouldn't surprise me to learn that the re module is still using mbcs there instead of utf-8. |
|||
| msg289077 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2017-03-06 08:01 | |
I don't see what's odd about it. re.LOCALE uses the C locale, which one obtains from locale.getlocale(). getpreferredencoding() is not documented to have anything to do with the C locale, and indeed on Windows it may be completely different. |
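To make the point about re.LOCALE concrete, here is a small sketch of how the flag is used from Python. The helper names `locale_icase_match` and `str_pattern_rejects_locale` are hypothetical; the only behaviour relied on is that re.LOCALE is accepted for bytes patterns and rejected for str patterns (Python 3.6 and later). Whether a non-ASCII byte pair such as `b'\xe0'` / `b'\xc0'` matches depends on the C library's current LC_CTYPE locale, so no result is claimed for it here.

```python
import re


def locale_icase_match(pattern_byte: bytes, candidate: bytes) -> bool:
    """Case-insensitive, locale-aware match of one byte against another.

    The case folding is delegated to the C library's current LC_CTYPE
    locale, not to any encoding name that Python reports.
    """
    regex = re.compile(re.escape(pattern_byte), re.IGNORECASE | re.LOCALE)
    return regex.match(candidate) is not None


def str_pattern_rejects_locale() -> bool:
    """Return True if re.LOCALE is refused for a str pattern."""
    try:
        re.compile("a", re.LOCALE)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    # ASCII letters fold the same way in the usual locales.
    print(locale_icase_match(b"a", b"A"))
    # Locale dependent: result varies with LC_CTYPE.
    print(locale_icase_match(b"\xe0", b"\xc0"))
    print(str_pattern_rejects_locale())
```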
|||
| msg289118 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2017-03-06 15:52 | |
Thanks for the explanation - given that, I agree that simply reverting the attempted test-based fix and instead relying on the issue 20087 updates is the way to go. |
|||
| msg290268 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2017-03-24 22:42 | |
New changeset 6a4b04cd337347d074ae0140fb13dca5bd4b11ef by Benjamin Peterson in branch '3.6': Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554) (#555) https://github.com/python/cpython/commit/6a4b04cd337347d074ae0140fb13dca5bd4b11ef |
|||
| msg290269 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2017-03-24 22:42 | |
New changeset 312f7dfb7c669fcfc43020951b7f8ff521200ad7 by Benjamin Peterson in branch '3.5': Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554) (#556) https://github.com/python/cpython/commit/312f7dfb7c669fcfc43020951b7f8ff521200ad7 |
|||
| msg290272 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2017-03-24 22:42 | |
New changeset 21a74312f2d1ddee71fade709af49d078085ec30 by Benjamin Peterson in branch 'master': Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554) https://github.com/python/cpython/commit/21a74312f2d1ddee71fade709af49d078085ec30 |
|||
| msg310947 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2018-01-28 14:01 | |
Hmm, even though we reverted the original test_re based change, and the initial attempted fix for bpo-20087 was also reverted, I'm still not currently seeing the failure for:
LANG=en_IN.utf8 ./python -m test -v test_re
I do have the locale installed, so it's not a result of falling back to the C locale and that getting coerced to C.UTF-8:
$ LANG=en_IN.utf8 locale -k currency_symbol
currency_symbol="₹"
Jaysinh, are you still seeing this test failure on a fresh checkout? |
|||
| msg310948 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2018-01-28 14:15 | |
Hmm, this actually works for me on Fedora 27 even if I go back to 1b3d88eb33085e90af729c4c2f78b5ba1b942b1e, the commit just before the initially merged (and subsequently reverted) test change above. Unassigning, since I can't readily reproduce it myself. |
|||
| msg311074 - (view) | Author: Jaysinh shukla (jaysinh.shukla) * | Date: 2018-01-29 07:00 | |
Hello Nick, At the devsprints of PyCon India 2017, a few participants were facing this bug. They were all Ubuntu users. I have since switched to Gentoo and am not facing this bug, but let me confirm with an Ubuntu user. Thanks |
|||
| msg311076 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2018-01-29 07:21 | |
I've also added Matthias and Barry to the cc list, in case this does turn out to be a Debian or Ubuntu specific quirk.
Restating the problem, the issue is that test_locale_flag in test_re may fail for at least the en_IN locale, and we're not sure yet whether that's a test bug, a locale module bug, or a distro bug:
LANG=en_IN ./python -m test -v test_re
We've only confirmed it on Ubuntu so far though - I haven't been able to reproduce it on Fedora, and Jaysinh hasn't been able to reproduce it since switching to Gentoo. |
|||
| msg336826 - (view) | Author: Karthikeyan Singaravelan (xtreak) * (Python committer) | Date: 2019-02-28 11:23 | |
Similar issue reported on debian9.8 stretch with python 3.7.2 and en_IN : issue36134 |
|||
| msg336855 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2019-02-28 17:34 | |
Ah, I can reproduce the bug on Fedora 29 using "LANG=en_IN ./python -m test -v test_re".
The problem is that locale.getlocale() is not reliable: it pretends that the locale encoding is ISO8859-1, whereas the real encoding is UTF-8:
$ LANG=en_IN ./python
Python 3.8.0a2+ (heads/master:4cbea518a0, Feb 28 2019, 18:19:44)
>>> chr(224).encode('ISO8859-1')
b'\xe0'
>>> import _testcapi
>>> _testcapi.DecodeLocaleEx(b'\xe0', 0, 'strict')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: decode error: pos=0, reason=decoding error
>>> import locale
# Wrong encoding
>>> locale.getlocale(locale.LC_CTYPE)
('en_IN', 'ISO8859-1')
>>> locale.setlocale(locale.LC_CTYPE, None)
'en_IN'
>>> locale._parse_localename('en_IN')
('en_IN', 'ISO8859-1')
# Real encoding
>>> locale.getpreferredencoding()
'UTF-8'
>>> locale.nl_langinfo(locale.CODESET)
'UTF-8'
The attached PR 12099 fixes the issue.
|
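One way a test can avoid trusting a wrong encoding name is to search for a usable non-ASCII letter and fall back when none exists. The sketch below illustrates that idea only; it is not claimed to be the code in PR 12099, and the function name `find_cased_byte_pair` is hypothetical.

```python
def find_cased_byte_pair(encoding):
    """Find a single-byte non-ASCII letter with a single-byte lowercase form.

    Returns (original_byte, lowercase_byte), or None when the encoding has
    no such pair (for example a multibyte encoding such as UTF-8).
    """
    for i in range(128, 256):
        original = bytes([i])
        try:
            char = original.decode(encoding)
            lowered = char.lower()
            if lowered == char:
                continue  # not an uppercase letter with a distinct lowercase
            lowered_byte = lowered.encode(encoding)
        except (UnicodeError, LookupError, TypeError):
            # Undecodable byte, unencodable lowercase form, unknown codec,
            # or an encoding value that is not a string.
            continue
        if len(lowered_byte) == 1 and lowered_byte.decode(encoding) == lowered:
            return original, lowered_byte
    return None


if __name__ == "__main__":
    print(find_cased_byte_pair("latin-1"))  # (b'\xc0', b'\xe0')
    print(find_cased_byte_pair("utf-8"))    # None
```

A caller would pass the encoding it believes the C locale uses and skip the non-ASCII assertions when the result is None.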
|||
| msg336877 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2019-02-28 23:05 | |
> This seems to have broken test_re on Windows, see https://ci.appveyor.com/project/python/cpython/build/3.7.0a0.1
It seems like the ANSI code page is 1252 ("cp1252").
== CPython 3.7.0a0 (master:d31b28e16a2387d0251df948ef5d1b33d4357652, Mar 5 2017, 21:47:06) [MSC v.1900 32 bit (Intel)]
== Windows-2012ServerR2-6.3.9600-SP0 little-endian
== hash algorithm: siphash24 32bit
== cwd: C:\projects\cpython\build\test_python_1844
== encodings: locale=cp1252, FS=utf-8
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=1, verbose=0, bytes_warning=2, quiet=0, hash_randomization=1, isolated=0)
Using random seed 5949816
...
FAIL: test_locale_flag (test.test_re.ReTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\projects\cpython\lib\test\test_re.py", line 1422, in test_locale_flag
    self.assertTrue(pat.match(bletter))
AssertionError: None is not true
> getpreferredencoding() takes a completely different path on windows
> (returns a codepage) and isn't related to the C locale.
On my Windows 10 with Python 3.8, getpreferredencoding() (and getpreferredencoding(False)) returns "cp1252", getlocale(LC_CTYPE)[1] returns "1252". Python has an alias "1252" for "cp1252".
On Windows, getpreferredencoding() is implemented as _locale._getdefaultlocale()[1]. _getdefaultlocale()[1] is implemented with:
PyOS_snprintf(encoding, sizeof(encoding), "cp%d", GetACP());
At the end, it's the ANSI code page (1252).
--
I don't understand how the change ace5c0fdd9b962e6e886c29dbcea72c53f051dc4 introduced a regression. And so I don't understand how commit 21a74312f2d1ddee71fade709af49d078085ec30 (revert) could fix anything.
--
On my PR 12099, two Windows CI runs both succeeded:
* AppVeyor: pythoninfo says "locale.encoding: cp1252" https://ci.appveyor.com/project/python/cpython/builds/22726025
* Windows PR Tests on Azure Pipeline: pythoninfo also says "locale.encoding: cp1252"
When the change ace5c0fdd9b962e6e886c29dbcea72c53f051dc4 was merged, Python had no working Windows CI. Things have evolved a lot in the meantime. I also manually tested my PR 12099 on my Windows 10 VM, which also uses cp1252: test_re passes.
--
The re.LOCALE flag of re.compile() for a bytes pattern uses the following function of Modules/_sre.c:
LOCAL(int)
char_loc_ignore(SRE_CODE pattern, SRE_CODE ch)
{
    return ch == pattern
        || (SRE_CODE) sre_lower_locale(ch) == pattern
        || (SRE_CODE) sre_upper_locale(ch) == pattern;
} |
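The comparison performed by char_loc_ignore() can be modelled in a few lines of Python. This is only an illustrative model: `ascii_lower` and `ascii_upper` are hypothetical stand-ins for sre_lower_locale() and sre_upper_locale(), and they assume a locale whose C library folds ASCII letters only, which is not true of every locale.

```python
def ascii_lower(ch: int) -> int:
    """Stand-in for sre_lower_locale(): folds only ASCII 'A'-'Z' (assumption)."""
    return ch + 32 if 0x41 <= ch <= 0x5A else ch


def ascii_upper(ch: int) -> int:
    """Stand-in for sre_upper_locale(): folds only ASCII 'a'-'z' (assumption)."""
    return ch - 32 if 0x61 <= ch <= 0x7A else ch


def char_loc_ignore(pattern: int, ch: int,
                    lower=ascii_lower, upper=ascii_upper) -> bool:
    """Python model of the C comparison quoted above.

    The input byte matches if it equals the pattern byte directly, or after
    being lowercased or uppercased by the locale's case functions.
    """
    return ch == pattern or lower(ch) == pattern or upper(ch) == pattern


if __name__ == "__main__":
    print(char_loc_ignore(ord("a"), ord("A")))  # True: 'A' lowercases to 'a'
    print(char_loc_ignore(0xE0, 0xC0))          # False with ASCII-only folding
```

With ASCII-only folding, the byte 0xC0 never compares equal to 0xE0, which is the shape of the assertion failure reported in this issue when the test assumes a Latin-1 style locale.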
|||
| msg336878 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2019-02-28 23:08 | |
New changeset ab71f8b793f7b42853ccd2a127ae7720adc5bcb4 by Victor Stinner in branch 'master': bpo-29571: Fix test_re.test_locale_flag() (GH-12099) https://github.com/python/cpython/commit/ab71f8b793f7b42853ccd2a127ae7720adc5bcb4 |
|||
| msg336883 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2019-03-01 01:13 | |
AppVeyor failed on the backport to Python 3.7 of my fix: PR 12108.
Ok, now I understand the bug in Python 3.7. locale.getlocale(locale.LC_CTYPE)[1] returns None because Python doesn't set LC_CTYPE to the user preferred locale. I'm not sure which locale is used in practice in that case, but at least I can say that None is not the expected encoding name... str.encode() and bytes.decode() use UTF-8 when None is passed as the encoding. locale.getpreferredencoding() returns 'cp1252' which is the ANSI code page.
Python 3.8 is different. In bpo-34485, I modified Python 3.8 to set LC_CTYPE locale to the user preference (ANSI code page):
---
commit 177d921c8c03d30daa32994362023f777624b10d
Author: Victor Stinner <vstinner@redhat.com>
Date: Wed Aug 29 11:25:15 2018 +0200
bpo-34485, Windows: LC_CTYPE set to user preference (GH-8988)
On Windows, the LC_CTYPE is now set to the user preferred locale at startup: _Py_SetLocaleFromEnv(LC_CTYPE) is now called during the Python initialization. Previously, the LC_CTYPE locale was "C" at startup, but changed when calling setlocale(LC_CTYPE, "") or setlocale(LC_ALL, "").
pymain_read_conf() now also calls _Py_SetLocaleFromEnv(LC_CTYPE) to behave as _Py_InitializeCore(). Moreover, it doesn't save/restore the LC_ALL anymore.
On Windows, standard streams like sys.stdout now always use surrogateescape error handler by default (ignore the locale).
--- |
|||
| msg337185 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2019-03-05 12:34 | |
I wrote C and Python code to check what is the effective encoding used by the LC_CTYPE locale before setlocale(LC_CTYPE, "") is called on Python 3.7. Result: Windows uses the Latin1 encoding. See attached files: _testcapi.patch + loc.py produced loc.log (output). |
|||
| msg337200 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2019-03-05 15:17 | |
New changeset 279657bac2856039ba422c18a3d7f227b455e9d6 by Victor Stinner in branch '3.7': [3.7] bpo-29571: Fix test_re.test_locale_flag() (GH-12178) https://github.com/python/cpython/commit/279657bac2856039ba422c18a3d7f227b455e9d6 |
|||
| msg337204 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2019-03-05 15:26 | |
I don't understand the relationship with bpo-20087, so I removed the dependency. I fixed test_re in 3.7 and master branches. I close the issue. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:43 | admin | set | github: 73757 |
| 2019-10-24 10:46:11 | vstinner | link | issue32080 superseder |
| 2019-03-05 15:26:32 | vstinner | set | status: open -> closed versions: + Python 3.8, - Python 3.5, Python 3.6 messages: + msg337204 dependencies: - Mismatch between glibc and X11 locale.alias resolution: fixed stage: patch review -> resolved |
| 2019-03-05 15:17:45 | vstinner | set | messages: + msg337200 |
| 2019-03-05 12:44:52 | vstinner | set | pull_requests: + pull_request12173 |
| 2019-03-05 12:34:33 | vstinner | set | files: + _testcapi.patch |
| 2019-03-05 12:34:26 | vstinner | set | files: + loc.log |
| 2019-03-05 12:34:20 | vstinner | set | files: + loc.py messages: + msg337185 |
| 2019-03-01 01:13:06 | vstinner | set | messages: + msg336883 |
| 2019-02-28 23:08:30 | miss-islington | set | pull_requests: + pull_request12115 |
| 2019-02-28 23:08:09 | vstinner | set | messages: + msg336878 |
| 2019-02-28 23:05:44 | vstinner | set | messages: + msg336877 |
| 2019-02-28 17:34:31 | vstinner | set | messages: + msg336855 |
| 2019-02-28 17:34:12 | vstinner | set | stage: patch review pull_requests: + pull_request12106 |
| 2019-02-28 12:55:55 | serhiy.storchaka | link | issue36134 superseder |
| 2019-02-28 11:23:44 | xtreak | set | nosy: + xtreak messages: + msg336826 |
| 2018-01-29 07:21:06 | ncoghlan | set | nosy: + barry, doko messages: + msg311076 |
| 2018-01-29 07:00:04 | jaysinh.shukla | set | messages: + msg311074 |
| 2018-01-28 14:15:37 | ncoghlan | set | assignee: ncoghlan -> messages: + msg310948 |
| 2018-01-28 14:01:10 | ncoghlan | set | messages: + msg310947 |
| 2017-07-12 21:03:59 | Naman-Bhalla | set | nosy: + Naman-Bhalla |
| 2017-07-12 20:07:31 | Naman-Bhalla | set | pull_requests: + pull_request2751 |
| 2017-04-01 05:49:51 | serhiy.storchaka | set | pull_requests: - pull_request1096 |
| 2017-03-31 16:36:37 | dstufft | set | pull_requests: + pull_request1096 |
| 2017-03-24 22:42:25 | benjamin.peterson | set | messages: + msg290272 |
| 2017-03-24 22:42:10 | benjamin.peterson | set | messages: + msg290269 |
| 2017-03-24 22:42:04 | benjamin.peterson | set | messages: + msg290268 |
| 2017-03-08 06:51:44 | benjamin.peterson | set | pull_requests: + pull_request457 |
| 2017-03-08 06:51:41 | benjamin.peterson | set | pull_requests: + pull_request456 |
| 2017-03-08 06:07:19 | benjamin.peterson | set | pull_requests: + pull_request455 |
| 2017-03-06 15:52:28 | ncoghlan | set | messages: + msg289118 |
| 2017-03-06 09:41:12 | serhiy.storchaka | set | dependencies: + Mismatch between glibc and X11 locale.alias |
| 2017-03-06 08:01:39 | benjamin.peterson | set | messages: + msg289077 |
| 2017-03-06 07:58:37 | ncoghlan | set | messages: + msg289076 |
| 2017-03-06 07:55:55 | benjamin.peterson | set | messages: + msg289075 |
| 2017-03-06 07:55:21 | benjamin.peterson | set | messages: + msg289074 |
| 2017-03-06 07:55:14 | serhiy.storchaka | set | messages: + msg289073 |
| 2017-03-06 07:54:36 | serhiy.storchaka | set | messages: + msg289072 |
| 2017-03-06 07:52:43 | benjamin.peterson | set | nosy: + benjamin.peterson messages: + msg289071 |
| 2017-03-06 06:09:41 | serhiy.storchaka | set | status: closed -> open nosy: + steve.dower, paul.moore, tim.golden components: + Windows resolution: fixed -> (no value) stage: resolved -> (no value) |
| 2017-03-06 01:34:31 | zach.ware | set | nosy: + zach.ware messages: + msg289054 |
| 2017-03-03 07:51:48 | benjamin.peterson | set | pull_requests: + pull_request352 |
| 2017-02-19 04:35:03 | ncoghlan | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2017-02-19 04:33:52 | ncoghlan | set | messages: + msg288102 |
| 2017-02-19 04:33:37 | ncoghlan | set | messages: + msg288101 |
| 2017-02-18 10:45:14 | ncoghlan | set | pull_requests: + pull_request118 |
| 2017-02-18 10:44:50 | ncoghlan | set | pull_requests: + pull_request117 |
| 2017-02-18 09:31:25 | ncoghlan | set | messages: + msg288069 |
| 2017-02-18 09:25:58 | serhiy.storchaka | set | messages: + msg288068 |
| 2017-02-18 09:19:08 | ncoghlan | set | pull_requests: + pull_request112 |
| 2017-02-18 08:54:00 | ncoghlan | set | messages: + msg288065 |
| 2017-02-18 05:04:15 | ncoghlan | set | assignee: serhiy.storchaka -> ncoghlan messages: + msg288056 nosy: + ncoghlan |
| 2017-02-16 12:05:04 | serhiy.storchaka | set | messages: + msg287933 |
| 2017-02-15 22:58:46 | vstinner | set | messages: + msg287894 |
| 2017-02-15 22:57:27 | serhiy.storchaka | set | files: + test_re_locale_flag.patch keywords: + patch messages: + msg287893 stage: patch review |
| 2017-02-15 19:25:47 | mrabarnett | set | messages: + msg287882 |
| 2017-02-15 19:03:27 | serhiy.storchaka | set | messages: + msg287880 components: + Tests versions: + Python 3.5, Python 3.6 |
| 2017-02-15 18:56:16 | mrabarnett | set | messages: + msg287879 |
| 2017-02-15 18:02:04 | serhiy.storchaka | set | assignee: serhiy.storchaka type: behavior nosy: + serhiy.storchaka |
| 2017-02-15 17:41:28 | vstinner | set | nosy: + vstinner |
| 2017-02-15 17:29:11 | jaysinh.shukla | create | |