This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2020-12-16 12:44 by sogom, last changed 2022-04-11 14:59 by admin.
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 32010 | open | asaka, 2022-03-20 15:47 |
| Messages (2) | | | |
|---|---|---|---|
| msg383163 | Author: (sogom) | Date: 2020-12-16 12:44 | |
On the Windows file system, U+03A9 (Greek capital letter Omega) and U+2126 (Ohm sign) are distinguished. In fact, two distinct files "\u03A9.txt" and "\u2126.txt" can exist side by side in the same folder. But os.path.normcase() transforms both U+03A9 and U+2126 to U+03C9 (Greek small letter omega).

MSDN says that NTFS file names are compared with CompareStringOrdinal(): https://docs.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications#sort-strings-ordinally . The same document also says "the function maps case using the operating system *uppercasing* table." However, I ran an experiment and found that, at least in the Basic Multilingual Plane, "lowercase two strings by means of LCMapStringEx() and then wcscmp the two" always gives the same result as "compare the two strings with CompareStringOrdinal()". Although this is not explicitly stated in the MSDN page https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-lcmapstringex , the description of LCMAP_LINGUISTIC_CASING on that page implies that the casing rules match the file system's unless LCMAP_LINGUISTIC_CASING is used.

Therefore, I believe that os.path.normcase() should probably call LCMapStringEx(), with LOCALE_NAME_INVARIANT as the first argument and LCMAP_LOWERCASE as the second.
|||
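The collision described in the report can be reproduced on any platform with plain `str.lower()`. This is a minimal sketch, assuming (as the report implies) that the lowercasing step of `os.path.normcase()` on Windows behaves like Python's own `str.lower()`. The helper name `find_lowercase_collisions` is hypothetical and exists only for illustration.

```python
from collections import defaultdict


def find_lowercase_collisions(names):
    """Group names that become identical after str.lower().

    str.lower() stands in for the lowercasing step of
    os.path.normcase() on Windows (an assumption taken from the report).
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the lowercase forms shared by more than one original name.
    return {low: orig for low, orig in groups.items() if len(orig) > 1}


if __name__ == "__main__":
    # U+03A9 (Greek capital Omega) and U+2126 (Ohm sign) are different
    # code points, yet both lowercase to U+03C9.
    print(find_lowercase_collisions(["\u03a9.txt", "\u2126.txt", "a.txt"]))
```

Any caller that uses the lowercased form as a dictionary key or equality test would therefore treat the two file names as one, even though the file system keeps them apart.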
| msg384012 | Author: Eryk Sun (eryksun) * (Python triager) | Date: 2020-12-29 15:48 | |
> "lowercase two strings by means of LCMapStringEx() and then wcscmp
> the two" always gives the same result as "compare the two strings
> with CompareStringOrdinal()"

For checking case-insensitive equality, it shouldn't matter whether names are converted to uppercase or lowercase when using invariant non-linguistic casing. It's based on symmetric mappings between pairs of uppercase and lowercase codes, which avoids problems such as 'ϴ' (U+03F4) and 'Θ' (U+0398) both lowercasing as 'θ' (U+03B8), or 'ß' uppercasing as 'SS'.

That said, when sorting filenames, you need to use LCMAP_UPPERCASE in order to match the case-insensitive sort order of Windows. For example, 'Ÿ' (U+0178) is greater than 'Ŷ' (U+0176), but their respective lowercase forms are ordered the other way: 'ÿ' (U+00FF) is less than 'ŷ' (U+0177). In particular, if you have an NTFS directory with two files named 'ÿ' and 'ŷ', the listing will be ['ŷ', 'ÿ'], that is, in uppercase order. (An NTFS directory is stored on disk as a b-tree sorted by uppercase filenames.)

For the implementation, _winapi.LCMapStringEx and related constants could be added.
|||
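The casing examples in the comment above can be checked against Python's own Unicode tables. This is only an approximation: `str.upper()` and `str.lower()` apply full Unicode case mappings, not the Windows invariant tables, so the sketch illustrates the sort-order and symmetry points rather than reproducing NTFS behaviour. The function name `sort_names` is hypothetical.

```python
def sort_names(names, fold):
    """Sort names by code point order after applying the given case mapping."""
    return sorted(names, key=fold)


if __name__ == "__main__":
    names = ["\u00ff", "\u0177"]  # 'ÿ' and 'ŷ'

    # Uppercase keys: U+0176 sorts before U+0178, so 'ŷ' comes first.
    print(sort_names(names, str.upper))
    # Lowercase keys: U+00FF sorts before U+0177, so 'ÿ' comes first.
    print(sort_names(names, str.lower))

    # Full Unicode casing is not symmetric, which is the problem the
    # invariant mapping avoids.
    print("\u00df".upper())                      # 'ß' expands to two letters
    print("\u03f4".lower() == "\u0398".lower())  # both thetas lowercase alike
```

The two orderings differ, which is why a sort key meant to match a Windows directory listing would need an uppercase mapping, while an equality check can use either direction under the invariant tables.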
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:39 | admin | set | github: 86824 |
| 2022-03-20 15:47:09 | asaka | set | keywords: + patch; nosy: + asaka; pull_requests: + pull_request30098; stage: patch review |
| 2021-03-09 20:27:48 | vstinner | set | nosy: - vstinner |
| 2021-03-09 15:06:51 | eryksun | link | issue43397 superseder |
| 2021-03-09 15:02:02 | eryksun | set | nosy: + ezio.melotti, vstinner; components: + Library (Lib), Unicode; versions: + Python 3.8, Python 3.10 |
| 2020-12-29 15:48:41 | eryksun | set | nosy: + eryksun; messages: + msg384012 |
| 2020-12-16 12:44:26 | sogom | create | |