Created on 2012-11-05 12:12 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files

| File name | Uploaded by | Date |
|---|---|---|
| support_non_ascii-2.patch | vstinner | 2012-11-05 23:10 |
| brute.py | vstinner | 2012-11-05 23:12 |
| brute2.py | vstinner | 2012-11-06 22:41 |
Messages (19)

| msg174897 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2012-11-05 12:12 |
The attached patch adds support.NONASCII, a "portable" non-ASCII character that can be used to test non-ASCII strings. The patch uses it in some existing functions. I wrote the patch on the default branch, but we may start to use it from Python 3.2 onward.
|||
| msg174900 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2012-11-05 12:26 |
I think you should ensure that os.fsdecode(os.fsencode(character)) == character.
|||
| msg174904 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2012-11-05 12:58 |
If NONASCII is None, I suggest the following fallback code:

    for i in range(0x80, 0xFFFF):
        character = chr(i)
        if character.isprintable():
            try:
                if os.fsdecode(os.fsencode(character)) == character:
                    NONASCII = character
                    break
            except UnicodeError:
                pass
|||
| msg174922 | Author: Chris Jerdonek (chris.jerdonek) (Python committer) | Date: 2012-11-05 17:23 |

    +# NONASCII: non-ASCII character encodable by os.fsencode(),
    +# or None if there is no such character.
    +NONASCII = None

Can you use a name that reflects that this is a specific type of non-ASCII character having a special property (e.g. FS_NONASCII)? I think "NONASCII" should be reserved for a generic non-ASCII character. Moreover, there may be other types of non-ASCII characters we can add in the future.
|||
| msg174946 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2012-11-05 23:10 |
> I think you should ensure that os.fsdecode(os.fsencode(character)) == character.

The chosen characters satisfy this property, but it doesn't hurt to add such a check.

> Can you use a name that reflects that this is a specific type
> of non-ASCII character having a special property (e.g. FS_NONASCII)?

Done.
|||
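The round-trip check agreed on above can be sketched as follows. This is a minimal illustration, not the committed code: the helper names `roundtrips_fs` and `pick_fs_nonascii` and the short `CANDIDATES` tuple are assumptions made for this example (U+00A0 and U+20AC are mentioned later in this thread; U+00E6 is only a stand-in).

```python
import os

# Hypothetical candidate list, for illustration only; the list
# committed to test.support is not reproduced here.
CANDIDATES = ("\u00e6", "\u00a0", "\u20ac")


def roundtrips_fs(character):
    """Return True if character survives os.fsencode()/os.fsdecode()."""
    try:
        return os.fsdecode(os.fsencode(character)) == character
    except UnicodeError:
        # The filesystem encoding cannot encode this character.
        return False


def pick_fs_nonascii(candidates=CANDIDATES):
    """Return the first candidate that round-trips, or None."""
    for character in candidates:
        if roundtrips_fs(character):
            return character
    return None


# Evaluated once at import time, so the cost is a handful of
# encode/decode calls rather than a scan over the Unicode range.
FS_NONASCII = pick_fs_nonascii()
```

With a UTF-8 filesystem encoding the first candidate is expected to be selected; with an ASCII filesystem encoding every candidate should fail to encode and the result is `None`.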
| msg174948 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2012-11-05 23:12 |
> If NONASCII is None I suggest the following fallback code

I prefer not to "brute force" Unicode because it would slow down every test, even tests that don't use FS_NONASCII. I wrote the attached brute.py script to compute an exhaustive list of non-ASCII characters encodable to "any" locale encoding. My list of locale encodings is not complete, but it should be enough for our purpose. The list can be completed later.
|||
| msg174949 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2012-11-05 23:17 |
I tested support_non_ascii-2.patch on Windows with the cp932 ANSI code page (FS encoding), and on Linux with the ASCII, ISO-8859-1, ISO-8859-15 and UTF-8 locale encodings.
|||
| msg174959 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2012-11-06 09:48 |
I tested brute.py against all encodings supported by Python:

    No character for encoding cp1006:surrogateescape :-(
    No character for encoding cp720:surrogateescape :-(
    No character for encoding cp864:surrogateescape :-(
    No character for encoding iso8859_3:surrogateescape :-(
    No character for encoding iso8859_6:surrogateescape :-(
    No character for encoding mac_arabic:surrogateescape :-(
    No character for encoding mac_farsi:surrogateescape :-(
|||
| msg174961 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2012-11-06 10:20 |
> I tested brute.py against all encodings supported by Python:

Oh thanks, interesting result. I completed the encoding list and the character list: see brute2.py. I added "joker" characters, U+00A0 and U+20AC, which match the requirements for most locale encodings.
|||
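The kind of per-encoding coverage check discussed above could look roughly like this. It is not a reproduction of brute.py or brute2.py, whose contents are not shown in this thread; the function names `encodable` and `uncovered` are assumptions for this sketch, and only the two "joker" characters named above are used as candidates.

```python
def encodable(character, encoding, errors="surrogateescape"):
    """Return True if character round-trips through the given codec."""
    try:
        data = character.encode(encoding, errors)
        return data.decode(encoding, errors) == character
    except (UnicodeError, LookupError):
        # UnicodeError: the codec cannot represent the character.
        # LookupError: the codec name is unknown or not a text encoding.
        return False


def uncovered(encodings, candidates):
    """Return the encodings for which no candidate character is encodable."""
    return [
        encoding
        for encoding in encodings
        if not any(encodable(ch, encoding) for ch in candidates)
    ]


# The two "joker" characters mentioned in the message above.
JOKERS = ["\u00a0", "\u20ac"]

if __name__ == "__main__":
    for name in uncovered(["ascii", "latin-1", "iso8859_15", "utf-8"], JOKERS):
        print("No character for encoding %s" % name)
```

An encoding reported by `uncovered` is one where a fixed candidate list gives no usable character, which is the situation the fallback value `None` has to cover.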
| msg175016 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-11-06 22:23 |
New changeset de8cf1ece068 by Victor Stinner in branch 'default':
Issue #16414: Add support.FS_NONASCII and support.TESTFN_NONASCII
http://hg.python.org/cpython/rev/de8cf1ece068
|||
| msg175017 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-11-06 22:33 |
New changeset 0e9fbdda3c92 by Victor Stinner in branch 'default':
Issue #16414: Fix support.TESTFN_UNDECODABLE and test_genericpath.test_nonascii_abspath()
http://hg.python.org/cpython/rev/0e9fbdda3c92
|||
| msg175018 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2012-11-06 22:34 |
Why did you add a '- ' suffix to TESTFN_NONASCII?
|||
| msg175019 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2012-11-06 22:39 |
I don't see U+00A0 and U+20AC in the changeset.
|||
| msg175020 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-11-06 22:40 |
New changeset 55710b8c6670 by Victor Stinner in branch 'default':
Issue #16414: Fix typo in support.TESTFN_NONASCII (useless space)
http://hg.python.org/cpython/rev/55710b8c6670
|||
| msg175021 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-11-06 22:43 |
New changeset 7f90305d9f23 by Victor Stinner in branch 'default':
Issue #16414: Test more characters for support.FS_NONASCII
http://hg.python.org/cpython/rev/7f90305d9f23
|||
| msg175025 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-11-06 23:10 |
New changeset fce9e892c65d by Victor Stinner in branch 'default':
Issue #16414: Fix test_os on Windows, don't test os.listdir() with undecodable
http://hg.python.org/cpython/rev/fce9e892c65d
|||
| msg175026 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2012-11-06 23:12 |
> Why did you add a '- ' suffix to TESTFN_NONASCII?

Oops, the space was a mistake. I added "-" just for the readability of the generated filename.

> I don't see U+00A0 and U+20AC in the changeset.

Oh, I forgot to update the patch with the latest results of "brute2.py". It is now fixed.

Thanks for the review!
|||
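How the "-" separator and the None fallback fit together can be sketched like this. The helper `make_nonascii_testfn`, the base name `@test_tmp`, and the stand-in character are assumptions for illustration and are not taken from the changesets.

```python
import unittest


def make_nonascii_testfn(base, fs_nonascii):
    """Build a non-ASCII test filename, or None if no character is available.

    The "-" only separates the base name from the non-ASCII character
    so the generated filename is easier to read.
    """
    if fs_nonascii is None:
        return None
    return base + "-" + fs_nonascii


# Hypothetical values: a real test suite would take the base name and
# the character from its support module instead of hard-coding them.
TESTFN_NONASCII = make_nonascii_testfn("@test_tmp", "\u00e6")


@unittest.skipUnless(TESTFN_NONASCII, "need a non-ASCII filename")
class NonAsciiNameTest(unittest.TestCase):
    def test_name_contains_non_ascii(self):
        # At least one character must fall outside the ASCII range.
        self.assertTrue(any(ord(ch) > 127 for ch in TESTFN_NONASCII))
```

Tests that depend on the filename are skipped rather than failed when no suitable character exists for the current filesystem encoding.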
| msg175033 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2012-11-06 23:40 |
Handling non-ASCII paths is always a pain. I don't plan to backport support.FS_NONASCII to Python 3.3 right now, but I may backport it later.
|||
| msg178870 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-01-03 00:59 |
New changeset 41658a4fb3cc by Victor Stinner in branch '3.2':
Issue #16218, #16414, #16444: Backport FS_NONASCII, TESTFN_UNDECODABLE,
http://hg.python.org/cpython/rev/41658a4fb3cc

New changeset 4d40c1ce8566 by Victor Stinner in branch '3.3':
(Merge 3.2) Issue #16218, #16414, #16444: Backport FS_NONASCII,
http://hg.python.org/cpython/rev/4d40c1ce8566
|||
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:38 | admin | set | github: 60618 |
| 2013-01-03 01:07:33 | vstinner | set | versions: + Python 3.2, Python 3.3 |
| 2013-01-03 00:59:48 | python-dev | set | messages: + msg178870 |
| 2012-11-06 23:40:40 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg175033; versions: - Python 3.3 |
| 2012-11-06 23:12:22 | vstinner | set | messages: + msg175026 |
| 2012-11-06 23:10:13 | python-dev | set | messages: + msg175025 |
| 2012-11-06 22:43:05 | python-dev | set | messages: + msg175021 |
| 2012-11-06 22:41:18 | vstinner | set | files: + brute2.py |
| 2012-11-06 22:40:15 | python-dev | set | messages: + msg175020 |
| 2012-11-06 22:39:58 | serhiy.storchaka | set | messages: + msg175019 |
| 2012-11-06 22:34:10 | serhiy.storchaka | set | messages: + msg175018 |
| 2012-11-06 22:33:32 | python-dev | set | messages: + msg175017 |
| 2012-11-06 22:23:28 | python-dev | set | nosy: + python-dev; messages: + msg175016 |
| 2012-11-06 10:20:30 | vstinner | set | messages: + msg174961 |
| 2012-11-06 09:48:23 | serhiy.storchaka | set | messages: + msg174959 |
| 2012-11-05 23:17:33 | vstinner | set | messages: + msg174949 |
| 2012-11-05 23:12:37 | vstinner | set | files: - support_non_ascii.patch |
| 2012-11-05 23:12:29 | vstinner | set | files: + brute.py; messages: + msg174948 |
| 2012-11-05 23:10:47 | vstinner | set | files: + support_non_ascii-2.patch; messages: + msg174946 |
| 2012-11-05 17:23:31 | chris.jerdonek | set | nosy: + chris.jerdonek; messages: + msg174922 |
| 2012-11-05 12:58:22 | serhiy.storchaka | set | messages: + msg174904 |
| 2012-11-05 12:26:41 | serhiy.storchaka | set | messages: + msg174900 |
| 2012-11-05 12:12:14 | vstinner | create | |