This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2010-11-30 05:46 by belopolsky, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| issue10587.diff | belopolsky, 2010-12-14 15:42 | |
| Messages (7) | | |
|---|---|---|
| msg122885 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-11-30 05:46 |
On Mon, Nov 29, 2010 at 4:13 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
>> - How specific should library reference manual be in defining methods
>> affected by UCD such as str.upper()?
>
> It should specify what this actually does in Unicode terminology
> (probably in addition to a layman's rephrase of that)

http://mail.python.org/pipermail/python-dev/2010-November/106155.html

Some of these clarifications may actually lead to the conclusion that the current behavior is wrong. For example, Unicode defines the Alphabetic property as Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic:

http://www.unicode.org/reports/tr44/tr44-6.html#Alphabetic

However, str.isalpha() is defined as just Lu + Ll + Lt + Lm + Lo. For example:

    >>> import unicodedata as ud
    >>> ud.category('Ⅴ')
    'Nl'
    >>> 'Ⅴ'.isalpha()
    False
    >>> ud.name('Ⅴ')
    'ROMAN NUMERAL FIVE'

As far as I can tell, the source of the Other_Alphabetic property data, http://unicode.org/Public/UNIDATA/PropList.txt, is not even included in the unicodedata module, and neither is SpecialCasing.txt, which is necessary for implementing a compliant case-mapping algorithm.
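A minimal sketch of the gap described above, assuming CPython's unicodedata module. `approx_alphabetic` is a hypothetical helper that follows the quoted Alphabetic formula but leaves out Other_Alphabetic, because unicodedata does not expose that property; it is therefore only an under-approximation of the real Unicode property.

```python
import unicodedata

# General categories that str.isalpha() accepts, as described in this issue.
ISALPHA_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo"})


def approx_alphabetic(c: str) -> bool:
    """Hypothetical approximation of the Unicode Alphabetic property.

    Adds Nl (letter numbers) to the isalpha() categories but omits
    Other_Alphabetic, which the unicodedata module does not expose.
    """
    return unicodedata.category(c) in ISALPHA_CATEGORIES | {"Nl"}


five = "\u2164"  # ROMAN NUMERAL FIVE
print(unicodedata.name(five), unicodedata.category(five))  # ROMAN NUMERAL FIVE Nl
print(five.isalpha(), approx_alphabetic(five))  # False True
```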
| msg122927 | Author: Martin v. Löwis (loewis) (Python committer) | Date: 2010-11-30 18:53 |
What is the issue that you are reporting? That the status quo should be documented, or that isalpha is wrong? These are independent; don't mix them.
| msg122931 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-11-30 19:10 |
On Tue, Nov 30, 2010 at 1:53 PM, Martin v. Löwis <report@bugs.python.org> wrote:
..
> What is the issue that you are reporting? that the status quo should be documented, or that isalpha is wrong?
> These are independent - don't mix them.

This is a documentation issue. I am not saying that str.isalpha() is necessarily wrong. (If unicodedata had an isAlphabetic() method defined as Lu + Ll + Lt + Lm + Lo, I would file a bug report for that.) Here, I just want to point out that the proper definition of str.isalpha() is subject to debate, and that defining it as Lu + Ll + Lt + Lm + Lo may need to be marked as a CPython implementation detail.

Note that the Unicode book (sorry, I don't have the page reference) advises against relying on catch-all APIs such as isAlphabetic() and recommends consulting the underlying properties directly. I tend to agree, because some programs may want to treat, say, Roman numerals as letters and others as numbers, so whether isAlphabetic() should include the Nl category is better left to the application.
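A sketch of leaving that decision to the application: the hypothetical `is_letter` predicate below consults the General_Category directly and takes a flag for whether letter numbers (Nl) count as letters. The last line shows that the same character also carries a numeric value in unicodedata, which is why a program might reasonably treat it as a number instead.

```python
import unicodedata

LETTER_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo"})


def is_letter(c: str, include_letter_numbers: bool = False) -> bool:
    """Hypothetical application-level test built on General_Category."""
    categories = (
        LETTER_CATEGORIES | {"Nl"} if include_letter_numbers else LETTER_CATEGORIES
    )
    return unicodedata.category(c) in categories


five = "\u2164"  # ROMAN NUMERAL FIVE, category Nl
print(is_letter(five), is_letter(five, include_letter_numbers=True))  # False True
print(five.isnumeric(), unicodedata.numeric(five))  # True 5.0
```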
| msg123270 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-12-03 17:08 |
As discussed in issue10610, it is important to keep the gory details in one place and refer to that place throughout the manual. I think the Unicode terminology is best exposed in the unicodedata module documentation. For the string character-type methods, I suggest presenting an equivalent unicodedata expression where possible. For example, x.isalpha() is equivalent to x != '' and all(unicodedata.category(c) in 'Lu Ll Lt Lm Lo' for c in x), or the docs may just say: "character c is alphabetic if unicodedata.category(c) in 'Lu Ll Lt Lm Lo' is true." Other examples:

    isdigit()     -> unicodedata.digit(c, None) is not None
    isdecimal()   -> unicodedata.decimal(c, None) is not None
    isnumeric()   -> unicodedata.numeric(c, None) is not None
    isprintable() -> c == ' ' or unicodedata.category(c) not in 'Cc Cf Cs Co Cn Zl Zp Zs'
    islower()     -> unicodedata.category(c) == 'Ll'
    isupper()     -> unicodedata.category(c) == 'Lu'
    istitle()     -> unicodedata.category(c) == 'Lt'
    isalnum()     -> isalpha() or isdecimal() or isdigit() or isnumeric()

I am not sure about equivalent expressions for isidentifier() and isspace().
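One way to sanity-check the proposed equivalences is to compare each str method with its unicodedata expression, code point by code point. In this sketch, `PROPOSED` and `mismatches` are hypothetical names. The table transcribes five of the per-character expressions above (isprintable includes the ASCII-space exception that str.isprintable() makes) and leaves out islower(), isupper(), and istitle(), whose str versions apply casing rules across the whole string.

```python
import unicodedata

NONPRINTABLE = frozenset({"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"})

# Per-character predicates transcribed from the proposal (hypothetical names).
PROPOSED = {
    "isalpha": lambda c: unicodedata.category(c) in {"Lu", "Ll", "Lt", "Lm", "Lo"},
    "isdigit": lambda c: unicodedata.digit(c, None) is not None,
    "isdecimal": lambda c: unicodedata.decimal(c, None) is not None,
    "isnumeric": lambda c: unicodedata.numeric(c, None) is not None,
    # str.isprintable() treats the ASCII space as printable even though it is Zs.
    "isprintable": lambda c: c == " " or unicodedata.category(c) not in NONPRINTABLE,
}


def mismatches(name, code_points):
    """Return the code points where str.<name>() disagrees with PROPOSED[name]."""
    method = getattr(str, name)
    predicate = PROPOSED[name]
    return [cp for cp in code_points if method(chr(cp)) != predicate(chr(cp))]


for name in PROPOSED:
    # Scan the Basic Multilingual Plane only, to keep the check quick.
    print(name, len(mismatches(name, range(0x10000))))
```

A non-zero count would point at a character where the documentation formula and the implementation disagree for the interpreter being run.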
| msg123955 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-12-14 15:42 |
I am attaching a patch that expands the documentation of isalnum, isalpha, isdecimal, isdigit, isnumeric, islower, isupper, and isspace. I did not change isidentifier or isprintable because their docs were already complete. I also left out istitle because I could not figure out how to deal with the confusion between the Python and Unicode notions of titlecase.

I would also like to note that isdigit and isdecimal appear to imply isnumeric, so for a non-empty s, s.isalnum() is equivalent to all(c.isalpha() or c.isnumeric() for c in s). However, the actual code does perform the redundant isdecimal() and isdigit() checks. I think the documentation should reflect what the code does, on the off chance that someone replaces unicodedata with their own database for which these checks are not redundant.
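The implication can be checked against the running interpreter's own database. This sketch scans every code point for a character that is decimal or digit but not numeric, then compares a hypothetical `simplified_isalnum` (the shorter formula plus the non-empty check that str.isalnum() performs) with the real method. It only covers the database CPython ships with; it says nothing about the replacement-database case that motivates keeping the redundant checks.

```python
import sys


def implication_violations():
    """Code points where isdecimal() or isdigit() is true but isnumeric() is false."""
    violations = []
    for cp in range(sys.maxunicode + 1):
        c = chr(cp)
        if (c.isdecimal() or c.isdigit()) and not c.isnumeric():
            violations.append(cp)
    return violations


def simplified_isalnum(s: str) -> bool:
    """Shorter isalnum() formula; str.isalnum('') is False, hence the bool(s) check."""
    return bool(s) and all(c.isalpha() or c.isnumeric() for c in s)


# An empty list means the implication holds for this build.
print(implication_violations())
# True means the shorter formula agrees with str.isalnum() on every single character.
print(all(simplified_isalnum(chr(cp)) == chr(cp).isalnum()
          for cp in range(sys.maxunicode + 1)))
```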
| msg124522 | Author: Senthil Kumaran (orsenthil) (Python committer) | Date: 2010-12-22 22:58 |
...
> redundant checks for isdecimal() and isdigit(). I think the
> documentation should reflect what the code does for an off-chance
> that someone would replace unicodedata with their own database with
> which these checks are not redundant.

+1 for making these changes. They help clarify the meaning of these methods with respect to Unicode strings.
| msg124532 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-12-23 03:02 |
Committed r87443 (3.2) and r87444 (3.1).
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:09 | admin | set | github: 54796 |
| 2010-12-23 03:02:10 | belopolsky | set | status: open -> closed; nosy: lemburg, loewis, belopolsky, orsenthil, vstinner, ezio.melotti, docs@python; messages: + msg124532; resolution: fixed; stage: commit review -> resolved |
| 2010-12-22 22:58:21 | orsenthil | set | nosy: + orsenthil; messages: + msg124522 |
| 2010-12-14 15:42:15 | belopolsky | set | files: + issue10587.diff; messages: + msg123955; assignee: docs@python -> belopolsky; keywords: + patch; stage: commit review |
| 2010-12-03 17:08:21 | belopolsky | set | messages: + msg123270 |
| 2010-11-30 19:10:49 | belopolsky | set | messages: + msg122931 |
| 2010-11-30 18:53:55 | loewis | set | nosy: + loewis; messages: + msg122927 |
| 2010-11-30 17:09:05 | belopolsky | link | issue1170 dependencies |
| 2010-11-30 05:48:11 | belopolsky | set | nosy: + lemburg, vstinner, ezio.melotti |
| 2010-11-30 05:46:44 | belopolsky | create | |