Created on 2011-05-29 04:49 by py.user, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| issue12204.diff | ezio.melotti, 2011-07-21 04:21 | Patch to add a note in the doc. |
| issue12204-2.diff | ezio.melotti, 2011-07-21 04:54 | Patch that factors out definition of cased chars. |
| Messages (14) | |||
|---|---|---|---|
| msg137167 | Author: py.user (py.user) | Date: 2011-05-29 04:49 | |
Specification:
1) str.upper(): Return a copy of the string converted to uppercase.
2) str.isupper(): Return true if all cased characters in the string are uppercase and there is at least one cased character, false otherwise. Cased characters are those with general category property being one of "Lu", "Ll", or "Lt" and uppercase characters are those with general category property "Lu".
>>> '\u1ff3'
'ῳ'
>>> '\u1ff3'.islower()
True
>>> '\u1ff3'.upper()
'ῼ'
>>> '\u1ff3'.upper().isupper()
False |
|||
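The report above hinges on the general category of the two characters involved. A minimal sketch for inspecting them with the standard `unicodedata` module follows; the helper name `describe` is hypothetical and not part of the issue. It looks at the fixed code points U+1FF3 and U+1FFC directly rather than at the result of `upper()`, since that result may differ between Python versions.

```python
import unicodedata


def describe(ch):
    """Return (code point, general category, name) for a single character.

    Hypothetical helper for illustration; it only wraps unicodedata lookups.
    """
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


small = "\u1ff3"      # the character from the report
titlecase = "\u1ffc"  # the character the report obtained from upper()

print(describe(small))
print(describe(titlecase))

# A character in category Lt is cased but is not an uppercase letter,
# so isupper() on it is False.
print(titlecase.isupper())
```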
| msg137171 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2011-05-29 08:05 | |
'\u1ff3'.upper() returns '\u1ffc', so we have:
U+1FF3 (ῳ - GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI)
U+1FFC (ῼ - GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI)
The first belongs to the Ll (Letter, lowercase) category, whereas the second belongs to the Lt (Letter, titlecase) category.
The entries for these two chars in the UnicodeData.txt[0] file are:
1FF3;GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI;Ll;0;L;03C9 0345;;;;N;;;1FFC;;1FFC
1FFC;GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI;Lt;0;L;03A9 0345;;;;N;;;;1FF3;
U+1FF3 has U+1FFC in both the third-last and last fields (Simple_Uppercase_Mapping and Simple_Titlecase_Mapping respectively -- see [1]), so .upper() is doing the right thing here.
U+1FFC has U+1FF3 in the second-last field (Simple_Lowercase_Mapping), but since its category is not Lu but Lt, .isupper() returns False.
The Unicode Standard Annex #44[2] defines the Lt category as:
Lt Titlecase_Letter: a digraphic character, with first part uppercase
I'm not sure there's anything to fix here: both functions behave as documented, and it might indeed be the case that .upper() returns chars with category Lt, which then return False with .isupper().
[0]: http://unicode.org/Public/UNIDATA/UnicodeData.txt
[1]: http://www.unicode.org/reports/tr44/#UnicodeData.txt
[2]: http://www.unicode.org/reports/tr44/#GC_Values_Table |
|||
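The field positions described in the message above can be checked by splitting the two quoted UnicodeData.txt records on `;`. The sketch below is only an illustration: the function name `parse_record` and the dictionary keys are hypothetical, and the indices 12, 13 and 14 assume the 15-field record layout in which the last three fields are the simple uppercase, lowercase and titlecase mappings.

```python
# The two UnicodeData.txt records quoted in the message above.
RECORDS = [
    "1FF3;GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI;Ll;0;L;03C9 0345;;;;N;;;1FFC;;1FFC",
    "1FFC;GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI;Lt;0;L;03A9 0345;;;;N;;;;1FF3;",
]


def parse_record(line):
    """Split one record into the fields discussed in the issue.

    Hypothetical helper; an empty string means "no simple mapping".
    """
    fields = line.split(";")
    return {
        "code": fields[0],
        "name": fields[1],
        "category": fields[2],
        "upper": fields[12],  # Simple_Uppercase_Mapping (third-last field)
        "lower": fields[13],  # Simple_Lowercase_Mapping (second-last field)
        "title": fields[14],  # Simple_Titlecase_Mapping (last field)
    }


for record in map(parse_record, RECORDS):
    print(record)
```

Read this way, the first record maps to U+1FFC for both uppercase and titlecase, while the second record only carries a lowercase mapping back to U+1FF3.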
| msg137181 | Author: Marc-Andre Lemburg (lemburg) (Python committer) | Date: 2011-05-29 11:56 | |
I think there's a misunderstanding here: title cased characters are ones typically used in titles of a document. They don't necessarily have to be upper case, though, since some characters are never used as first letters of a word.
Note that .upper() also does not guarantee to return an upper case character. It just applies the mapping defined in the Unicode standard and if there is no such mapping, or Python does not support the mapping, the method returns the original character.
The German ß is such a character (U+00DF). It doesn't have an uppercase mapping in actual use and only received such a mapping in Unicode 5.1 based on rather controversial grounds (see http://en.wikipedia.org/wiki/ẞ). The character is normally mapped to 'SS' when converting it to upper case or title case. This multi-character mapping is not supported by Python, so .upper() just returns U+00DF.
I suggest closing this ticket as invalid or adding a note to the documentation explaining how the mapping is applied (and when not). |
|||
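The point that .upper() returns the original character when no mapping exists can be illustrated with characters that have no case at all. This is a minimal sketch with a hypothetical helper name, `upper_roundtrip`; it deliberately uses ASCII digits instead of ß or the Greek characters, because the handling of those has depended on the Python version.

```python
def upper_roundtrip(s):
    """Return the uppercased string and whether isupper() accepts it.

    Hypothetical helper for illustration only.
    """
    upper = s.upper()
    return upper, upper.isupper()


# Cased input: the round trip behaves as one would expect.
print(upper_roundtrip("abc"))

# Uncased input: upper() leaves the digits unchanged, and isupper() is
# False because the result contains no cased character.
print(upper_roundtrip("123"))

# Mixed input: one cased character is enough for isupper() to be True.
print(upper_roundtrip("a1"))
```

So `s.upper().isupper()` being False does not by itself indicate a bug in either method.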
| msg137554 | Author: Éric Araujo (eric.araujo) (Python committer) | Date: 2011-06-03 16:48 | |
A note sounds good. |
|||
| msg140778 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2011-07-21 04:21 | |
Here's a patch. I don't think it's necessary to update the docstring. |
|||
| msg140779 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2011-07-21 04:54 | |
New patch that factors out the definition of cased characters, moving it to a footnote. |
|||
| msg140853 | Author: Éric Araujo (eric.araujo) (Python committer) | Date: 2011-07-22 01:13 | |
Patch looks good, with one issue: I’ve never encountered "cased character" before, is it an accepted term or an invention in our docs? |
|||
| msg140855 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2011-07-22 01:51 | |
I think it's an invention, but its meaning is quite clear to me. |
|||
| msg142119 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011-08-15 11:29 | |
New changeset 16edc5cf4a79 by Ezio Melotti in branch '3.2':
#12204: document that str.upper().isupper() might be False and add a note about cased characters.
http://hg.python.org/cpython/rev/16edc5cf4a79
New changeset fb49394f75ed by Ezio Melotti in branch '2.7':
#12204: document that str.upper().isupper() might be False and add a note about cased characters.
http://hg.python.org/cpython/rev/fb49394f75ed
New changeset c821e3a54930 by Ezio Melotti in branch 'default':
#12204: merge with 3.2.
http://hg.python.org/cpython/rev/c821e3a54930 |
|||
| msg142120 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2011-08-15 11:29 | |
Fixed, thanks for the report! |
|||
| msg142124 | Author: Raymond Hettinger (rhettinger) (Python committer) | Date: 2011-08-15 12:05 | |
Are you sure this should have been backported? Are there any apps that may be working now but won't be after the next point release? |
|||
| msg142126 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2011-08-15 12:40 | |
This is only a doc patch. Maybe you are confusing this issue with #12266? |
|||
| msg142127 | Author: Raymond Hettinger (rhettinger) (Python committer) | Date: 2011-08-15 12:47 | |
Right. I was looking at the other patches that went in in the last 24 hours. |
|||
| msg142128 | Author: Ezio Melotti (ezio.melotti) (Python committer) | Date: 2011-08-15 12:50 | |
It's unlikely that #12266 will break apps. The behavior changed only for fairly unusual characters, and the old behavior was clearly wrong. FWIW the str.capitalize() implementation of PyPy doesn't have the bug, and after the fix both CPython and PyPy have the same behavior. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:17 | admin | set | github: 56413 |
| 2011-08-15 12:50:10 | ezio.melotti | set | messages: + msg142128 |
| 2011-08-15 12:47:18 | rhettinger | set | messages: + msg142127 |
| 2011-08-15 12:40:44 | ezio.melotti | set | messages: + msg142126 |
| 2011-08-15 12:05:20 | rhettinger | set | nosy: + rhettinger; messages: + msg142124 |
| 2011-08-15 11:29:57 | ezio.melotti | set | status: open -> closed; resolution: fixed; messages: + msg142120; stage: commit review -> resolved |
| 2011-08-15 11:29:02 | python-dev | set | nosy: + python-dev; messages: + msg142119 |
| 2011-07-22 01:51:14 | ezio.melotti | set | messages: + msg140855 |
| 2011-07-22 01:13:13 | eric.araujo | set | messages: + msg140853 |
| 2011-07-21 04:54:55 | ezio.melotti | set | files: + issue12204-2.diff; messages: + msg140779 |
| 2011-07-21 04:21:26 | ezio.melotti | set | files: + issue12204.diff; messages: + msg140778; assignee: docs@python -> ezio.melotti; keywords: + patch; stage: commit review |
| 2011-06-05 13:37:41 | r.david.murray | link | issue12266 superseder |
| 2011-06-03 16:48:03 | eric.araujo | set | versions: + Python 2.7, - Python 3.1; nosy: + eric.araujo, docs@python; messages: + msg137554; assignee: docs@python; components: + Documentation, - Interpreter Core, Unicode |
| 2011-05-29 11:56:24 | lemburg | set | nosy: + lemburg; messages: + msg137181 |
| 2011-05-29 08:05:14 | ezio.melotti | set | versions: + Python 3.2, Python 3.3; nosy: + ezio.melotti, belopolsky; messages: + msg137171; components: + Interpreter Core, Unicode, - None |
| 2011-05-29 04:49:20 | py.user | create | |