Message144679
| Author |
ezio.melotti |
| Recipients |
ezio.melotti, gvanrossum, lemburg, loewis, mrabarnett, tchrist, terry.reedy |
| Date |
2011-09-30.08:59:08 |
| SpamBayes Score |
1.8249656e-06 |
| Marked as misclassified |
No |
| Message-id |
<1317373151.27.0.18276033418.issue12753@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
The attached patch changes Tools/unicode/makeunicodedata.py to build a list of alias names and their code points from http://www.unicode.org/Public/6.0.0/ucd/NameAliases.txt and adds it to Modules/unicodename_db.h.
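As a rough illustration of the generation step, here is a minimal Python sketch of how the alias file could be parsed into (name, code point) pairs. It is not the code from the patch: the function name parse_name_aliases and the inline sample are hypothetical, and it assumes only the semicolon-separated "code point;alias" layout with "#" comments used by NameAliases.txt.

```python
def parse_name_aliases(text):
    """Parse NameAliases.txt-style text into a list of (alias, code_point).

    Hypothetical helper for illustration; the real generator lives in
    Tools/unicode/makeunicodedata.py and may be structured differently.
    """
    aliases = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)  # first field: hex code point
        alias = fields[1].strip()        # second field: alias name
        aliases.append((alias, code_point))
    return aliases


# Small inline sample in the same layout as the real file.
SAMPLE = """\
# sample entries
01A2;LATIN CAPITAL LETTER GHA
01A3;LATIN SMALL LETTER GHA
"""

for alias, cp in parse_name_aliases(SAMPLE):
    print("%04X %s" % (cp, alias))
```

Only the first two fields of each line are read, so extra trailing fields would simply be ignored.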
During lookup, the _getcode function at Modules/unicodedata.c:1055 loops over the 11 aliases and checks whether any of them matches the requested name.
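The idea behind that loop can be sketched in Python as a plain linear scan. This is only an approximation of the C code, under stated assumptions: ALIASES is a two-entry subset of the real table, the names find_alias_code_point and lookup_with_aliases are hypothetical, and checking aliases before the regular name lookup is an assumed order, not something the patch is known to do.

```python
import unicodedata

# Illustrative subset of the alias table (the patch generates all 11 entries).
ALIASES = [
    ("LATIN CAPITAL LETTER GHA", 0x01A2),
    ("LATIN SMALL LETTER GHA", 0x01A3),
]


def find_alias_code_point(name):
    """Linear scan over the alias table; returns a code point or None."""
    wanted = name.upper()  # name lookups are case-insensitive
    for alias, code_point in ALIASES:
        if alias == wanted:
            return code_point
    return None


def lookup_with_aliases(name):
    """Try the alias table first, then fall back to the regular lookup.

    The order of the two checks is an assumption made for this sketch.
    """
    code_point = find_alias_code_point(name)
    if code_point is not None:
        return chr(code_point)
    return unicodedata.lookup(name)  # raises KeyError for unknown names


print(lookup_with_aliases("LATIN CAPITAL LETTER GHA"))
print(lookup_with_aliases("LATIN CAPITAL LETTER OI"))
```

With only 11 entries a linear scan is cheap, which is presumably why no hashing is needed for the alias table.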
The patch also includes tests for both unicodedata.lookup and \N{}.
I'm not sure this is the best way to implement this, and someone will probably want to review and tweak both the approach and the C code, but it works: both the alias (GHA) and the original name (OI) resolve to the same character:
>>> "\N{LATIN CAPITAL LETTER GHA}"
'Ƣ'
>>> import unicodedata
>>> unicodedata.lookup("LATIN CAPITAL LETTER GHA")
'Ƣ'
>>> "\N{LATIN CAPITAL LETTER OI}"
'Ƣ'
>>> unicodedata.lookup("LATIN CAPITAL LETTER OI")
'Ƣ'
The patch doesn't include changes for NamedSequences.txt. |
|