Author ezio.melotti
Recipients ezio.melotti, gvanrossum, lemburg, loewis, mrabarnett, tchrist, terry.reedy
Date 2011-10-09.13:20:55
Message-id <1318166459.35.0.821972995683.issue12753@psf.upfronthosting.co.za>
In-reply-to
Content
Here is a new patch that stores the names of aliases and named sequences in the Private Use Area.
To summarize a bit, this is what we want:
        | 6.0.0 | 3.2.0 |
--------+-------+-------+
\N{...} |   A   |   -   |
.name   |   -   |   -   |
.lookup | A,NS  |   -   |
(A = aliases, NS = named sequences, - = not supported)
I.e., \N{...} should only support aliases, unicodedata.lookup should support aliases and named sequences, unicodedata.name doesn't support either, and when 3.2.0 is used nothing is supported.
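The 6.0.0 column of the table can be checked from Python. This is only a sketch: it assumes an interpreter that already has this behaviour (3.3 or later), and the two sample names (one alias, one named sequence) and the helpers expand_escape and raises are picked just for the illustration.

```python
import unicodedata

ALIAS = "LATIN CAPITAL LETTER GHA"             # alias of U+01A2
NAMED_SEQ = "LATIN SMALL LETTER R WITH TILDE"  # named sequence <U+0072, U+0303>


def expand_escape(name):
    """Evaluate the literal "\\N{name}" at run time (hypothetical helper)."""
    return eval('"\\N{%s}"' % name)


def raises(exc, func, *args):
    """Return True if func(*args) raises exc (hypothetical helper)."""
    try:
        func(*args)
    except exc:
        return True
    return False


# \N{...}: aliases only
print(expand_escape(ALIAS) == "\u01a2")
print(raises(SyntaxError, expand_escape, NAMED_SEQ))

# .lookup: aliases and named sequences
print(unicodedata.lookup(ALIAS) == "\u01a2")
print(unicodedata.lookup(NAMED_SEQ) == "r\u0303")

# .name: always the real name, never the alias
print(unicodedata.name("\u01a2"))

# 3.2.0: neither aliases nor named sequences
print(raises(KeyError, unicodedata.ucd_3_2_0.lookup, ALIAS))
print(raises(KeyError, unicodedata.ucd_3_2_0.lookup, NAMED_SEQ))
```

The escape is expanded through eval so that the failing case shows up as a SyntaxError at run time instead of preventing the whole script from compiling.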
The function calls involved for these 3 functions are:
\N{...} and .lookup:
  _getcode
    _cmpname
      _getucname
    _check_alias
.name:
  _getucname
My patch adds an extra arg to _getcode and _getucname (I hope that's fine -- or are they public?).
_getcode is called by \N{...} and .lookup; both support aliases, so _getcode now resolves aliases by default. Since only .lookup wants named sequences, _getcode now accepts an extra 'with_named_seq' arg and looks up named sequences only when its value is 1. .lookup passes 1, gets the codepoint, and converts it to a sequence. \N{...} passes 0 and doesn't get named sequences.
_getucname is called by .name and indirectly (through _cmpname) by .lookup and \N{...}. Since _getcode takes care of deciding who gets aliases and sequences, _getucname now accepts an extra 'with_alias_and_seq' arg and looks up aliases and named sequences only when its value is 1. _cmpname passes 1, gets aliases and named sequences and then lets _getcode decide what to do with them. .name passes 0 and doesn't get aliases and named sequences.
All this happens on 6.0.0 only; when self != NULL (i.e. we are using 3.2.0), named sequences and aliases are ignored.
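To make the flag handling easier to follow, here is a toy Python model of the 6.0.0 path. It is not the C code from the patch: the table contents, the PUA offsets and every function name below are made up for the illustration, and the linear scan stands in for the real hash lookup.

```python
# Toy model only; all names and offsets here are assumptions, not the patch.
ALIASES_START = 0xF0000      # assumed PUA offset for alias names
NAMED_SEQ_START = 0xF0200    # assumed PUA offset for named-sequence names

ALIASES = [("LATIN CAPITAL LETTER GHA", 0x01A2)]
NAMED_SEQS = [("LATIN SMALL LETTER R WITH TILDE", "r\u0303")]

# Real names, plus alias and sequence names stored at PUA code points.
TABLE = {0x01A2: "LATIN CAPITAL LETTER OI"}
for i, (alias_name, _) in enumerate(ALIASES):
    TABLE[ALIASES_START + i] = alias_name
for i, (seq_name, _) in enumerate(NAMED_SEQS):
    TABLE[NAMED_SEQ_START + i] = seq_name


def is_alias(code):
    return ALIASES_START <= code < ALIASES_START + len(ALIASES)


def is_named_seq(code):
    return NAMED_SEQ_START <= code < NAMED_SEQ_START + len(NAMED_SEQS)


def getucname(code, with_alias_and_seq):
    """Name stored for code; PUA entries are hidden unless the flag is 1."""
    if not with_alias_and_seq and (is_alias(code) or is_named_seq(code)):
        return None
    return TABLE.get(code)


def getcode(name, with_named_seq):
    """Code point for name; aliases always resolve, sequences only on request."""
    for code in TABLE:
        if getucname(code, 1) == name:      # the role _cmpname plays
            if is_alias(code):
                return ALIASES[code - ALIASES_START][1]
            if is_named_seq(code):
                return code if with_named_seq else None
            return code
    return None


def lookup(name):                            # models unicodedata.lookup
    code = getcode(name, 1)
    if code is None:
        raise KeyError(name)
    if is_named_seq(code):
        return NAMED_SEQS[code - NAMED_SEQ_START][1]
    return chr(code)


def escape(name):                            # models \N{...}
    code = getcode(name, 0)
    if code is None:
        raise SyntaxError("unknown Unicode character name")
    return chr(code)


def name_of(char):                           # models unicodedata.name
    result = getucname(ord(char), 0)
    if result is None:
        raise ValueError("no such name")
    return result
```

In this model lookup("LATIN SMALL LETTER R WITH TILDE") gets a PUA code point back from getcode and turns it into the sequence, escape() with the same name fails, and name_of() never sees the PUA entries.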
The patch doesn't include the changes to unicodename_db.h -- run makeunicodedata.py to get them.
I also added more tests to make sure that the names added in the PUA don't leak, and that ucd_3_2_0 is not affected.
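A rough sketch of the kind of leak check meant here, not the tests from the patch itself: private-use code points have no names, so unicodedata.name must keep failing for them even though the name table now has entries there. The range scanned below is an assumption about where the entries live, and pua_names_leaked is a made-up helper.

```python
import unicodedata


def pua_names_leaked(start=0xF0000, stop=0xF0400):
    """Return the PUA code points in [start, stop) that report a name."""
    leaked = []
    for cp in range(start, stop):
        # name() returns the default instead of raising ValueError
        for db in (unicodedata, unicodedata.ucd_3_2_0):
            if db.name(chr(cp), None) is not None:
                leaked.append(cp)
                break
    return leaked


print(pua_names_leaked())
```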
History
Date                 User          Action  Args
2011-10-09 13:21:00  ezio.melotti  set     recipients: + ezio.melotti, lemburg, gvanrossum, loewis, terry.reedy, mrabarnett, tchrist
2011-10-09 13:20:59  ezio.melotti  set     messageid: <1318166459.35.0.821972995683.issue12753@psf.upfronthosting.co.za>
2011-10-09 13:20:58  ezio.melotti  link    issue12753 messages
2011-10-09 13:20:58  ezio.melotti  create
