[Python-ideas] Support Unicode code point notation

Sun Jul 28 20:07:21 CEST 2013

On 28/07/2013 18:29, Steven D'Aprano wrote:
> On 28/07/13 23:06, Nick Coghlan wrote:
>> It would also be more consistent if unicodedata.lookup() was updated
>> to handle numeric code point names. Something like:
>>
>> >>> import unicodedata
>> >>> def enhanced_lookup(name):
>> ...     if name.startswith("U+"):
>> ...         return chr(int(name[2:], 16))
>> ...     return unicodedata.lookup(name)
>> ...
>> >>> enhanced_lookup("GREEK SMALL LETTER ALPHA")
>> 'α'
>> >>> enhanced_lookup("U+03B1")
>> 'α'
>
> Earlier, MRAB suggested that unicodedata.name() could return the U+
> code point in the case of unnamed characters.

What I said was:
"""I think the point of "\N{U+03C0}" is that it lets you name all of the
codepoints, even those that are as yet unnamed."""
Whether unicodedata.name() could have a fallback is something I've
never considered. Until now... :-)
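
For illustration only, here is a minimal sketch of what such a fallback
might look like. The helper name name_or_codepoint() is hypothetical and
not part of unicodedata; the sketch relies only on the existing optional
default argument of unicodedata.name():

```python
import unicodedata

def name_or_codepoint(c):
    # Hypothetical helper, not an existing unicodedata function.
    # unicodedata.name() returns the supplied default instead of raising
    # ValueError when the character has no name.
    name = unicodedata.name(c, None)
    if name is None:
        # Fall back to U+ notation with at least four hex digits.
        return 'U+{:04X}'.format(ord(c))
    return name
```

Named characters would come back unchanged, while something like a
control character would come back in U+ form.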
> I think it would be better to have a separate unicodedata function to
> return the code point, and leave the current behaviour of name() alone.
>
> def codepoint(c):
>     return 'U+{:04X}'.format(ord(c))
>
> This should always succeed for any character.
>
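
As a rough check of that claim, the two quoted functions can be put
together and round-tripped. This is only a sketch: both functions are
proposals from this thread, not existing unicodedata APIs, and the sample
characters are my own choice (a named letter, a control character, and
the last code point):

```python
import unicodedata

def codepoint(c):
    # Proposed in this thread: U+ notation for any single character.
    return 'U+{:04X}'.format(ord(c))

def enhanced_lookup(name):
    # Proposed in this thread: accept U+ notation as well as names.
    if name.startswith("U+"):
        return chr(int(name[2:], 16))
    return unicodedata.lookup(name)

# Round trip: character -> U+ notation -> character.
samples = ['α', '\n', '\U0010FFFF']
round_tripped = [enhanced_lookup(codepoint(c)) for c in samples]
```

Here round_tripped should equal samples, including for the two
characters that unicodedata.name() cannot name.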