Message158460
| Author |
serhiy.storchaka |
| Recipients |
asvetlov, ezio.melotti, loewis, pitrou, roger.serwy, serhiy.storchaka, vstinner |
| Date |
2012-04-16.14:56:47 |
| SpamBayes Score |
-1.0 |
| Marked as misclassified |
Yes |
| Message-id |
<1334588207.85.0.0319557013719.issue14304@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Example:
>>> '\u0100'
'Ā'
>>> '\u0100\U00010000'
'\u0100\U00010000'
>>> print('\u0100')
Ā
>>> print('\u0100\U00010000')
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    print('\u0100\U00010000')
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk
But I think this is too specific a problem and too specific a solution. It would be better if IDLE itself escaped the string in the most appropriate way.
def utf8bmp_encode(s):
    # Keep BMP characters as-is; write non-BMP characters as literal \UXXXXXXXX escapes.
    return ''.join(c if ord(c) <= 0xffff else '\\U%08x' % ord(c) for c in s).encode('utf-8')
or
import re
def utf8bmp_encode(s):
    # Same result, using a regex that matches any character outside U+0000..U+FFFF.
    return re.sub('[^\x00-\uffff]', lambda m: '\\U%08x' % ord(m.group()), s).encode('utf-8') |
|
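As a rough check of what the proposed escaping would produce, here is a minimal sketch that applies the first variant to the string from the example above. It only illustrates the idea; it is not what IDLE currently does, and the sample inputs are assumptions chosen for the demonstration.

```python
def utf8bmp_encode(s):
    # Keep BMP characters unchanged; replace each non-BMP character
    # with a literal \UXXXXXXXX escape, then encode the result as UTF-8.
    return ''.join(
        c if ord(c) <= 0xffff else '\\U%08x' % ord(c) for c in s
    ).encode('utf-8')


# U+0100 is in the BMP and is encoded normally (two UTF-8 bytes);
# U+10000 is outside the BMP and becomes the ten ASCII characters \U00010000.
print(utf8bmp_encode('\u0100\U00010000'))  # b'\xc4\x80\\U00010000'

# A string with only BMP characters passes through unchanged.
print(utf8bmp_encode('abc'))  # b'abc'
```

Note that the result is not unambiguously reversible: a string that already contains the literal text \U00010000 yields the same bytes as one containing the actual character U+10000.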