Message159130
Author: serhiy.storchaka
Recipients: belopolsky, ezio.melotti, georg.brandl, lemburg, moese, phr, serhiy.storchaka, tchrist, vstinner
Date: 2012-04-24.11:01:36
Message-id: <1335265297.03.0.881571661115.issue2857@psf.upfronthosting.co.za>
As far as I understand, this codec can be implemented in pure Python, on top of the built-in UTF-8 codec with the 'surrogatepass' error handler. There is no need to modify the interpreter core.
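import re
def decode_cesu8(b):
    # Decode each 3-byte surrogate separately, then join every high/low pair into one code point.
    return re.sub('[\uD800-\uDBFF][\uDC00-\uDFFF]', lambda m: chr(0x10000 + (((ord(m.group()[0]) & 0x3FF) << 10) | (ord(m.group()[1]) & 0x3FF))), b.decode('utf-8', 'surrogatepass'))
def encode_cesu8(s):
    # Split each supplementary character into a surrogate pair; each half is then encoded as 3 bytes.
    def split(m):
        c = ord(m.group()) - 0x10000
        return chr(0xD800 | (c >> 10)) + chr(0xDC00 | (c & 0x3FF))
    return re.sub('[\U00010000-\U0010FFFF]', split, s).encode('utf-8', 'surrogatepass')
def decode_mutf8(b):
    # Modified UTF-8 writes U+0000 as the two-byte sequence C0 80.
    return decode_cesu8(b.replace(b'\xC0\x80', b'\x00'))
def encode_mutf8(s):
    return encode_cesu8(s).replace(b'\x00', b'\xC0\x80')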
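To make these functions usable through the normal str.encode() / bytes.decode() machinery, they could be wrapped in a codecs.CodecInfo and registered with codecs.register(). This is a minimal sketch under some assumptions: the codec name 'mutf-8' and the search function _search are hypothetical choices (there is no such stdlib codec), the corrected functions are repeated so the snippet runs on its own, and the errors argument is accepted but ignored, so malformed input still raises UnicodeDecodeError from the underlying UTF-8 codec.

```python
import codecs
import re


def decode_cesu8(b):
    # Decode each 3-byte surrogate separately, then join high/low pairs.
    s = b.decode('utf-8', 'surrogatepass')
    return re.sub('[\ud800-\udbff][\udc00-\udfff]',
                  lambda m: chr(0x10000 + (((ord(m.group()[0]) & 0x3FF) << 10)
                                           | (ord(m.group()[1]) & 0x3FF))),
                  s)


def encode_cesu8(s):
    # Split supplementary characters into surrogate pairs before encoding.
    def split(m):
        c = ord(m.group()) - 0x10000
        return chr(0xD800 | (c >> 10)) + chr(0xDC00 | (c & 0x3FF))
    return re.sub('[\U00010000-\U0010ffff]', split, s).encode('utf-8', 'surrogatepass')


def decode_mutf8(b):
    # Modified UTF-8 writes U+0000 as the two-byte sequence C0 80.
    return decode_cesu8(b.replace(b'\xc0\x80', b'\x00'))


def encode_mutf8(s):
    return encode_cesu8(s).replace(b'\x00', b'\xc0\x80')


def _search(name):
    # Hypothetical codec name. Depending on the Python version, the name
    # reaches the search function with '-' or '_', so accept both spellings.
    if name not in ('mutf-8', 'mutf_8'):
        return None
    return codecs.CodecInfo(
        encode=lambda text, errors='strict': (encode_mutf8(text), len(text)),
        # bytes.decode() may hand the decoder a memoryview, so copy to bytes.
        decode=lambda data, errors='strict': (decode_mutf8(bytes(data)), len(data)),
        name='mutf-8',
    )


codecs.register(_search)
```

With the codec registered, 'a\x00\U0001F600'.encode('mutf-8') gives b'a\xc0\x80\xed\xa0\xbd\xed\xb8\x80', and decoding that with 'mutf-8' returns the original string. Everything stays in Python code, so the interpreter itself is untouched.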