How can I perform the following conversion (source -> target) in a Python program?
>>> source = '\\x{4e8b}\\x{696d}'
>>> print source
\x{4e8b}\x{696d}
>>> print type(source)
<type 'str'>
>>> target = u'\u4e8b\u696d'
>>> print target.encode('utf-8')
事業
Thank you.
asked Mar 9, 2013 at 4:37
jack
3 Answers
Taking advantage of Blender's idea, you could use re.sub with a callable replacement argument:
import re
def touni(match):
    # group(1) holds the hex digits captured between the braces
    return unichr(int(match.group(1), 16))
source = '\\x{4e8b}\\x{696d}'
print(re.sub(r'\\x\{([\da-f]+)\}', touni, source))
yields
事業
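For readers on Python 3, where unichr no longer exists and chr covers the whole Unicode range, the same re.sub idea can be sketched as below. The names ESCAPE and decode_escapes are made up for this illustration, and accepting uppercase hex digits is an assumption that goes beyond the pattern above.

```python
import re

# Matches the escape \x{HEX}; the group captures the hex digits.
# Accepting A-F as well as a-f is an assumption, not part of the answer above.
ESCAPE = re.compile(r'\\x\{([0-9a-fA-F]+)\}')

def decode_escapes(text):
    # Hypothetical helper: swap each escape for the character it names.
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

source = '\\x{4e8b}\\x{696d}'
print(decode_escapes(source))
```

Text that contains no such escapes passes through unchanged, since re.sub only touches the matches.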
answered Mar 9, 2013 at 4:49
unutbu
You can use int and unichr to convert them:
>>> int('4e8b', 16)
20107
>>> unichr(int('4e8b', 16))
u'\u4e8b'
>>> print unichr(int('4e8b', 16))
事
answered Mar 9, 2013 at 4:45
Blender
import re
source = '\\x{4e8b}\\x{696d}'
p = re.compile(r'[\W\\x]+')
print ''.join([unichr(int(y, 16)) for y in p.split(source) if y != ''])
事業
This also borrows the idea from @Blender.