Python UTF-8 conversion

How can I perform the following conversion (source -> target) in a Python program?

>>> source = '\\x{4e8b}\\x{696d}'
>>> print source
\x{4e8b}\x{696d}
>>> print type(source)
<type 'str'>
>>> target = u'\u4e8b\u696d'
>>> print target.encode('utf-8')
事業

Thank you.

asked Mar 9, 2013 at 4:37

jack

3 Answers

Taking advantage of Blender's idea, you could use re.sub with a callable replacement argument:

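import re

def touni(match):
    # Convert the captured hex digits to the corresponding Unicode character.
    return unichr(int(match.group(1), 16))

source = '\\x{4e8b}\\x{696d}'
# Replace every \x{...} escape with the character it names.
print(re.sub(r'\\x\{([\da-f]+)\}', touni, source))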

yields

事業
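For readers on Python 3, where unichr no longer exists and chr covers the full Unicode range, the same re.sub idea can be sketched as follows. The helper name decode_escapes and the case-insensitive hex pattern are assumptions added for illustration, not part of the original answer.

```python
import re

# Matches \x{...} containing one or more hex digits (upper or lower case).
ESCAPE = re.compile(r'\\x\{([0-9a-fA-F]+)\}')

def decode_escapes(text):
    # Hypothetical helper: swap each escape for the character it names.
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

source = '\\x{4e8b}\\x{696d}'
target = decode_escapes(source)

print(target)                  # 事業
print(target.encode('utf-8'))  # b'\xe4\xba\x8b\xe6\xa5\xad'
```

Text outside the escapes is left untouched, so this should also work on strings that mix literal characters with \x{...} sequences.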

edited Mar 9, 2013 at 11:23

answered Mar 9, 2013 at 4:49

unutbu

2 Comments

root

I suppose you really wouldn't want to restrict the matches to code points of length 4 if you are writing a conversion program. Otherwise +1.

2013-03-09T05:09:32.927Z

unutbu

@root: Yes, I suppose that was overly restrictive.

2013-03-09T11:24:38.283Z

You can use int and unichr to convert them:

>>> int('4e8b', 16)
20107
>>> unichr(int('4e8b', 16))
u'\u4e8b'
>>> print unichr(int('4e8b', 16))
事
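In Python 3 the built-in chr plays the role of unichr. A minimal sketch of the same steps, with the final encode call added to show the UTF-8 bytes the question asks about (the variable names are illustrative only):

```python
code_point = int('4e8b', 16)       # parse the hex digits as a base-16 integer
char = chr(code_point)             # Python 3 counterpart of Python 2's unichr
utf8_bytes = char.encode('utf-8')  # the character as UTF-8 encoded bytes

print(code_point)  # 20107
print(char)        # 事
print(utf8_bytes)  # b'\xe4\xba\x8b'
```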

answered Mar 9, 2013 at 4:45

Blender

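import re
source = '\\x{4e8b}\\x{696d}'  # same input as in the question
# Split on runs of non-word characters and 'x', leaving only the hex digits.
p = re.compile(r'[\W\\x]+')
print ''.join([unichr(int(y, 16)) for y in p.split(source) if y != ''])
事業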

This also borrows the idea from @Blender.
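Splitting on separators assumes the string contains nothing but escapes. As a hedged alternative for Python 3, re.findall can pull out only the hex digits inside \x{...}, which tolerates extra text around them. The sample input with the surrounding 'id=' and ';' is an assumption made up for this sketch.

```python
import re

# Assumed input: the question's escapes with unrelated text around them.
source = 'id=\\x{4e8b}\\x{696d};'

# findall returns only the captured hex digits and skips everything else.
hex_codes = re.findall(r'\\x\{([0-9a-fA-F]+)\}', source)
decoded = ''.join(chr(int(h, 16)) for h in hex_codes)

print(hex_codes)  # ['4e8b', '696d']
print(decoded)    # 事業
```

Unlike a substitution, this discards the surrounding text, so it fits cases where only the decoded characters are wanted.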

answered Mar 9, 2013 at 4:52

zzk
