Unicode utf-8/utf-16 encoding in Python

Asked 16 years, 5 months ago

Viewed 76k times

In python:

u'\u3053\n'

Is it utf-16?

I'm not really familiar with all the unicode/encoding stuff, but this type of thing keeps coming up in my dataset, for example a=u'\u3053\n'.

print a gives an exception, and decoding it gives an exception.

a.encode("utf-16") > '\xff\xfeS0\n\x00'
a.encode("utf-8") > '\xe3\x81\x93\n'
print a.encode("utf-8") > πüô
print a.encode("utf-16") > ■S0

What's going on here?

edited Jun 22, 2010 at 17:00

SilentGhost

asked Aug 4, 2009 at 19:22

8steve8

fileformat.info/info/unicode/char/3053/index.htm

– 8steve8, Aug 4, 2009 at 19:32

4 Answers

It's a Unicode character that doesn't seem to be displayable in your terminal's encoding. print tries to encode the unicode object in the encoding of your terminal, and if this can't be done you get an exception.

On a terminal that can display utf-8 you get:

>>> print u'\u3053'
こ

Your terminal doesn't seem to be able to display UTF-8; otherwise the print a.encode("utf-8") line, at least, would have produced the correct character.
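
The output in the question can be reproduced on any machine. The sketch below uses Python 3 syntax (where every str is Unicode and the u prefix is optional) and assumes the asker's console used code page 437. That code page is a guess based on the characters shown, not something the question states.

```python
text = u'\u3053\n'  # HIRAGANA LETTER KO followed by a newline

# Assumption: the console encoding is cp437 (the classic DOS/Windows code page).
console_encoding = 'cp437'

# Encoding the text for such a console fails; this is the kind of
# exception that print raises when given the Unicode string directly.
try:
    text.encode(console_encoding)
    encodable = True
except UnicodeEncodeError:
    encodable = False

# Printing already-encoded bytes makes the console show each byte as one
# cp437 character, which is where the odd symbols come from.
utf8_as_seen = text.encode('utf-8').decode(console_encoding)
utf16_as_seen = (b'\xff\xfe' + text.encode('utf-16-le')).decode(console_encoding)

# encodable is False
# utf8_as_seen starts with the three characters pi, u-umlaut, o-circumflex
# utf16_as_seen is a no-break space, a black square, then "S0"
```

So neither print call showed the Hiragana letter: the console simply displayed the raw bytes in its own code page.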

answered Aug 4, 2009 at 19:35

sth

1 Comment

thanks, yes: PowerShell, and even PowerShell ISE, doesn't seem "compatible" (for lack of a better understanding) with unicode in python. stackoverflow.com/questions/2105022/…
– 8steve8, Feb 5, 2010 at 17:21

You ask:

u'\u3053\n'

Is it utf-16?

The answer is no: it's unicode, not any specific encoding. utf-16 is an encoding.
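
To make the distinction concrete: the string holds code points, and an encoding is what turns them into bytes. A small sketch in Python 3 syntax (the question uses Python 2, where the same idea applies to unicode objects). The explicit utf-16-le codec is used here so the result doesn't depend on the machine's byte order.

```python
text = u'\u3053\n'

# The string itself is two code points, independent of any encoding.
code_points = [hex(ord(ch)) for ch in text]   # ['0x3053', '0xa']

# Different encodings give different byte sequences for the same text.
as_utf8 = text.encode('utf-8')        # b'\xe3\x81\x93\n'
as_utf16 = text.encode('utf-16-le')   # b'S0\n\x00'

# Decoding with the matching codec returns the original string.
assert as_utf8.decode('utf-8') == text
assert as_utf16.decode('utf-16-le') == text
```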

To print a Unicode string effectively to your terminal, you need to find out what encoding that terminal is willing to accept and able to display. For example, the Terminal.app on my laptop is set to UTF-8 and with a rich font, so:

[screenshot] (source: aleax.it)

...the Hiragana letter displays correctly. On a Linux workstation I have a terminal program that keeps resetting to Latin-1, so it would mangle things somewhat like yours. I can set it to UTF-8, but it doesn't have a huge number of glyphs in the font, so it would display somewhat-useless placeholder glyphs instead.
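
One way to check what a terminal will accept is to look at the encoding Python attached to standard output and try encoding the text with it. A minimal sketch in Python 3 syntax; can_encode is a hypothetical helper name, and passing this check says nothing about whether the terminal's font actually has the glyph.

```python
import sys

def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# sys.stdout.encoding can be None when output is not a terminal,
# so fall back to ASCII in that case (an assumption of this sketch).
terminal_encoding = sys.stdout.encoding or 'ascii'

text = u'\u3053'
if can_encode(text, terminal_encoding):
    print(text)
else:
    # Show an escape sequence instead of raising UnicodeEncodeError.
    print(text.encode('ascii', 'backslashreplace').decode('ascii'))
```

With a UTF-8 terminal the first branch runs; with Latin-1 or cp437 the fallback prints \u3053 rather than failing.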

edited Apr 4, 2019 at 10:59

Glorfindel

answered Aug 5, 2009 at 2:15

Alex Martelli

1 Comment

Is it possible to print utf-16 characters in python?
– Cyriac Antony, Sep 26, 2019 at 09:24

Character U+3053 "HIRAGANA LETTER KO".

The \xff\xfe at the start of the UTF-16 output is the encoded byte order mark (U+FEFF); then "S0" is \x53\x30, which is U+3053 with its bytes swapped; then \n\x00 is the newline (U+000A) from the original string. (Each character has its bytes "reversed" because this is little-endian UTF-16.)
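
The byte layout can be checked by hand. This sketch (Python 3 syntax) splits the UTF-16 output into the byte order mark and 16-bit little-endian units; the byte string is copied from the question.

```python
import struct

data = b'\xff\xfeS0\n\x00'   # a.encode("utf-16") output from the question

bom, payload = data[:2], data[2:]

# U+FEFF stored little-endian is FF FE, so this BOM signals little-endian data.
assert bom == b'\xff\xfe'

# Read the remaining bytes as unsigned 16-bit little-endian code units.
units = struct.unpack('<%dH' % (len(payload) // 2), payload)
print([hex(u) for u in units])   # ['0x3053', '0xa']
```

The two units are exactly the code points of the original string: the Hiragana letter and the newline.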

The UTF-8 form represents the same Hiragana character in three bytes, with the bit pattern as documented here.
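
For code points from U+0800 to U+FFFF, UTF-8 uses the three-byte layout 1110xxxx 10xxxxxx 10xxxxxx, with the 16 bits of the code point filling the x positions. A sketch in Python 3 syntax; encode_three_byte is a hypothetical helper for illustration only, not a replacement for str.encode.

```python
def encode_three_byte(code_point):
    """Encode a code point in U+0800..U+FFFF using the three-byte UTF-8 layout."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError('outside the three-byte range')
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])

encoded = encode_three_byte(0x3053)
print(encoded.hex())   # e38193

# The hand-built bytes match what the built-in codec produces.
assert encoded == u'\u3053'.encode('utf-8')
```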

Now, as for whether you should really have it in your data set... where is this data coming from? Is it reasonable for it to have Hiragana characters in it?

answered Aug 4, 2009 at 19:37

Jon Skeet

Here's the Unicode HowTo Doc for Python 2.6.2:

http://docs.python.org/howto/unicode.html

Also see the links in the Reference section of that document for other explanations, including one by Joel Spolsky.

answered Aug 4, 2009 at 19:33

Anon
