Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> str_version = 'នយោបាយ'
>>> type(str_version)
<class 'str'>
>>> print (str_version)
នយោបាយ
>>> unicode_version = 'នយោបាយ'.decode('utf-8')
Traceback (most recent call last):
 File "<pyshell#3>", line 1, in <module>
 unicode_version = 'នយោបាយ'.decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'
>>> 

What is the problem with my Unicode string?

Daniel Imms
asked Mar 26, 2011 at 20:57

3 Answers


There is nothing wrong with your string! You have just confused encode() and decode(). In Python 3, a str is already a sequence of meaningful symbols (Unicode characters). To turn it into bytes that can be stored in a file or transmitted over the Internet, call encode() with an encoding such as UTF-8. Each encoding is a scheme for converting meaningful symbols into flat bytes of output.

When the time comes to do the opposite, taking raw bytes from a file or a socket and turning them into symbols like letters and numbers, you decode the bytes using the decode() method of bytes objects in Python 3.
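A minimal sketch of that round trip through a file, assuming a temporary file created with the standard tempfile module (the names text, raw, decoded and via_text_mode are illustrative only). It writes the encoded bytes in binary mode, reads them back as raw bytes, and decodes them; it also shows that opening the file in text mode with an explicit encoding performs the same decode for you.

```python
import os
import tempfile

text = 'នយោបាយ'

# Store the text as UTF-8 bytes in a temporary file, written in binary mode ('wb').
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(text.encode('utf-8'))
    path = f.name

try:
    # Binary mode ('rb') returns raw bytes; decode() turns them back into a str.
    with open(path, 'rb') as f:
        raw = f.read()
    decoded = raw.decode('utf-8')

    # Text mode with an explicit encoding does the same decoding for you.
    with open(path, encoding='utf-8') as f:
        via_text_mode = f.read()
finally:
    os.remove(path)

print(type(raw).__name__, type(decoded).__name__)  # bytes str
print(decoded == text, via_text_mode == text)      # True True
```

Passing the encoding explicitly avoids depending on the platform's default, which on Windows is often not UTF-8.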

>>> str_version = 'នយោបាយ'
>>> str_version.encode('utf-8')
b'\xe1\x9e\x93\xe1\x9e\x99\xe1\x9f\x84\xe1\x9e\x94\xe1\x9e\xb6\xe1\x9e\x99'

See that long line of bytes? Those are the bytes UTF-8 uses to represent your string when you need to transmit it over a network or store it in a document. Many other encodings are in use, but UTF-8 is probably the most popular. Each encoding turns meaningful symbols like ន and យោ into bytes: the little 8-bit numbers with which computers communicate.

>>> rawbytes = str_version.encode('utf-8')
>>> rawbytes
b'\xe1\x9e\x93\xe1\x9e\x99\xe1\x9f\x84\xe1\x9e\x94\xe1\x9e\xb6\xe1\x9e\x99'
>>> rawbytes.decode('utf-8')
'នយោបាយ'
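To make the "many other encodings" point concrete, here is a short sketch comparing UTF-8 with UTF-16 (little-endian, chosen here only as an example) for the same string. It also shows that a bytes object is just a sequence of integers from 0 to 255, that decoding with the wrong encoding does not give back the original text, and that ASCII cannot encode Khmer at all.

```python
text = 'នយោបាយ'

# The str holds 6 code points; a vowel sign such as U+17C4 counts as its own code point.
print(len(text))  # 6

utf8 = text.encode('utf-8')       # 3 bytes per Khmer code point
utf16 = text.encode('utf-16-le')  # 2 bytes per code point, no byte-order mark
print(len(utf8), len(utf16))      # 18 12

# A bytes object is a sequence of small integers in the range 0-255.
print(list(utf8[:3]))             # [225, 158, 147]

# Different encodings produce different bytes, but each one round-trips.
print(utf8.decode('utf-8') == utf16.decode('utf-16-le') == text)  # True

# Decoding with the wrong encoding yields different (garbled) text.
print(utf8.decode('latin-1') == text)  # False

# ASCII has no way to represent Khmer characters.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii failed:', exc.reason)  # ascii failed: ordinal not in range(128)
```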
answered Mar 26, 2011 at 21:03

3 Comments

Still not clear. Could you explain more clearly? Thanks, Brandon Craig Rhodes
I have added another paragraph, and some code samples — do those make it any clearer?
Now it's clear. I understand now from your example; thank you so much, @Brandon Craig Rhodes

You're reading the 2.x docs. str.decode() (and bytes.encode()) were dropped in 3.x. And str is already a Unicode string, so there's no need to decode it.
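A quick way to confirm this in a Python 3 interpreter is to check which methods each type actually has; this short sketch only uses the built-in hasattr() and isinstance():

```python
# In Python 3, text (str) can only be encoded, and bytes can only be decoded.
print(hasattr(str, 'encode'), hasattr(str, 'decode'))      # True False
print(hasattr(bytes, 'decode'), hasattr(bytes, 'encode'))  # True False

text = 'នយោបាយ'
# The literal is already a str, so there is nothing to decode.
print(isinstance(text, str))  # True
```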

answered Mar 26, 2011 at 21:05



You already have a Unicode string. In Python 3, str objects are Unicode strings (what Python 2.x called unicode), and byte strings (the Python 2.x str) are no longer treated as text; they are now called bytes. A bytes object can be converted into a str with its decode() method, but a str is already decoded: you can only encode it back into bytes.
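When code may receive either kind of value, a common pattern is to decode only when the input is bytes. The helper below is a hypothetical sketch (to_text is not a standard-library function), assuming UTF-8 as the default encoding:

```python
def to_text(value, encoding='utf-8'):
    """Return value as a str (hypothetical helper).

    bytes are decoded with the given encoding; a str is returned
    unchanged because it is already decoded.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError('expected str or bytes, got ' + type(value).__name__)


text = 'នយោបាយ'
data = text.encode('utf-8')
print(to_text(text) == text, to_text(data) == text)  # True True
```

Raising TypeError for anything else keeps the str/bytes distinction explicit instead of silently converting other objects with str().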

answered Mar 26, 2011 at 21:11
