Python encoding/decoding error

Question 1

I'm using Python 2.7.3 on Windows 7 (32-bit). In cmd, I typed this command:

chcp 1254

to switch the console code page to 1254. Then I ran this script:

#!/usr/bin/env python
# -*- coding:cp1254 -*-
print "öçışğüÖÇİŞĞÜ"

I got this output:

÷2■しかく3ÍÃ¦Ìo▄

I also tried putting u in front of the string literal (print u"öçışğüÖÇİŞĞÜ").

Then I changed the code to this:

#!/usr/bin/env python
# -*- coding:cp1254 -*-
import os
a = r"C:\\"
b = "ö"
print os.path.join(a, b)

I got this output:

÷

So I tried

print unicode(os.path.join(a, b))

and got this error:

print unicode(os.path.join(a, b))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 13: ordinal not in range(128)
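In Python 2, unicode() on a byte string with no encoding argument decodes it with the default codec, which is ASCII, and ASCII only accepts bytes 0 to 127. A minimal sketch of that failure, assuming the path is a byte string in which "ö" is the single cp1254 byte 0xf6 (the literal b"C:\\\xf6" and the helper decode_as are illustrative stand-ins, not from the question). In Python 2, unicode(raw_path, "cp1254") is equivalent to the successful decode shown here.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Assumed stand-in for os.path.join(a, b) in a cp1254-encoded Python 2 file:
# a byte string in which "ö" is the single byte 0xf6.
raw_path = b"C:\\\xf6"


def decode_as(raw, codec):
    """Hypothetical helper: decode raw bytes, returning the error text instead of raising."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        return u"error: %s" % exc


# This is what Python 2's unicode(raw_path) does: the default codec is ASCII,
# which rejects 0xf6 because it is above 127.
print(decode_as(raw_path, "ascii"))

# Naming the encoding the file was really saved in decodes it correctly.
decoded = decode_as(raw_path, "cp1254")
print(u"U+%04X" % ord(decoded[-1]))  # code point of the last character
```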

Then I tried a different approach:

print os.path.join(a, b).decode("utf-8").encode(sys.stdout.encoding)

That gave me this error:

print os.path.join(a, b).decode("utf-8").encode(sys.stdout.encoding)
 File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
 return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 13: invalid start byte

I can't get rid of this error. How can I solve it? Thanks.
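The decode("utf-8") attempt fails for a related reason: 0xf6 is never a valid first byte of a UTF-8 sequence, because the bytes were written in cp1254, not UTF-8. A minimal sketch of the decode-then-encode round trip, assuming the source really is saved as cp1254; the helper to_console and its explicit console_encoding parameter are hypothetical, standing in for sys.stdout.encoding.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import ntpath  # Windows path rules, so the sketch behaves the same on any OS

SOURCE_ENCODING = "cp1254"  # assumption: matches the coding line AND how the file is saved


def to_console(raw, console_encoding):
    """Hypothetical helper: turn cp1254 source bytes into bytes for a console code page."""
    text = raw.decode(SOURCE_ENCODING)  # bytes -> Unicode with the real source encoding
    # Unicode -> console bytes; characters the code page lacks become "?".
    return text.encode(console_encoding, "replace")


raw_path = ntpath.join(b"C:\\", b"\xf6")  # b"C:\\\xf6", i.e. C:\ö in cp1254

# Decoding as UTF-8 fails: 0xf6 can never start a UTF-8 sequence.
try:
    raw_path.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8:", exc.reason)

print(repr(to_console(raw_path, "cp857")))  # bytes a chcp 857 console expects
```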

Comment

I can't reproduce the error: the initial code works fine here.

Comment

What is the output of the chcp command?

Comment

I get "C:\\ö". I tried it on two Python 2.7 installs, one of them on 32-bit Windows 7.

Comment

Are you running this from the standard "Command Prompt"?

Comment

For me, it produces the incorrectly encoded C:\\├╢ in the standard Command Prompt, but from Cygwin it works perfectly, which suggests the problem is how the encoding is represented on stdout.
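The ├╢ is what UTF-8 looks like through an OEM console code page. Assuming that commenter's file was saved as UTF-8 and the console used code page 437 (the US English default; neither is stated in the comment), "ö" is stored as the two bytes 0xc3 0xb6 and cp437 draws each byte as its own box-drawing glyph. A minimal sketch comparing that with the single cp1254 byte:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

o_umlaut = u"\u00f6"  # "ö"

utf8_bytes = o_umlaut.encode("utf-8")     # two bytes: 0xc3 0xb6
cp1254_bytes = o_umlaut.encode("cp1254")  # one byte: 0xf6

# A console on code page 437 draws each UTF-8 byte as a separate glyph.
seen_in_cp437 = utf8_bytes.decode("cp437")

print(repr(utf8_bytes), repr(cp1254_bytes))
print([u"U+%04X" % ord(ch) for ch in seen_in_cp437])  # box-drawing characters
```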

Answer

When I run your original code with chcp 857 (the Turkish OEM code page), I can reproduce your issue, so I don't think you were running chcp 1254:

÷2■しかく3ÍÃ¦Ìo▄

If you declare your source encoding as:

# -*- coding:cp1254 -*-

You must save your source code in that encoding. If you don't use Unicode strings, you must also use the same encoding at the console. Then it works correctly.

Example (source declared cp1254, but saved incorrectly as cp1252, and console chcp 1254):

öçisgüÖÇISGÜ

Example (source declared and saved correctly as cp1254, console chcp 1254):

öçışğüÖÇİŞĞÜ

Note that with Unicode strings, the source encoding does not have to match the encoding of your console.

Example (declared and saved as UTF-8, with a Unicode string):

#!python2
# -*- coding:utf8 -*-
print u"öçışğüÖÇİŞĞÜ"

Output (with any code page that supports Turkish, e.g. 1254, 857, or 1026):

öçışğüÖÇİŞĞÜ
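This works because Python 2's print encodes a Unicode string with sys.stdout.encoding, which on a Windows console is taken from the active code page. A minimal sketch of just that encoding step (no console needed), using cp1254 and cp857 as illustrative Turkish code pages: the bytes differ per code page, but each round-trips to the same text.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u"öçışğüÖÇİŞĞÜ"  # a Unicode string, independent of how the file is saved

# Turkish-capable code pages store the same characters at different byte values,
# so the console's code page decides the bytes, not the source encoding.
encoded = {}
for codepage in ("cp1254", "cp857"):
    encoded[codepage] = text.encode(codepage)
    assert encoded[codepage].decode(codepage) == text  # lossless round trip
    print(codepage, repr(encoded[codepage][:2]))
```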

Mark Tolonen · Accepted Answer · 2015-06-11 21:37:41Z
