
I'm having trouble reading and printing Turkish text in Python: the Turkish letters in a word are not recognized. The problem doesn't arise when I store strings in other languages such as Russian, Japanese, and Chinese.

>>> s = u'abartmadığını'
>>> s
u'abartmad???n?'
>>> print s
abartmad???n?

How can I adjust the encoding to solve this problem? I am using Python 2.7.10 on Windows 10. Changing the command-line code page to 28595 doesn't work; I just get the following error in the Python console:

LookupError: unknown encoding: cp28595
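
As a rough illustration of where the `?` characters can come from: a legacy code page can only store the characters it defines, and anything else gets replaced. The sketch below checks a few of Python's built-in codecs against the word from the question. The candidate list and the names `WORD`, `can_encode`, `SUPPORT`, and `LOSSY` are chosen for this example and do not come from the thread. `iso8859_5` is used as a stand-in on the assumption that Windows code page 28595 corresponds to ISO 8859-5 (Cyrillic); which code page the asker's console really used is not known.

```python
# "abartmadığını", written with escapes so the source file stays ASCII.
WORD = u'abartmad\u0131\u011f\u0131n\u0131'


def can_encode(text, encoding):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative selection: three Turkish-capable codecs, two that are not,
# and UTF-8 as a reference point.
CANDIDATES = ['cp1254', 'cp857', 'iso8859_9', 'cp1252', 'iso8859_5', 'utf-8']
SUPPORT = dict((name, can_encode(WORD, name)) for name in CANDIDATES)

# What a codec without the Turkish letters produces with errors='replace'.
LOSSY = WORD.encode('iso8859_5', 'replace').decode('iso8859_5')

for name in CANDIDATES:
    print('%-10s %s' % (name, SUPPORT[name]))
print(LOSSY)
```

Under these assumptions, a Cyrillic code page is a poor fit for Turkish text regardless of whether Python recognizes its name; the Turkish-specific codecs in the list are the ones that report support.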

Alastair McCormack
asked Dec 5, 2015 at 4:05
4 Comments
  • You might need to accept using non-Turkish letters, because Turkish letters might not be usable in Unicode. Commented Dec 5, 2015 at 4:09
  • @FranzNoel nope, the same thing works well on Mac OS, there must be some issues with the environment Commented Dec 5, 2015 at 4:18
  • Works well on Linux. Must be something with Windows 10. Are you using the CMD terminal? Commented Dec 5, 2015 at 4:19
  • Are you typing that directly at the console? That's likely not going to work without a Turkish version of Windows, or configuring the Windows system locale to Turkey. Commented Dec 5, 2015 at 8:17

2 Answers


The Windows console is notorious for not supporting Unicode well. Use an IDE that supports UTF-8 output. Here's an example from PythonWin, part of the pywin32 third-party module:

PythonWin 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32.
Portions Copyright 1994-2008 Mark Hammond - see 'Help/About PythonWin' for further copyright information.
>>> s = u'abartmadığını. 我是美国人。 ру́сский язы́к'
>>> s
u'abartmad\u0131\u011f\u0131n\u0131. \u6211\u662f\u7f8e\u56fd\u4eba\u3002 \u0440\u0443\u0301\u0441\u0441\u043a\u0438\u0439 \u044f\u0437\u044b\u0301\u043a'
>>> print s
abartmadığını. 我是美国人。 ру́сский язы́к
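
A quick way to tell a display problem from a corrupted string is to look at the code points rather than the rendered glyphs. The sketch below is an illustration, not part of the PythonWin session above: it lists the non-ASCII characters by their Unicode names and builds an ASCII-only escaped form that any console can show. The names `S`, `NON_ASCII`, and `ESCAPED` are made up for the example.

```python
import unicodedata

# "abartmadığını", written with escapes so the source file stays ASCII.
S = u'abartmad\u0131\u011f\u0131n\u0131'

# Official Unicode names of every character outside the ASCII range.
NON_ASCII = [unicodedata.name(ch) for ch in S if ord(ch) > 127]

# ASCII-only rendering of the string, safe to print on any console.
ESCAPED = S.encode('unicode_escape').decode('ascii')

for name in NON_ASCII:
    print(name)
print(ESCAPED)
```

If the escaped form already contains `?` (as the `u'abartmad???n?'` repr in the question suggests), the characters were lost when the literal was read, before any printing took place.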
answered Dec 5, 2015 at 8:25

2 Comments

Mark is right: even today, Unicode support in the Windows console has serious bugs.
This module specifically targets the Windows Console: github.com/Drekin/win-unicode-console

Encode it to UTF-8:

>>> s = u'abartmadığını'
>>> print s.encode('utf-8')
abartmadığını
Alastair McCormack
answered Dec 5, 2015 at 4:19

2 Comments

That only works if the console encoding is configured for UTF-8, which isn't likely the case on Windows.
You should not encode when printing! stdout already has an encoding applied, so you'll potentially double-encode and make your code platform-dependent. If users get errors when printing, they should investigate the underlying issue. In this case the user is on Windows, so encoding for the console is not the solution.
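
To make the objection in these two comments concrete, here is a minimal sketch of what happens when hand-encoded UTF-8 bytes reach a console that interprets them with a different code page. The choice of `cp857` is an assumption picked purely for illustration, and `WORD`, `UTF8_BYTES`, `CONSOLE_ENCODING`, and `SHOWN` are hypothetical names; the console behavior is simulated with `decode`, not measured on a real terminal.

```python
# "abartmadığını", written with escapes so the source file stays ASCII.
WORD = u'abartmad\u0131\u011f\u0131n\u0131'

# The bytes that would be written if the text is encoded by hand.
UTF8_BYTES = WORD.encode('utf-8')

# Assumption: the console interprets incoming bytes as cp857.
CONSOLE_ENCODING = 'cp857'

# Simulate what such a console would display for those bytes.
SHOWN = UTF8_BYTES.decode(CONSOLE_ENCODING)

print(len(WORD))
print(len(SHOWN))
print(SHOWN == WORD)
```

Each Turkish letter occupies two bytes in UTF-8, and a single-byte code page turns every byte into its own character, so the displayed text is longer than the original and no longer matches it. The UTF-8 approach only looks right when the console itself is set to UTF-8.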
