12

I would like to configure my console on Windows XP to support UTF8 and to have python detect that and work with it.

So far, my attempts:

C:\Documents and Settings\Philippe>C:\Python25\python.exe
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'é'
é
>>> import sys
>>> sys.stdout.encoding
'cp437'
>>> quit()

So, by default I am in cp437 and python detects that just fine.

C:\Documents and Settings\Philippe>chcp 65001
Active code page: 65001
C:\Documents and Settings\Philippe>python
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp65001'
>>> print u'é'
C:\Documents and Settings\Philippe>

It seems like printing in UTF8 makes python crash now...

wovano
asked Aug 10, 2011 at 16:34
2
  • What makes you think you are printing UTF-8 here in the first place? Commented Sep 20, 2011 at 19:09
  • See also: stackoverflow.com/a/30505612/788700 Commented May 28, 2015 at 11:44

4 Answers 4

8

I would like to configure my console on Windows XP to support UTF8

I don't think it's going to happen.

The 65001 code page is buggy; some stdio calls behave incorrectly and break many tools. You can register cp65001 as an encoding manually:

import codecs

def cp65001(name):
    # Codec search function: map the Windows UTF-8 code page name to utf-8.
    if name.lower() == 'cp65001':
        return codecs.lookup('utf-8')

codecs.register(cp65001)

This lets you print u'some unicode string', but only if that string contains nothing but ASCII characters. With non-ASCII characters you get the same odd errors (IOError 0 et al) that you do when you try to write non-ASCII UTF-8 sequences directly as byte strings.

Unfortunately UTF-8 is a second-class citizen under Windows. NT's Unicode model was drawn up before UTF-8 existed and consequently you're expected to use two-byte-per-code-unit encodings (UTF-16, originally UCS-2) anywhere you want consistent Unicode. Using byte strings, like many portable apps and languages (such as Python) written with C's stdio, doesn't fit that model.
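
To make the difference in code units concrete, here is a small sketch that encodes a single character three ways. It assumes Python 3 (the sessions in the question use 2.5), and the sample character is simply the é from the question; nothing here touches the console itself.

```python
# Illustrative only: compare how one character is represented as bytes.
sample = u'\u00e9'  # é, the character printed in the question

encoded = {
    'cp437': sample.encode('cp437'),          # legacy OEM code page: one byte
    'utf-8': sample.encode('utf-8'),          # two one-byte code units
    'utf-16-le': sample.encode('utf-16-le'),  # one two-byte code unit
}

for name in sorted(encoded):
    print(name, len(encoded[name]), repr(encoded[name]))
```

The UTF-16 form is what the Windows wide-character APIs expect, while the stdio path hands the console plain bytes and leaves their interpretation to the active code page.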

And rewriting Python to use the Windows Unicode console calls (like WriteConsoleW) instead of the portable C stdio ones doesn't play well with shell tricks like piping and redirecting to a file. (Not to mention that you still have to change from the default terminal font to a TTF one before you can see the results working at all...)

Ultimately if you need a command line with working UTF-8 support for stdio-based apps, you'd probably be better off using an alternative to the Windows Console that deliberately supports it, such as Cygwin's, or Python's IDLE or pywin32's PythonWin.

answered Aug 10, 2011 at 21:36

4

When I try the same thing on Python 2.7 I get an error on import sys:

LookupError: unknown encoding: cp65001

This implies to me that Python doesn't know how to work with the special Windows UTF-8 code page, and 2.5 handled the situation ungracefully.

Apparently this was investigated and not fixed in Python 3.2: http://bugs.python.org/issue6058

Update: In What's New In Python 3.3 it lists cp65001 support as a new feature.
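
Since the result depends on the Python version, a script can probe for a codec name before relying on it. The following is a minimal sketch, assuming Python 3; find_codec is a hypothetical helper name, not part of the standard library.

```python
import codecs

def find_codec(name):
    """Return the CodecInfo for name, or None if this interpreter lacks it."""
    try:
        return codecs.lookup(name)
    except LookupError:
        return None

# Whether cp65001 resolves depends on the interpreter version and platform.
for candidate in ('cp437', 'utf-8', 'cp65001'):
    info = find_codec(candidate)
    print(candidate, '->', info.name if info else 'unknown encoding')
```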

answered Aug 10, 2011 at 17:22

2 Comments

  • Nope, Python 3.2 crashes for me when chcp 65001 is active as well. That particular issue was closed as invalid, not fixed.
  • @Mark Tolonen, thanks for the update. Obviously my reading comprehension skills need improvement.
1

Set this environment variable in your Windows console before starting Python:

set PYTHONIOENCODING=utf-8
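
To check that the variable is picked up, one option is to start a child interpreter with it set and ask what encoding it chose for stdout. This is a sketch assuming Python 3; child_stdout_encoding is a hypothetical helper name. It only shows which encoding Python selects, not whether the console can display the resulting bytes.

```python
import os
import subprocess
import sys

def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and
    return the sys.stdout.encoding it reports."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, '-c',
         'import sys; sys.stdout.write(sys.stdout.encoding)'],
        env=env,
    )
    return output.decode('ascii')

print(child_stdout_encoding('utf-8'))
```
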
answered May 18, 2015 at 3:00


0

I had problems displaying the Euro symbol in the cmd console from a Python script using Windows Vista. Here's what worked for me:

First, I need to make sure the font is set to Lucida Console and not Raster Fonts, which don't work. That can be done by setting the default properties of the console in the drop-down menu of the console window and then restarting the console window with cmd.exe.

Second, when I run cmd I set the code page with chcp 1252.

Third, I make sure my editor (Notepad++) has the right encoding settings. On the Encoding drop down menu in Notepad++ select Encode in UTF-8.

That worked for me.
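
One plausible reason the code page switch matters: the Euro sign exists in code page 1252 but not in cp437, a common console default (that the default in use was cp437 is an assumption here). The sketch below, assuming Python 3 and using the hypothetical helper name can_encode, checks which encodings can represent the character.

```python
euro = u'\u20ac'  # the Euro sign

def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for code_page in ('cp437', 'cp1252', 'utf-8'):
    print(code_page, can_encode(euro, code_page))
```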

answered May 31, 2014 at 14:32
