
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Python 3 doesn't support cp65001 as the OEM code page
Type: crash Stage:
Components: Unicode Versions: Python 3.1
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, bferris57, ezio.melotti, r.david.murray, vstinner
Priority: normal Keywords:

Created on 2011-07-25 02:24 by bferris57, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (13)
msg141067 - (view) Author: Bruce Ferris (bferris57) Date: 2011-07-25 02:24
The following scenario GPFs on Windows Vista using cmd.exe...
 D:\>python
 Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit
 (Intel)] on win32
 Type "help", "copyright", "credits" or "license" for more information.
 >>> ^Z
 D:\>chcp 65001
 Active code page: 65001
 D:\>python
 Fatal Python error: Py_Initialize: can't initialize sys standard
 streams
 LookupError: unknown encoding: cp65001
 This application has requested the Runtime to terminate it in an 
 unusual way.
 Please contact the application's support team for more information.
 D:\>
This is a bit surprising since Code Page 65001 IS the official Microsoft UTF-8 Code Page.
Please see...
 http://msdn.microsoft.com/en-us/library/dd317756%28v=vs.85%29.aspx 
msg141082 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-25 09:54
You can use PYTHONIOENCODING="utf-8". Code page 65001 is not exactly like Python UTF-8 codec: see issue #6058.
Using issue #12281, it may be possible to implement a cp65001 codec.
See also issue #1602 for the Windows console.
Why do you use cp65001?
msg141087 - (view) Author: Bruce Ferris (bferris57) Date: 2011-07-25 11:21
I use code page 65001 because 1) it displays the UTF-8 characters in my text files with "echo <filename>" on the command line, and 2) that's Microsoft's "official" (whatever that means) code page for UTF-8, and 3) it works in cmd.exe.
Setting aside why I use it, it IS used by some, and Python shouldn't GPF for ANY reason if it can be easily fixed. Right?
Essentially, 65001 makes Microsoft's console output behave properly (at least with the limited characters in Lucida Console) so I would think Python should consider not blowing up when it's set.
To be honest, I just happened to have it set to 65001 to get the output from another program to look right and just happened to run Python to do some quick unrelated calculations.
Imagine my surprise when Python blew, especially when all I did was to run it. It's not like I asked it to do any UTF-8 or anything!
Anyway, as far as I understand... Any GPF is a potential back door. So, it needs closing.
msg141090 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-07-25 12:37
In this case it is not a potential security hole, since in fact the "GPF" comes from Python explicitly calling Abort because of a situation it can't handle, as indicated by the error message from Python. (If it were a true segfault-like error, there would be no message from Python itself.)
msg141091 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-25 13:19
> Python shouldn't GPF for ANY reason if it can be easily fixed
"Code page 65001" issue cannot be "easily" fixed. Did you read the history of the issue #6058? It took one year and a half to decide that cp65001 cannot be set as an alias to UTF-8.
As I wrote, it will be possible to implement a real cp65001 codec for Python using issue #12281.
msg141108 - (view) Author: Bruce Ferris (bferris57) Date: 2011-07-25 17:58
I disagree with the "it's not really a GPF since it calls Abort".
Consider the following cmd.exe session...
 Microsoft Windows [Version 6.0.6002]
 Copyright (c) 2006 Microsoft Corporation. All rights reserved.
 D:\>chcp 65001
 Active code page: 65001
 D:\>python >t.txt
 Fatal Python error: Py_Initialize: can't initialize sys standard streams
 LookupError: unknown encoding: cp65001
 This application has requested the Runtime to terminate it in an unusual way.
 Please contact the application's support team for more information.
 D:\>type t.txt
 D:\>dir t.txt
 Volume in drive D is DATA
 Volume Serial Number is 2E61-626C
 Directory of D:\
 25/07/2011 06:10 PM 0 t.txt
 1 File(s) 0 bytes
 0 Dir(s) 16,768,655,360 bytes free
 D:\>
This means that, even if it was "intentional", from a programmatic point of view the Python process in this case leaves no indication other than transient bytes in the transient cmd.exe console buffer. There is no way of redirecting the output and examining it.
I strongly disagree with the statement "(If it were a true segfault-like error, there would be no message from Python itself.)"
The "no message from Python itself" case is shown above.
My application handles code page 65001 just fine, no problems. If it attempts to use Windows WriteConsole function and it fails, it tries using WriteFile instead. So, when my application fails and output is redirected, it produces output.
But, Python 3.1 doesn't. See the following Microsoft MSDN link, it states the WriteConsole point explicitly...
 http://msdn.microsoft.com/en-us/library/ms687401%28v=VS.85%29.aspx
So, if Python doesn't like Code Page 65001, for whatever reason, it can simply save it on startup, and change it to whatever makes it happy. Then, upon Python exit (including Abort), change it back to 65001 before calling Abort.
I'm sorry, but the following is "easy" in my book...
 1) At Startup... Call GetConsoleOutputCP and save that somewhere.
 If code page is 65001, change it to something that
 doesn't cause problems by calling SetConsoleOutputCP
 2) On Write... If WriteConsole fails, try calling WriteFile instead.
 3) At Abort or Exit... Call SetConsoleOutputCP to set it back
 to whatever it was on Startup.
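Steps 1 and 3 above can be sketched as a save-and-restore wrapper. This is only an illustration of the pattern, not something Python 3.1 does: the helper names `safe_console_code_page` and `_windows_console_api` are hypothetical, and the fallback value 850 is an assumption (any code page Python already has a codec for would do). In the interpreter itself the swap would have to happen in C before the standard streams are initialized, and a `finally` block does not run on a hard abort(). Step 2 (falling back from WriteConsole to WriteFile) is not covered here.

```python
import contextlib
import sys


def _windows_console_api():
    """Return (get_cp, set_cp) bound to kernel32, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    import ctypes
    kernel32 = ctypes.windll.kernel32
    return kernel32.GetConsoleOutputCP, kernel32.SetConsoleOutputCP


@contextlib.contextmanager
def safe_console_code_page(get_cp, set_cp, problem_cp=65001, fallback_cp=850):
    """Save the output code page, swap it if it is problem_cp, restore on exit."""
    saved = get_cp()                 # step 1: remember the startup code page
    if saved == problem_cp:
        set_cp(fallback_cp)          # step 1: switch away from 65001
    try:
        yield saved
    finally:
        # Step 3: runs on normal exit and on exceptions, not on abort().
        if saved == problem_cp:
            set_cp(saved)


if __name__ == "__main__":
    api = _windows_console_api()
    if api is not None:
        with safe_console_code_page(*api) as original:
            print("started with console output code page", original)
```

The getter and setter are passed in as parameters so the logic can be exercised without a Windows console.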
I don't care if your app (Python) can display UTF-8 on Microsoft's cmd.exe console or if it can't. 
All I'm trying to do is point out a bit of misbehaviour that CAN be easily changed and will make your product more resilient.
I don't know the details of how Python deals with character encoding and, quite honestly, I shouldn't need to since it's not my product. However, I DO know how I handle a similar scenario in my own app.
Microsoft made it complicated, not me. But, I can "easily" get around the problem using the above scenario. If Python can't do it just as "easily", then it tells me more about Python's implementation and the people behind Python than it tells me about Microsoft and the people behind Windows.
Don't get me wrong, I love Python as a tool for solving certain classes of problems and, please, keep up the good work. It's appreciated.
msg141111 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-07-25 18:50
If you read what I wrote, I did not say that it wasn't a GPF. I said that an Abort is different from writing into or reading from memory incorrectly (which is what leads to security holes).
We don't have many Windows developers active enough to have gotten commit privileges, but perhaps one of them will have time enough to take a look at your suggestion.
msg141140 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-07-26 10:26
First, a call to abort() is not a GPF: it's not an interrupt from the kernel or the OS, it's just an explicit (albeit brutal) way to exit from an application. There is no potential back door here.
Then, the "Fatal Python error:" line is written to stderr. It's possible to redirect it (try with "python 2>t.txt").
The message "This application..." is written by the Microsoft C Runtime Library, I don't know if it is also printed to stderr.
Furthermore, in this case the application will have a particular exit code, IIRC it's 3; from cmd.exe you can get it with "echo %ERRORLEVEL%". Normally python processes exit with a status of 0 (everything is OK) or 1 (if an exception is raised and not caught).
Finally, the "fix" you suggest would be applicable if python used WriteConsole or WriteFile... but it does not! It uses the write() function, which probably calls WriteConsole or WriteFile at some point, but does not take unicode characters...
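The points about stderr and exit status can be checked from a script. The sketch below uses a hypothetical helper, `exit_status`, that starts a fresh interpreter, captures stderr (the programmatic equivalent of `python 2>t.txt`), and returns the exit status. It only demonstrates the 0 and 1 cases; the status after abort() is platform-specific and is not shown.

```python
import subprocess
import sys


def exit_status(code):
    """Run `code` in a fresh interpreter; return (exit status, stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,   # captured instead of going to the console
    )
    return proc.returncode, proc.stderr.decode(errors="replace")


# A script that finishes normally.
ok_status, _ = exit_status("pass")

# A script that dies with an uncaught exception; the traceback goes to stderr.
err_status, err_text = exit_status("raise RuntimeError('boom')")

print("normal exit:", ok_status)
print("uncaught exception:", err_status)
```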
msg141141 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-26 10:28
> Finally, the "fix" you suggest would be applicable if python
> used WriteConsole or WriteFile... but it does not! It uses
> the write() function, which probably calls WriteConsole 
> or WriteFile at some point, but does not take unicode characters...
Issue #1602 discusses how to change Python to use WriteConsole.
msg141158 - (view) Author: Bruce Ferris (bferris57) Date: 2011-07-26 13:53
Victor, thanks for replying and I've had a quick read of everything that went on for issue #1602. I think there's some misunderstanding in what I'm saying here. Maybe this will help clear up what I'm saying...
 D:\>chcp
 Active code page: 850
 D:\>chcp 65001
 Active code page: 65001
 D:\>python27\python
 Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit 
 (Intel)] on win32
 Type "help", "copyright", "credits" or "license" for more information.
 >>> ^Z
 D:\>python31\python
 Fatal Python error: Py_Initialize: can't initialize sys standard
 streams
 LookupError: unknown encoding: cp65001
 This application has requested the Runtime to terminate it in an
 unusual way.
 Please contact the application's support team for more information.
 D:\>chcp 850
 Active code page: 850
 D:\>python31\python
 Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit
 (Intel)] on win32
 Type "help", "copyright", "credits" or "license" for more information.
 >>> ^Z
 D:\>
You see, I'm NOT trying to output any Unicode or UTF-8 characters. All I'm trying to do is run different versions of Python on the same machine from the command line.
Some code inside Python now "breaks" if Python 3.1 is started with Code Page 65001.
I fully understand the changes between Python 2.7 and 3.1 were probably due to trying to fix issue #1602 (or some other related issue).
But, as a side-effect of that "fix", if you now start Python 3.1 (and maybe beyond) with the code page set to 65001, it refuses to work, but it didn't use to refuse to work.
Evidently, Python now tries using the Code Page as an encoding lookup. But, it didn't use to in 2.7. So, there's another compatibility issue introduced.
Setting my cmd.exe code page to 65001 shouldn't mean a thing to Python if it can't associate it with an encoding. It could, at least, just switch to 7-Bit ASCII and proceed on. That would be better than failing!
That's my whole point. If Python wants to do some tweaking with code pages to get its job done, that's fine by me, as long as it doesn't "break" and restores whatever code page I had set when I started it.
It's not down to a UTF-8 issue, it's about a compatibility issue introduced sometime in the last year or so as a side-effect of trying to resolve a UTF-8 issue, probably #1602.
That's all!
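The fallback proposed in this message (proceed with 7-bit ASCII when the code page has no matching codec) could look roughly like this at the Python level. `resolve_stream_encoding` is a hypothetical helper written for illustration; it is not how the interpreter chooses its stream encoding, and the encoding name `no-such-codec-xyz` is made up to stand in for an unknown code page.

```python
import codecs


def resolve_stream_encoding(name, fallback="ascii"):
    """Return the codec name for `name`, or `fallback` if no such codec exists."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        # This is the error reported in this issue: "unknown encoding: ...".
        return fallback


print(resolve_stream_encoding("utf-8"))
print(resolve_stream_encoding("no-such-codec-xyz"))
```

The trade-off is that a silent fallback hides the misconfiguration: any later non-ASCII output would fail with an encoding error far from the real cause.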
msg141161 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-26 14:32
> All I'm trying to do is run different versions of Python on the same machine from the command line.
> Some code inside Python now "break" if Python 3.1 is started with Code Page 65001.
Yes, this issue can be seen as a regression introduced in Python 3.
> I fully understand the change between Python 2.7 and 3.1 were probably due to trying to fix issue #1602 (or some other related issue).
Python 2 and 3 are very different. In Python 2, print "abc" writes a 
byte string to stdout, whereas in Python 3, print("abc") writes a Unicode 
string to stdout. Byte strings and character strings are two different things ;-)
Python 3 now uses Unicode by default and it requires a codec to encode 
strings to stdout. If your program doesn't output anything to stdout, use 
pythonw.exe instead of python.exe.
The issue #1602 is not specific to Python 3: Python 2 is unable to 
display correctly Unicode strings in the Windows console. It's less 
important in Python 2, because most developers use the default string 
type which is a byte string.
> Setting my cmd.exe code page to 65001 shouldn't mean a thing to Python if it can't associate it with an encoding. It could, at least, just switch to 7-Bit ASCII and proceed on. That would be better than failing!
I don't like this idea. In Python, we try to not ignore errors, but try 
instead to fix them or at least fail with an error message (the user is 
responsible to fix it or use a workaround). To fix this issue, we have 
to implement a cp65001 codec for Python or to work directly in Unicode 
using WriteConsole.
If you cannot help to implement one of these options, you can use a 
workaround:
 - don't change the codepage
 - use PYTHONIOENCODING=utf-8
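The effect of the second workaround can be observed by starting a child interpreter with the variable set and asking it which encoding its stdout uses. `child_stdout_encoding` is a hypothetical helper for this check; it is a sketch and does not reproduce the cp65001 console itself.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Start a child interpreter and report its sys.stdout.encoding."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)          # start from a clean setting
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding  # the workaround from this thread
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return out.decode("ascii").strip()


print(child_stdout_encoding("utf-8"))
```

In cmd.exe the same setting is applied with `set PYTHONIOENCODING=utf-8` before running python.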
msg144694 - (view) Author: Bruce Ferris (bferris57) Date: 2011-09-30 16:10
The PYTHONIOENCODING=utf-8 setting works great if I have code page 65001 set. I haven't, however, done a complete console functionality check with that setting but, thanks for the input -- it solves the current problem I'm experiencing.
I do wonder, however, if switching to that setting should happen automatically if it's not specified and the Windows current code page is 65001.
That would solve the problem unless, of course, PYTHONIOENCODING has side-effects elsewhere that would cause other problems.
On the other hand, if it does have side-effects elsewhere then it's not the answer I'm looking for.
msg159339 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-25 22:48
> LookupError: unknown encoding: cp65001
The initial issue was solved by the issue #13216.
For other issues with the Windows Console, see the issue #1602.
History
Date                 User                Action  Args
2022-04-11 14:57:20  admin               set     github: 56841
2012-04-25 22:48:28  vstinner            set     status: open -> closed; resolution: fixed; messages: + msg159339
2011-09-30 16:10:28  bferris57           set     messages: + msg144694
2011-09-29 20:13:29  vstinner            set     title: Windows GPF with Code Page 65001 -> Python 3 doesn't support cp65001 as the OEM code page
2011-07-26 14:32:24  vstinner            set     messages: + msg141161
2011-07-26 13:53:30  bferris57           set     messages: + msg141158
2011-07-26 10:28:43  vstinner            set     messages: + msg141141
2011-07-26 10:26:28  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg141140
2011-07-25 18:50:32  r.david.murray      set     messages: + msg141111
2011-07-25 17:58:19  bferris57           set     messages: + msg141108
2011-07-25 13:19:49  vstinner            set     messages: + msg141091
2011-07-25 12:37:47  r.david.murray      set     nosy: + r.david.murray; messages: + msg141090
2011-07-25 11:21:51  bferris57           set     messages: + msg141087
2011-07-25 09:54:38  vstinner            set     messages: + msg141082
2011-07-25 02:33:18  ezio.melotti        set     nosy: + vstinner, ezio.melotti
2011-07-25 02:25:43  bferris57           set     type: crash
2011-07-25 02:24:49  bferris57           create
