Using Python 3.2 on Windows 7, I get the following in IDLE:

>>> compile('pass', r'c:\temp\工具\module1.py', 'exec')
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

Can anybody explain why compile() tries to convert the Unicode filename using mbcs? I know that sys.getfilesystemencoding() returns 'mbcs' on Windows, but I thought that this encoding is not used when Unicode file names are provided.

For example:

f = open(r'c:\temp\工具\module1.py') 

works.

For a more complete test, save the following in a UTF-8 encoded file and run it with the standard python.exe, version 3.2:

# -*- coding: utf8 -*-
fname = r'c:\temp\工具\module1.py'
# I do have a file with this name, but you can comment out the following two lines
f = open(fname)
print('ok')
cmp = compile('pass', fname, 'exec')
print(cmp)

Output:

ok
Traceback (most recent call last):
  File "module8.py", line 6, in <module>
    cmp = compile('pass', fname, 'exec')
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character
Lennart Regebro
asked Jan 10, 2012 at 4:31
  • Tried this locally on XP and got a proper code object back. Is this being run from the CLI or from a file? Commented Jan 10, 2012 at 5:56
  • I'm going to guess that the problem is not the call signature but the content of the file, which is causing the Unicode error. Check that "module1.py" is correctly encoded, with the encoding signature assigned. Commented Jan 10, 2012 at 6:24
  • @monkut: In Python 3.x, you don't have to worry about encoding: if there are UTF-8 characters in the file, then they'll be rendered as UTF-8 characters. Commented Jan 10, 2012 at 6:26
  • Hmm, this still seems like an encoding issue with "module1.py". Perhaps the signature is set to "mbcs", overriding the default? Commented Jan 10, 2012 at 6:38
  • The compile function converts the filename argument to bytes using the filesystem encoding: hg.python.org/cpython/file/4f8c24830a5c/Python/… . I suspect it shouldn't be doing this. Commented Jan 10, 2012 at 13:25

3 Answers

From Python issue 10114, it seems that the logic is that all filenames used by Python should be valid for the platform where they are used. The filename is encoded using the filesystem encoding so that it can be used in the C internals of Python.
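
To see why such an encoding step can fail, here is a minimal sketch. It assumes a Western European ANSI code page (cp1252) stands in for mbcs; the actual code page depends on the Windows locale, and `can_encode` is a helper name made up for this example, not part of Python.

```python
import sys


def can_encode(filename, encoding):
    """Return True if every character of filename is representable in encoding."""
    try:
        # Strict error handling (the default) raises on unencodable characters.
        filename.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


fname = r'c:\temp\工具\module1.py'

# Reports the encoding Python uses for bytes-oriented filesystem calls.
print(sys.getfilesystemencoding())

# Assumption: cp1252 plays the role of the ANSI code page behind mbcs.
print(can_encode(fname, 'cp1252'))
print(can_encode(fname, 'utf-8'))
```

With a code page that has no Chinese characters, the first check fails even though the name is perfectly valid for the Unicode file APIs.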

I agree that it probably shouldn't throw an error on Windows, because any Unicode filename is valid there. You may wish to file a bug report with Python for this. But be aware that the necessary changes might not be trivial, because any C code using the filename has to handle the case where it can't be encoded.
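
Until that is addressed, one possible workaround is to fall back to an ASCII-only filename when compile() rejects the real one. This is only a sketch: `ascii_safe` and `compile_with_fallback` are hypothetical helper names, and the escaped name is an assumption about what is acceptable in your tracebacks.

```python
def ascii_safe(filename):
    """Replace non-ASCII characters with backslash escapes so any codec can encode the name."""
    return filename.encode('ascii', 'backslashreplace').decode('ascii')


def compile_with_fallback(source, filename, mode='exec'):
    """Compile source; if the filename cannot be encoded, retry with an escaped name."""
    try:
        return compile(source, filename, mode)
    except UnicodeEncodeError:
        # The filename is only stored in the code object (co_filename),
        # so an escaped stand-in does not change what the code does.
        return compile(source, ascii_safe(filename), mode)


fname = r'c:\temp\工具\module1.py'
code = compile_with_fallback('pass', fname)

# ascii() keeps the output printable on any console.
print(ascii(code.co_filename))
```

The trade-off is that tracebacks, and tools that look up the source through co_filename, will see the escaped name instead of the real path when the fallback is taken.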

answered Jan 10, 2012 at 13:52

8 Comments

A related question is why the filesystem encoding is still mbcs on the latest versions of Windows.
@PyScripter: Should it be something else?
It should be UTF-16, at least in modern versions of Windows.
@PyScripter: I'm not sure about that. Windows has unicode APIs which expect UTF-16 arguments, but the filesystem encoding is for use with bytes-oriented APIs, and I'm pretty sure those expect 8-bit strings, not UTF-16.
Python uses the unicode (UTF-16) API for communication with the file system, but it uses mbcs for checking the validity of file names. This leads to the problem of failing to compile perfectly valid file names as demonstrated here.
Here is a solution that worked for me, taken from Issue 427 ("UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-6: ordinal not in range(128)"):

If you look at the PyScripter help file under the topic "Encoded Python Source Files" (last paragraph), it explains how to configure Python to support other encodings by modifying the site.py file. This file is in the lib subdirectory of the Python installation directory. Find the function setencoding and make sure that support for locale-aware default string encodings is switched on (see below):

def setencoding():
    """Set the string encoding used by the Unicode implementation. The
    default is 'ascii', but if you're willing to experiment, you can
    change this."""
    encoding = "ascii" # Default value set by _PyUnicode_Init()
    if 0: # <<<--- set this to 1
        # Enable to support locale aware default string encodings.
        import locale
        loc = locale.getdefaultlocale()
        if loc[1]:
            encoding = loc[1]
    if 0:
        # Enable to switch off string to Unicode coercion and implicit
        # Unicode to string conversion.
        encoding = "undefined"
    if encoding != "ascii":
        # On Non-Unicode builds this will raise an AttributeError...
        sys.setdefaultencoding(encoding) # Needs Python Unicode build !
answered Nov 13, 2012 at 14:27

I think you could try changing the "\" in the file path to "/", that is, change

compile('pass', r'c:\temp\工具\module1.py', 'exec')

to

compile('pass', r'c:/temp/工具/module1.py', 'exec')

I ran into a problem just like yours and solved it this way. I hope it works for you too.
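
If you would rather not edit the paths by hand, the replacement can be sketched as below; `to_forward_slashes` is just a name made up for this example. Note that it only swaps the separators, so the characters 工具 stay in the path exactly as they were.

```python
def to_forward_slashes(path):
    """Return path with every backslash separator replaced by a forward slash."""
    return path.replace('\\', '/')


fname = r'c:\temp\工具\module1.py'
converted = to_forward_slashes(fname)

# ascii() keeps the output printable on any console.
print(ascii(converted))

# The converted path is then passed to compile() in place of the original.
code = compile('pass', converted, 'exec')
```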

answered Jul 4, 2017 at 2:24
