This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2008-02-16 16:27 by giovannibajo, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | | |
| argv_unicode.patch | giovannibajo, 2008-02-17 18:57 | |||
| wchar.diff | loewis, 2008-03-10 14:40 | |||
| Messages (15) | |||
|---|---|---|---|
| msg62458 | Author: Giovanni Bajo (giovannibajo) | Date: 2008-02-16 16:27 | |
Under Windows, sys.argv is created through the Windows ANSI API.

When you have a file/directory which can't be represented in the system encoding (e.g. a Japanese-named file or directory on a Western Windows), Windows will encode the filename to the system encoding using what we call the "replace" policy, and thus sys.argv[] will contain an entry like "c:\\foo\\??????????????.dat".

My suggestion is that:

* At the Python level, we still expose a single sys.argv[], which will contain unicode strings. I think this exactly matches what Py3k does now.
* At the C level, I believe it involves using GetCommandLineW() and CommandLineToArgvW() in WinMain.c, but should Py_Main/PySys_SetArgv() be changed to also accept wchar_t** arguments? Or is it better to allow for NULL to be passed (under Windows at least), so that the Windows code-path in there can use GetCommandLineW()/CommandLineToArgvW() to get the current process' arguments? |
|||
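The lossy conversion described in the report can be approximated in a few lines of Python 3. This is only an illustrative sketch: it assumes cp1252 as the Western system code page and uses Python's `errors="replace"` handler as a stand-in for the Windows ANSI conversion, which may map some characters differently. The file name is hypothetical.

```python
# A hypothetical Japanese file name that a Western code page cannot represent.
name = "c:\\foo\\日本語.dat"

# Assumption: cp1252 plays the role of the "system encoding".
# errors="replace" substitutes "?" for every character the code page lacks.
ansi_bytes = name.encode("cp1252", errors="replace")
lossy = ansi_bytes.decode("cp1252")

print(lossy)
```

Once the characters have been replaced, the original name cannot be recovered from the argument, so the program has no way to open the file it was asked to open.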
| msg62460 | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2008-02-16 16:54 | |
The issue is related to #1342. Since we have dropped support for older versions of Windows (9x, ME, NT4), I'd like to get the Python interface to argv, env and files fixed. |
|||
| msg62499 | Author: Giovanni Bajo (giovannibajo) | Date: 2008-02-17 18:57 | |
I'm attaching a simple patch that seems to work under Py3k. The trick is that Py3k already attempts (not sure how or why) to decode argv using utf-8. So it's sufficient to set up argv as UTF8-encoded strings.

Notice that this brings the output of "python ààààà" from this:

    Fatal Python error: no mem for sys.argv
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data

to this:

    TypeError: zipimporter() argument 1 must be string without null bytes, not str

which is expected since zipimporter_init() doesn't even know to ignore unicode strings (let alone handle them correctly...). |
|||
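The decoding failure mentioned in this message can be reproduced in isolation. The sketch below assumes cp1252 as the ANSI code page and uses a hypothetical helper, `try_utf8`; it shows why ANSI-encoded arguments fail a UTF-8 decode while UTF-8-encoded arguments round-trip.

```python
arg = "ààààà"

# Assumption: cp1252 stands in for the ANSI code page; "à" becomes byte 0xE0.
ansi_bytes = arg.encode("cp1252")
# What the patch arranges instead: arguments handed over as UTF-8 bytes.
utf8_bytes = arg.encode("utf-8")


def try_utf8(raw):
    """Hypothetical helper: decode as UTF-8, or return None on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


ansi_result = try_utf8(ansi_bytes)   # 0xE0 0xE0 ... is not valid UTF-8
utf8_result = try_utf8(utf8_bytes)   # round-trips to the original text
```

This only illustrates the encoding mismatch; it does not model how the interpreter builds sys.argv internally.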
| msg62659 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008-02-21 20:50 | |
I dislike the double decoding, and would prefer if sys.argv would be created directly from the wide command line.

In addition, I think the patch is incorrect: it ignores the arguments to Py_Main, which is a documented API function.

One solution might be to declare all these functions (Py_Main, SetProgramName, GetArgcArgv) to operate on Py_UNICODE*, and then convert the POSIX callers of Py_Main to use mbstowcs when going from the command line to Py_Main. WinMain could then be recompiled for Unicode directly, and likewise Modules/python.c. |
|||
| msg62660 | Author: Giovanni Bajo (giovannibajo) | Date: 2008-02-21 21:33 | |
mbstowcs uses LC_CTYPE. Is that correct and consistent with the way default encoding under UNIX is handled by Py3k? Would a Py_MainW or similar wrapper be easier on the UNIX guys? I'm just asking, I don't have a definite idea. |
|||
| msg62664 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008-02-21 22:01 | |
> mbstowcs uses LC_CTYPE. Is that correct and consistent with the way
> default encoding under UNIX is handled by Py3k?

It's correct, but it's not consistent with the default encoding - there isn't really any default encoding in Py3k. More specifically, PyUnicode_FromString uses UTF-8, not as a (changeable) default, but as part of its API specification.

Command line arguments are in the locale's charset, so the LC_CTYPE must be used to convert them.

> Would a Py_MainW or similar wrapper be easier on the UNIX guys? I'm just
> asking, I don't have a definite idea.

See above. The current POSIX implementation is incorrect also. It should use the locale's encoding, but doesn't. |
|||
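The point about the locale's charset can be illustrated from Python. The sketch below is an analogy for converting with mbstowcs under LC_CTYPE, not a description of what the interpreter does internally; `decode_argument` and its `encoding` parameter are hypothetical names.

```python
import locale


def decode_argument(raw, encoding=None):
    """Decode a command-line argument given as bytes.

    Hypothetical helper: when no encoding is passed, fall back to the
    charset of the current locale, in the spirit of mbstowcs with LC_CTYPE.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)


# The same two bytes mean different text under different charsets,
# which is why the locale has to be consulted rather than assuming UTF-8.
as_utf8 = decode_argument(b"\xc3\xa0", "utf-8")
as_latin1 = decode_argument(b"\xc3\xa0", "latin-1")
```

Here `as_utf8` is a single character and `as_latin1` is two, so a wrong guess about the charset silently changes the argument.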
| msg63443 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008-03-10 14:40 | |
Here is a patch that redoes the entire argv handling, in terms of wchar_t. As a side effect, it also changes the sys.path handling to use wchar_t. |
|||
| msg65005 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008-04-05 20:42 | |
This is now fixed in r62178 for Py3k. For 2.6, I don't think fixing it is feasible. |
|||
| msg65045 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-06 16:50 | |
MvL's recent commit creates compiler warnings for Unicode UCS4 for the same reason as #2388. |
|||
| msg65061 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2008-04-07 03:27 | |
What warnings precisely are you seeing? I didn't see anything in the 3k branch (not even for #2388, as PyErr_Format doesn't have the GCC format attribute in 3k, unlike 2.x). |
|||
| msg65073 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-07 11:54 | |
Martin, you are right that they are not for the same reason as that issue.

    gcc -c -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk/ -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -IInclude -I./Include -DPy_BUILD_CORE -o Modules/main.o Modules/main.c
    Modules/main.c: In function 'Py_Main':
    Modules/main.c:478: warning: passing argument 1 of 'Py_SetProgramName' from incompatible pointer type
    Modules/main.c: In function 'Py_Main':
    Modules/main.c:478: warning: passing argument 1 of 'Py_SetProgramName' from incompatible pointer type
|
|||
| msg125827 | Author: David-Sarah Hopwood (davidsarah) | Date: 2011-01-09 07:36 | |
The following code is being used to work around this issue for Python 2.x in Tahoe-LAFS:

    # This works around <http://bugs.python.org/issue2128>.
    GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
    CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int)) \
                            (("CommandLineToArgvW", windll.shell32))

    argc = c_int(0)
    argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

    argv = [argv_unicode[i].encode('utf-8') for i in range(0, argc.value)]

    if not hasattr(sys, 'frozen'):
        # If this is an executable produced by py2exe or bbfreeze, then it will
        # have been invoked directly. Otherwise, unicode_argv[0] is the Python
        # interpreter, so skip that.
        argv = argv[1:]

        # Also skip option arguments to the Python interpreter.
        while len(argv) > 0:
            arg = argv[0]
            if not arg.startswith("-") or arg == "-":
                break
            argv = argv[1:]
            if arg == '-m':
                # sys.argv[0] should really be the absolute path of the module source,
                # but never mind
                break
            if arg == '-c':
                argv[0] = '-c'
                break
|
|||
| msg125829 | Author: David-Sarah Hopwood (davidsarah) | Date: 2011-01-09 07:39 | |
Sorry, missed out the imports:

    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
    from ctypes.wintypes import LPWSTR, LPCWSTR
|
|||
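The option-skipping part of the workaround quoted above does not depend on Windows, so it can be isolated and exercised on any platform. The following is a sketch under stated assumptions: `strip_interpreter_args`, `full_argv` and `frozen` are hypothetical names, the list is assumed to already hold decoded strings, and the emptiness guard in the `-c` branch is an addition of this sketch rather than part of the quoted code.

```python
def strip_interpreter_args(full_argv, frozen=False):
    """Return the script-level argv from the full process command line.

    Hypothetical helper mirroring the option-skipping logic of the
    workaround: drop the interpreter, then drop its leading options.
    """
    argv = list(full_argv)
    if frozen:
        # A frozen executable (py2exe, bbfreeze) is invoked directly,
        # so there is no interpreter entry to remove.
        return argv

    # Drop the interpreter itself.
    argv = argv[1:]

    # Drop interpreter options until the script name (or "-") is reached.
    while argv:
        arg = argv[0]
        if not arg.startswith("-") or arg == "-":
            break
        argv = argv[1:]
        if arg == "-m":
            # The module name is left where its absolute path would go.
            break
        if arg == "-c":
            # Guard added in this sketch: replace the command text only
            # if there is one.
            if argv:
                argv[0] = "-c"
            break
    return argv


script_argv = strip_interpreter_args(["python.exe", "-u", "script.py", "ààà"])
```

Like the quoted code, this treats every leading argument that starts with "-" as a value-less option, so an interpreter option that takes a separate value (for example `-W ignore`) would leave that value at the front of the result.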
| msg179892 | Author: Michael Herrmann (mherrmann.at) | Date: 2013-01-13 20:23 | |
Hi, is it correct that this bug no longer appears in Python 2.7.3? I checked the changelogs of 2.7, but couldn't find anything. Thanks! Michael |
|||
| msg179928 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2013-01-14 09:15 | |
> is it correct that this bug no longer appears in Python 2.7.3?

Martin wrote that it cannot be fixed in Python 2: "For 2.6, I don't think fixing it is feasible." The "fix" is to upgrade your application to Python 3. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:30 | admin | set | github: 46381 |
| 2013-01-14 09:15:31 | vstinner | set | messages: + msg179928 |
| 2013-01-13 20:23:17 | mherrmann.at | set | nosy: + mherrmann.at; messages: + msg179892 |
| 2011-01-14 22:18:04 | vstinner | set | nosy: + vstinner |
| 2011-01-09 07:39:42 | davidsarah | set | nosy: loewis, christian.heimes, giovannibajo, benjamin.peterson, davidsarah; messages: + msg125829 |
| 2011-01-09 07:36:51 | davidsarah | set | nosy: + davidsarah; messages: + msg125827; versions: + Python 2.6, Python 2.5, Python 2.7 |
| 2008-04-07 11:54:38 | benjamin.peterson | set | messages: + msg65073 |
| 2008-04-07 03:27:27 | loewis | set | messages: + msg65061 |
| 2008-04-06 16:50:36 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg65045 |
| 2008-04-05 20:42:42 | loewis | set | status: open -> closed; messages: + msg65005; resolution: fixed; versions: - Python 2.6 |
| 2008-03-10 14:40:50 | loewis | set | files: + wchar.diff; keywords: + patch; messages: + msg63443 |
| 2008-02-21 22:01:33 | loewis | set | messages: + msg62664 |
| 2008-02-21 21:33:17 | giovannibajo | set | messages: + msg62660 |
| 2008-02-21 20:50:58 | loewis | set | nosy: + loewis; messages: + msg62659 |
| 2008-02-17 18:58:00 | giovannibajo | set | files: + argv_unicode.patch; messages: + msg62499 |
| 2008-02-16 16:54:06 | christian.heimes | set | priority: high; nosy: + christian.heimes; messages: + msg62460; components: + Windows; versions: + Python 2.6 |
| 2008-02-16 16:27:45 | giovannibajo | create | |