Cygwin programs don't support non-ASCII filenames

Lenik lenik@bodz.net
Sat May 9 15:12:00 GMT 2009


(This mail is encoded in utf-8)
On May 9, 2009 18:02, Corinna Vinschen wrote:
> [Repeated and additional question. I accidentally sent this as PM.
> Sorry about that. Let's keep this on the list, please]
>
> On May 9 11:43, Lenik wrote:
>> (My system locale is zh_CN)
>
> What ANSI codepage is that?
> And what OEM codepage uses the console Window by default?
`chcp' shows that the codepage is 937.
I don't know the difference between the ANSI codepage and the OEM codepage.
>
>> 1, test path
>>
>>   >>> set LANG=& cygpath -am .
>>   C:/Profiles/Shecti/??????
>>
>>   >>> set LANG=zh_CN.GBK& cygpath -am .
>>   C:/Profiles/Shecti/??????
>>
>>   >>> set LANG=C& cygpath -am .
>>   C:/Profiles/Shecti/×ÀÃæ
>
> Can you please give us the exact name of the directory in either
> UTF-8 or UTF-16 notation?
The two Chinese characters are encoded as:
GB2312: d7 c0 c3 e6
UTF-8: e6 a1 8c e9 9d a2
Unicode: \u684c \u9762
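For anyone who wants to double-check the byte values above, here is a minimal Python sketch. Python is not part of the original tests; it is only used as a neutral way to print encodings, and the helper name hex_bytes is made up for this example. The last step rests on an assumption rather than a confirmed diagnosis: that the `×ÀÃæ' printed under LANG=C is simply the four GB2312 bytes shown one byte at a time as Latin-1 characters.

```python
# The directory name from the report: U+684C U+9762.
name = "\u684c\u9762"


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex, e.g. 'd7 c0'."""
    return " ".join(f"{b:02x}" for b in data)


gb2312_bytes = name.encode("gb2312")  # Python's gb2312 codec is the EUC-CN form
utf8_bytes = name.encode("utf-8")

print("GB2312:", hex_bytes(gb2312_bytes))
print("UTF-8: ", hex_bytes(utf8_bytes))

# Reading the GB2312 bytes one byte at a time as Latin-1 gives mojibake.
mojibake = gb2312_bytes.decode("latin-1")
print("as Latin-1:", mojibake)
```

The UTF-8 form is six bytes long, which may be why six `?' characters appear in the other two cygpath outputs, but that is only a guess.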
>
>> 2, the `test' utility
>>
>>   >>> set LANG=& bash -c "D=$(cygpath -am .); if [ -d $D ]; then echo
>>   ok $D; else echo fail $D; fi"
>>   fail C:/Profiles/Shecti/??????
>
> What you're actually testing here all the time is cygpath in the first
> place. If you stop using cygpath, start a bash shell and use the Cygwin
> commands with the paths in POSIX notation, you would have much less
> trouble. Cygwin is a POSIX emulation layer, after all.
Well, I test the pathnames using cygpath because I want to get the absolute
path, so that the Chinese characters are included in the test, and I
can't type these characters in the console window. The second reason is
that I have associated the .sh file type with bash, as:
 .sh=C:\lam\sys\cygwin-1.7\bin\bash -c "$(cygpath -u '%0') %*"
Here is a new test that doesn't use cygpath:
 C:\Profiles\Shecti> set LANG=& bash -c "cat 你好"
 cat: 你好: No such file or directory
 C:\Profiles\Shecti> set LANG=zh_CN.GB2312& bash -c "cat 你好"
 cat: 你好: No such file or directory
 C:\Profiles\Shecti> set LANG=zh_CN.GBK& bash -c "cat 你好"
 123
 C:\Profiles\Shecti> set LANG=zh_CN.UTF-8& bash -c "cat 你好"
 123
 C:\Profiles\Shecti> set LANG=& bash -c "d 你好"
 /mnt/c/Profiles/Shecti/你好 doesn't exist!
 C:\Profiles\Shecti> set LANG=zh_CN.GBK& bash -c "d 你好"
 /mnt/c/Profiles/Shecti/你好 doesn't exist!
 C:\Profiles\Shecti> set LANG=zh_CN.UTF-8& bash -c "d 你好"
 /mnt/c/Profiles/Shecti/你好 doesn't exist!
The result is the same in every case. This shows that `cat' from coreutils
handles the locale well, while `d' does not.
> If you give me the above information I'll look into fixing cygpath.
>
>> The GB2312 charset is a subset of GBK charset, and the characters `
>> ??????' is included in GB2312 charset. So in this example, GB2312 SHOULD
>> WORK.
>
> Sorry, no. It's documented that GBK is supported, GB2312 isn't. From
> what I read about GB2312 it's not actually a subset of GBK in terms
> of character definitions, it's just a subset in terms of supported
> characters. AFAICS, GB2312 uses chars < 0x7f in multibyte sequences
> which is not feasible for Cygwin. We could support EUC-CN, which
> seems to be another way to encode GB2312 chars, but I'm not exactly
> willing to add that now. I'd rather stabilize what we have now and
> add further charset support in a later, official 1.7 release.
>
> So you can use LANG=zh_CN.GBK, but not LANG=zh_CN.GB2312. It's just
> treated as invalid input. Better: Use LANG=zh_CN.UTF-8.
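The "subset in terms of supported characters" relationship can be checked with a short Python sketch. It relies on Python's own gb2312 and gbk codecs (Python's gb2312 codec is the EUC-CN form), which is an assumption made for illustration and says nothing about how Cygwin implements its charsets. The helper can_encode and the sample character U+9AD4 are picked only for this example.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


name = "\u684c\u9762"   # the directory name from the report
traditional = "\u9ad4"  # a traditional-form character, chosen for illustration

# Characters shared by both charsets get identical bytes in Python's codecs.
same_bytes = name.encode("gb2312") == name.encode("gbk")
print("same bytes:", same_bytes)

# GBK covers more characters than GB2312.
print("GB2312 has U+9AD4:", can_encode(traditional, "gb2312"))
print("GBK has U+9AD4:   ", can_encode(traditional, "gbk"))
```

For a name made only of GB2312 characters, as in this report, the two codecs produce the same bytes, so the difference only shows up for characters outside GB2312.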
Yes, GB2312 is a subset in terms of supported characters. Is there
any way to find out the default locale of the current Cygwin installation?
From the test I found that `unset LANG' and `set LANG=zh_CN.GB2312' give
the same results, so I thought that GB2312 was the default locale.
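One generic way to see which character set a process ends up with is to ask the C library after it has read LANG and LC_*. The sketch below does this from Python with the standard locale module. The function name current_codeset is made up for this example, and whether a Python build on a given Cygwin installation reports the same thing as other Cygwin programs is an assumption, not something tested in this thread.

```python
import locale


def current_codeset():
    """Return the character set chosen from LANG/LC_* for this process,
    or None if the C library rejects the requested locale."""
    try:
        # "" means: take the locale from the environment variables.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return None
    if hasattr(locale, "nl_langinfo"):  # available on POSIX-like systems only
        return locale.nl_langinfo(locale.CODESET)
    return locale.getpreferredencoding(False)


print(current_codeset())
```

Running it once with LANG unset and once with LANG=zh_CN.GB2312 would show whether both settings really end up with the same character set.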
And I'd like to use UTF-8 too, but I won't chcp to 65001, because that would
introduce a lot of new problems when deploying to customers' machines,
where most programs and files are encoded in GB2312 in the real world.
Lenik

