
Re: Suggestion: handle utf-8 filename in windows




On Wed, Oct 25, 2017 at 6:13 PM, Pierre Chapuis wrote:

I second that. I suspect most of those of us who use Lua on Windows either patch their IO library or maintain an alternative version of it because of that. LFS has the same issue, by the way.


On Wed, Oct 25, 2017, at 17:00, 云风 Cloud Wu wrote:
Lua currently uses fopen() in its base library, and that makes it very troublesome to support the Windows file system :(

I suggest that future versions of Lua use the Unicode (wide-character) versions of the Windows APIs, e.g. fopen -> _wfopen, calling MultiByteToWideChar first to convert the UTF-8 filename to wchar_t.
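The fopen -> _wfopen conversion suggested above could look roughly like the following. This is only a minimal sketch, not Lua's actual code: the name fopen_utf8 is hypothetical, and it assumes the caller always passes UTF-8. On non-Windows systems it simply falls back to plain fopen().

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not part of Lua): open a file whose name is
   given as a UTF-8 encoded string. Returns NULL on failure. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    FILE *f = NULL;
    wchar_t *wpath = NULL, *wmode = NULL;
    int plen, mlen;

    /* With a NULL output buffer, MultiByteToWideChar returns the number
       of wchar_t units required; passing -1 as the input length makes
       that count include the terminating zero. */
    plen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, path, -1, NULL, 0);
    mlen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, mode, -1, NULL, 0);
    if (plen == 0 || mlen == 0)
        return NULL;            /* invalid UTF-8 or other conversion error */

    wpath = malloc((size_t)plen * sizeof *wpath);
    wmode = malloc((size_t)mlen * sizeof *wmode);
    if (wpath != NULL && wmode != NULL
        && MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, path, -1, wpath, plen) != 0
        && MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, mode, -1, wmode, mlen) != 0) {
        f = _wfopen(wpath, wmode);   /* wide-character variant of fopen */
    }
    free(wpath);
    free(wmode);
    return f;
#else
    /* Elsewhere the file name is passed through as a plain byte string. */
    return fopen(path, mode);
#endif
}
```

The same pattern would have to be repeated for remove, rename, system and getenv (via _wremove, _wrename, _wsystem and _wgetenv), which is part of why this is more than a one-line patch.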

I think it would be better if the following APIs supported UTF-8 filenames on Windows:
  loadfile
  dofile
  require
  io.open
  os.remove
  os.rename
  os.execute
  os.getenv


It seems that people from the *nix world are accustomed to there being only one "main" codepage in the whole world ;-)
That's not true in general.
On Windows there are two "main" codepages: ANSI (locale-dependent) and UTF-16 (locale-independent).
Working in Lua with the ANSI codepage is a natural choice (and a really useful one).
I don't want Lua for Windows to stop working with the ANSI codepage.

Lua is portable, so it has to work with the "OS-native" codepage instead of forcing the same codepage on every OS.
UTF-8 is not part of ANSI C.
UTF-8 is not even fully supported by Lua on Linux :-)

All the OP needs is a "winapi"-like module.

But maybe someday Lua will become a UTF-8-only language...
In that case all Lua functions (including string, pattern, file and script-loading operations) would have to work only with UTF-8 encoded strings.

This means:
1) #str would return the number of Unicode symbols in a UTF-8-encoded string
2) the dot pattern would match a whole Unicode symbol instead of a single byte
3) file functions would expect file names to be encoded in UTF-8
4) Lua source texts would have to be encoded in UTF-8 only (how about arbitrary Unicode symbols in identifiers?)

Such a UTF-8-only language would have some disadvantages:
5) All operations listed above would fail or raise an error on invalid UTF-8 strings.
6) There would be no byte strings in Lua, but many users want something like "mutable byte arrays".
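To make points 1) and 5) concrete, here is a small C sketch of what a "length in Unicode symbols" operation has to do. The function name utf8_count is hypothetical and this is not how Lua implements anything; it only shows that counting code points means decoding every byte, and that there has to be a failure path for byte strings that are not valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: return the number of Unicode code points in the
   n bytes starting at s, or -1 if the bytes are not valid UTF-8. */
long utf8_count(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;
    long count = 0;

    while (i < n) {
        unsigned char c = p[i];
        size_t extra;            /* number of continuation bytes expected */
        unsigned long cp, min;   /* decoded value and smallest legal value */

        if (c < 0x80)                { extra = 0; cp = c;        min = 0; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return -1;          /* stray continuation or invalid lead byte */

        if (extra > n - i - 1)
            return -1;           /* sequence is cut off at the end */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80)
                return -1;       /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        i += extra + 1;
        count++;
    }
    return count;
}
```

For the six bytes of "云风" this returns 2, while a byte-oriented # would report 6; for a lone 0xFF byte it returns -1, which is the situation point 5) describes.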

