lua-l archive

Re: Suggestion: handle utf-8 filename in windows



On 26 October 2017 at 07:23, Egor Skriptunoff
<egor.skriptunoff@gmail.com> wrote:
> But maybe someday Lua would become a UTF-8-only language...
> In this case all Lua functions (including string, pattern, file and
> script-loading operations) should work only with UTF-8 encoded strings.
>
> This means:
> 1) #str would return the number of Unicode symbols in a UTF-8-encoded string
Why would this ever be useful?
 - Usually you need to know the number of bytes, for storage, network
   transmission, etc.
 - Sometimes you need to know how many characters there are,
    - but in Unicode a single character is often made up of multiple code points,
    - so knowing the number of code points is useless.
 - Sometimes you need to know the width of a character on the screen,
    - and this has its own algorithm too (even for fixed-width fonts).
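To make the distinction between these measures concrete, here is a minimal Python sketch (not part of the original message). For reference, Lua 5.3 and later already expose two of them: `#s` is the byte length and `utf8.len(s)` is the code point count. The helper names `describe` and `display_width` are hypothetical, and `display_width` is only a rough approximation of terminal width: it treats combining marks as zero columns and East Asian wide/fullwidth characters as two, and ignores the many other cases that real `wcwidth`-style implementations handle.

```python
import unicodedata

def describe(s: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, code point count) for a string."""
    return len(s.encode("utf-8")), len(s)

def display_width(s: str) -> int:
    """Rough column count: combining marks take 0 columns, wide/fullwidth take 2."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

# "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT:
# one user-perceived character, two code points, three UTF-8 bytes.
decomposed = "e\u0301"
print(describe(decomposed))       # (3, 2)
print(display_width(decomposed))  # 1

# The precomposed form U+00E9 is a single code point encoded in two bytes.
precomposed = unicodedata.normalize("NFC", decomposed)
print(describe(precomposed))      # (2, 1)

# A flag is two regional-indicator code points (4 bytes each) that render
# as one symbol; no normalization form merges them into one code point.
flag = "\U0001F1EB\U0001F1F7"  # regional indicators F + R
print(describe(flag))             # (8, 2)

# CJK ideographs occupy two columns each in a fixed-width terminal.
print(display_width("中文"))       # 4
```

None of the three numbers (bytes, code points, columns) agrees with the others in general, which is why a single "length" built into `#` cannot serve every purpose.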
> 3) file functions would expect file names being encoded in UTF-8
No file system I know of enforces this.
 - On Linux, paths can contain any byte except the null byte.
 - On Windows, you can have paths that are invalid UTF-16 (it allows
   unpaired surrogate halves).
File names should be treated as arbitrary sequences of bytes.
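A minimal Python sketch of what "arbitrary bytes" means in practice (not from the original message). It assumes a POSIX-style file name held as raw bytes and a Windows-style name held as a sequence of UTF-16 code units (modelled here as a Python string containing a lone surrogate). The `surrogateescape` and `surrogatepass` error handlers are standard Python; the helper names `name_is_valid_utf8` and `roundtrip` are hypothetical.

```python
def name_is_valid_utf8(raw: bytes) -> bool:
    """True if a raw (POSIX-style) file name happens to be valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def roundtrip(raw: bytes) -> bytes:
    """Decode losslessly with surrogateescape, then re-encode to the same bytes."""
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")

# "café.txt" written in Latin-1: a legal Linux file name, but not valid UTF-8.
latin1_name = b"caf\xe9.txt"
print(name_is_valid_utf8(latin1_name))        # False
print(roundtrip(latin1_name) == latin1_name)  # True

# A Windows-style name containing an unpaired high surrogate (U+D800).
windows_name = "a\ud800b"
try:
    windows_name.encode("utf-8")
except UnicodeEncodeError:
    print("not representable in strict UTF-8")

# surrogatepass writes the lone surrogate as the same 3-byte form that
# WTF-8 uses for it; the result is still not valid UTF-8.
wtf8_like = windows_name.encode("utf-8", errors="surrogatepass")
print(wtf8_like)  # b'a\xed\xa0\x80b'
```

A runtime that insists on valid UTF-8 file names would be unable to open either of these files, while one that passes names through as opaque bytes (or a lossless escape like the ones above) handles both.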
