Removing ^X in paths

Corinna Vinschen corinna-cygwin@cygwin.com
Thu Feb 3 08:53:01 GMT 2022


On Feb 2 21:12, Dennis Heimbigner wrote:
> I am using 64bit.
> And it has nothing to do with misreading characters.
>
> The ^X is described in this document:
> https://www.cygwin.com/cygwin-ug-net/using-specialnames.html,
>
> There you will see this text:
>
> "If you don't want or can't use UTF-8 as character set for
> whatever reason, you will nevertheless be able to access the
> file. How does that work? When Cygwin converts the filename from
> UTF-16 to your character set, it recognizes characters which
> can't be converted. If that occurs, Cygwin replaces the
> non-convertible character with a special character sequence. The
> sequence starts with an ASCII CAN character (hex code 0x18,
> equivalent Control-X), followed by the UTF-8 representation of
> the character. The result is a filename containing some ugly
> looking characters. While it doesn't look nice, it is nice,
> because Cygwin knows how to convert this filename back to
> UTF-16. The filename will be converted using your usual
> character set. However, when Cygwin recognizes an ASCII CAN
> character, it skips over the ASCII CAN and handles the following
> bytes as a UTF-8 character. Thus, the filename is symmetrically
> converted back to UTF-16 and you can access the file."
>
> There is no obvious good reason to continue this convention.

You're probably using a non-UTF-8 locale, e.g., LANG=en_US using
ISO-8859-1 as charset. See the output of `locale -av' to learn what
charset your locale uses. Either way, converting the UTF-16 filenames
to a non-UTF charset is not lossless. That's what the ASCII CAN stuff
is for. If you want to avoid that, use a UTF-8 locale, e.g.
en_US.UTF-8.

Corinna
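
For illustration, the round trip described in the quoted documentation
can be sketched in Python. This is only a rough model under stated
assumptions, not Cygwin's actual code: the function names (to_charset,
from_charset, utf8_len) are made up for this example, a Python str
stands in for the UTF-16 filename, and the sketch assumes the original
name contains no literal CAN character.

```python
CAN = 0x18  # ASCII CAN (Control-X), the marker byte named in the docs


def to_charset(name: str, charset: str) -> bytes:
    """Encode name in charset; a character that cannot be converted is
    emitted as CAN followed by its UTF-8 representation."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out.append(CAN)
            out += ch.encode("utf-8")
    return bytes(out)


def utf8_len(lead: int) -> int:
    """Length of a UTF-8 sequence, judged from its lead byte."""
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    if lead >= 0xC0:
        return 2
    return 1


def from_charset(data: bytes, charset: str) -> str:
    """Reverse of to_charset: decode with charset, but after a CAN byte
    skip the CAN and treat the following bytes as one UTF-8 character."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == CAN and i + 1 < len(data):
            n = utf8_len(data[i + 1])
            out.append(data[i + 1:i + 1 + n].decode("utf-8"))
            i += 1 + n
        else:
            # Decode the run of ordinary bytes up to the next CAN.
            j = data.find(CAN, i + 1)
            if j == -1:
                j = len(data)
            out.append(data[i:j].decode(charset))
            i = j
    return "".join(out)


if __name__ == "__main__":
    name = "caf\u00e9_\u4e2d.txt"
    encoded = to_charset(name, "iso-8859-1")
    print(encoded)  # the CJK character appears as \x18 plus three bytes
    print(from_charset(encoded, "iso-8859-1") == name)
```

With a UTF-8 charset every character converts directly, so no CAN
byte is ever emitted, which matches the advice above.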

