Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8
- Subject: Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8
- From: Lorenzo Donati <lorenzodonatibz@...>
- Date: 10 Jul 2018 15:30:13 +0200
On 09/07/2018 22:06, Sean Conner wrote:
> It was thus said that the Great Albert Chan once stated:
>> BTW, non-breaking "fake" space in filename is a bad idea.
>> http://boston.conman.org/2018/02/28.2
>
>   What's bad about it?
>
>   -spc (Or are you in the "no spaces in filename" camp?)
Well, it's a nice and smart trick, but I'm in the camp of "If I see a
space, I want to know it's a space (0x20)".
Moreover, it is not ASCII, and I tend to avoid non-ASCII names. I work
mainly on Windows, which doesn't support UTF-8 natively the way Linux
does, so handling characters outside the ASCII set can be a nightmare.
BTW, try handing such a file to someone (especially a non-programmer)
who is unaware of the trick and watch the puzzlement in his eyes when he
cannot understand why he can't delete "my invoice.pdf" from the command
line, or why he sees "my invoice.pdf" and "my invoice.pdf" in the same
directory!!!
If you want to be very evil, put /two/ spaces between words, where the
first is ASCII 0x20 and the second is char 160!
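To make the difference concrete, here is a minimal Lua sketch of the idea
(an illustration of mine; it assumes the \u{} escape of Lua 5.3 or later):

  -- Two file names that render (almost) identically in many fonts:
  local plain  = "my invoice.pdf"          -- ASCII space, 0x20
  local sneaky = "my\u{00A0}invoice.pdf"   -- char 160 (NO-BREAK SPACE)

  print(plain == sneaky)   --> false: different strings
  print(#plain, #sneaky)   --> 14   15   (the NBSP takes two bytes in UTF-8: 0xC2 0xA0)

  -- Dump the bytes so the hidden difference becomes visible:
  for _, s in ipairs{plain, sneaky} do
    print((s:gsub(".", function (c) return string.format("%02X ", c:byte()) end)))
  end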
Moreover, I have a gut feeling that there are plenty of badly written
Windows programs/scripts that will choke on char 160 when the code page
is set differently from what the programmer assumed.
Yes, the underscore is not nice to look at, but at least I know exactly
which character it is (well, in ASCII at least; I'm sure there is some
obscure Unicode code point that is almost identical to an underscore and
will appear identical in some font!).
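One candidate (my own illustration, not something from the thread) is
U+FF3F, FULLWIDTH LOW LINE; how close it looks to "_" depends on the
font, but to the machine it is a completely different character:

  local underscore = "_"          -- U+005F LOW LINE, one byte
  local lookalike  = "\u{FF3F}"   -- U+FF3F FULLWIDTH LOW LINE (Lua 5.3 \u{} escape)
  print(underscore == lookalike)  --> false
  print(#underscore, #lookalike)  --> 1   3  (U+FF3F is three bytes in UTF-8: 0xEF 0xBC 0xBF)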
Unicode is great for typesetting (I regularly use LaTeX and it's fun to
find almost every symbol you can imagine, even ancient Germanic runic
scripts!), but IMHO it sucks for general programming or computer-related
stuff: too much mental overhead to use correctly, for little gain.
<disclaimer>
I know my view is a bit "western-centric" (or "Latin-centric"), and
people speaking languages that need thousands of symbols to be written
(especially Asian languages) might think differently.
Anyway, I'm curious to know how, say, Chinese programmers see this.
Would they find coding easier if they could write programs using
ideograms, or do they prefer transliterating their words into the
Latin alphabet?
BTW, I religiously code "in English" (even comments), and I teach my
students to try to do the same, but I understand that this sometimes
requires better English skills than many programmers have.
</disclaimer>
On a related note: sometimes I've dreamt of a universally
*standardized* "extended ASCII" charset for programming, without all the
human-language-related stuff of Unicode. A 16-bit "universal" charset
should be big enough to accommodate any symbol useful in programming
(e.g. common math operators and symbols, Greek letters, currency
symbols), but I'm digressing. :-)
Cheers!
-- Lorenzo