lua-l archive

Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8



On 2018-07-10 06:20 PM, Gregg Reynolds wrote:
Your point being?
I mean, it's a joke, really, but if I were to actually redesign Unicode, I'd throw away all those annoying character tables and encode that information directly in the bits of each code point. It would solve all practical problems with Unicode. But we aren't gonna have that, so we should instead stick with no Unicode support for the time being, at least until they finally decide that Unicode was a huge mistake and restart the whole thing.
On Tue, Jul 10, 2018, 4:15 PM Soni "They/Them" L. <fakedme@gmail.com> wrote:
 On 2018-07-10 05:31 PM, Gregg Reynolds wrote:
 >
 >
 > On Tue, Jul 10, 2018, 9:00 AM Dirk Laurie <dirk.laurie@gmail.com> wrote:
 >
 >     2018-07-10 15:30 GMT+02:00 Lorenzo Donati <lorenzodonatibz@tiscali.it>:
 >
 >     > Unicode is great for typesetting (I regularly use LaTeX and it's fun to
 >     > find almost every symbol you may imagine, even ancient Germanic runic
 >     > scripts!), but it sucks (IMHO) for general programming or
 >     > computer-related stuff. Too much mental overhead to use correctly for
 >     > little gain.
 >
 >     Yes, yes, but, if you will allow me to return to Lua and UTF-8, there
 >     would be more gain for a programmer if we had (if it is not too late
 >     already for Lua 5.4) utf8 versions of find, sub, match, gsub, gmatch,
 >     reverse. Just those, not asking for upper/lower, operating only on
 >     simple codepoints, no combining characters, no need for a C library.
 >
 >
 > UTF-8 != Unicode. It's an encoding; you don't get to pick a subset and
 > still claim Unicode support.
 >
 > "Simple codepoints"? Does Unicode define that? If not, who decides
 > what that means? Zero-width space is pretty simple.
 >
 > No combining chars? OK, but that would not be Unicode. Practical result:
 > massive confusion and complaining. You cannot accept Unicode and reject
 > combining chars.
 >
 >
 >
 >     utf8.find ("Hélène",'n')  --> 5 5
 >     utf8.sub ("Hélène",5)   --> 'ne'
 >     utf8.gsub ("Hélène","[éè]","e")  --> 'Helene' 2
 >     utf8.reverse ("Hélène")   --> 'enèléH'
 >
 https://gist.github.com/SoniEx2/ecd119507f160d9c26e3eabd9e012dc0
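
Part of Dirk's list can already be approximated in plain Lua 5.3 or later with the standard utf8 library (utf8.offset, utf8.codes, utf8.len, utf8.char). The following is a minimal sketch, not an existing or proposed API: usub, ureverse and ufind are hypothetical names, ufind does only plain (non-pattern) searches, and the code assumes valid UTF-8 input, positive indices, and a source file saved as UTF-8. It reproduces three of Dirk's example results for "Hélène".

```lua
-- Sketch only: hypothetical helpers, not part of Lua's utf8 library.
-- Requires Lua 5.3+; assumes valid UTF-8 input and positive indices.

-- usub(s, i [, j]): substring by code-point positions i..j (inclusive).
local function usub(s, i, j)
  local first = utf8.offset(s, i)        -- byte where code point i starts
  if not first then return "" end        -- i lies past the end of s
  if j == nil then return s:sub(first) end
  local after = utf8.offset(s, j + 1)    -- byte where code point j+1 starts
  if not after then return s:sub(first) end
  return s:sub(first, after - 1)
end

-- ureverse(s): reverse by code point instead of by byte.
local function ureverse(s)
  local cps = {}
  for _, cp in utf8.codes(s) do          -- raises an error on invalid UTF-8
    cps[#cps + 1] = cp
  end
  local out = {}
  for k = #cps, 1, -1 do
    out[#out + 1] = utf8.char(cps[k])
  end
  return table.concat(out)
end

-- ufind(s, sub [, init]): plain (non-pattern) search that reports
-- code-point positions; init is also a code-point position.
local function ufind(s, sub, init)
  local b = utf8.offset(s, init or 1)
  if not b then return nil end
  local i = s:find(sub, b, true)         -- plain byte search, no patterns
  if not i then return nil end
  local ci = utf8.len(s, 1, i - 1) + 1   -- code points before byte i, plus one
  return ci, ci + utf8.len(sub) - 1
end

print(ufind("Hélène", "n"))    --> 5 5
print(usub("Hélène", 5))       --> ne
print(ureverse("Hélène"))      --> enèléH
```

The pattern-based functions (match, gsub, gmatch) are harder, because Lua patterns work on bytes: a class such as [éè] matches the individual bytes of the two-byte sequences, not the characters. utf8.charpattern can be used with string.gmatch to step through one character at a time. Gregg's objection also shows up here: if "Hélène" is written with a decomposed é (e followed by U+0301 COMBINING ACUTE ACCENT), ureverse moves each accent onto the wrong letter.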
