Roberto Ierusalimschy wrote:
>> Actually dealing with shift-state dependent multi-byte encodings in a
>> portable way in C makes the infinite horrors of Unicode and UTF-8
>> seem very attractive.
>
> This seems a quite accurate summary of the situation.

The horrors of UTF-8 are ℵ₀, but the horrors of full Unicode are at *least* ℵ₁...

Slightly more seriously, it occurs to me that since composite characters mean you can't rely on any individual glyph being encoded in a single Unicode code-point, 32-bit Unicode does, in fact, gain you nothing except a false sense of security. You always need to write code to cope with multicharacter glyphs.

Unicode is like general relativity. No matter how well you think you understand it, it's always more complicated than you think...

-- 
╭─┈David Given┈──McQ─╮ "There are two major products that come out of
│┈┈dg@cowlark.com┈┈┈┈│ Berkeley: LSD and Unix. We don't believe this to be
│┈(dg@tao-group.com)┈│ a coincidence." --- Jeremy S. Anderson
╰─┈www.cowlark.com┈──╯
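
To make the point about composite characters concrete, here is a minimal sketch in Python (chosen only for brevity; the thread itself is about C). The names `precomposed`, `decomposed`, `no_precomposed` and `naive_glyph_count` are made up for illustration, and skipping combining marks is only a rough approximation of real grapheme segmentation, not a full implementation of it.

```python
import unicodedata

precomposed = "\u00e9"       # "é" as one code point
decomposed = "e\u0301"       # "e" followed by COMBINING ACUTE ACCENT
no_precomposed = "q\u0301"   # "q" + acute: no single-code-point form exists


def naive_glyph_count(text):
    """Count code points that are not combining marks.

    Assumption: this is a crude stand-in for counting glyphs. It ignores
    many cases that full grapheme segmentation has to handle.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# The same visible glyph can be one code point or two.
print(len(precomposed), len(decomposed))

# Normalisation to NFC helps only when a precomposed form exists.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(len(unicodedata.normalize("NFC", no_precomposed)))

# Counting glyphs needs extra logic, even with one element per code point.
print(naive_glyph_count(decomposed), naive_glyph_count(no_precomposed))
```

So even when every string element is a whole code point, indexing by element does not give you indexing by glyph, and normalising first does not remove the problem in general.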