Jerome Vuarand wrote:
[...]
> Also keep in mind that many Unicode characters are meant to be combined
> with others (`+E gives È for example), and as such you will have multiple
> unicode codepoints for a single grapheme (and a single character cell).
> Character offset in unicode strings don't reflect grapheme offset in the
> string graphical representation, even with fixed width fonts.

That's why I said 'grapheme clusters'...

In fact, when dealing with UTF-8 strings, all text should be normalised so
you *don't* get the issue you mention above. Multiple-character graphemes
should be collapsed down into a single character wherever possible (I
believe that it is possible for all Romance languages, but I could be
wrong).

However, I'm slowly coming to the conclusion that I'm going to have to
write some custom code for dealing with all this, simply due to the fact
that what I'm really interested in is physical character width, which means
I'm going to have to call wcwidth() a lot. Sigh.

<musing type="out loud">

So, I need:

- a function to wrap a paragraph of text.
- a function to draw a line of text, positioning the cursor in the right
  place.
- a function to step forwards or backwards through a string a certain
  number of grapheme clusters.

I think that's all I need. I should be able to do the rest with just those
three, and conventional string munging tools.

Hmm...

</musing>

-- 
╭─┈David Given┈──McQ─╮ "There are two major products that come out of
│┈┈dg@cowlark.com┈┈┈┈│ Berkeley: LSD and Unix. We don't believe this to be
│┈(dg@tao-group.com)┈│ a coincidence." --- Jeremy S. Anderson
╰─┈www.cowlark.com┈──╯
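
A rough sketch of two of the helpers listed above (stepping by grapheme cluster and wrapping by display width), written in Python purely for illustration. Everything here is an assumption rather than anything from the thread: the names cell_width, display_width, step and wrap are hypothetical, unicodedata stands in for wcwidth(), and a "grapheme cluster" is approximated as a base code point plus any trailing zero-width marks instead of the full Unicode segmentation rules.

```python
import bisect
import unicodedata


def cell_width(ch):
    """Approximate terminal cells for one code point (stand-in for wcwidth()).

    Simplification: control characters are counted as width 1 here,
    whereas a real wcwidth() reports them as non-printable.
    """
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters take no cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian wide / fullwidth characters take two cells
    return 1


def display_width(text):
    """Cells occupied by text after NFC normalisation."""
    return sum(cell_width(ch) for ch in unicodedata.normalize("NFC", text))


def step(text, offset, count):
    """Move count clusters from a code point offset (negative = backwards).

    Assumes text is already normalised. The result is clamped to
    the range 0..len(text).
    """
    # A cluster starts at offset 0 and at every code point with nonzero width.
    bounds = [i for i, ch in enumerate(text) if i == 0 or cell_width(ch) > 0]
    bounds.append(len(text))
    i = bisect.bisect_right(bounds, offset) - 1  # cluster containing offset
    i = max(0, min(len(bounds) - 1, i + count))
    return bounds[i]


def wrap(paragraph, columns):
    """Greedy word wrap measured in display cells, not code points."""
    lines, current = [], ""
    for word in paragraph.split():
        candidate = word if not current else current + " " + word
        if current and display_width(candidate) > columns:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines
```

With this approximation, "e" followed by U+0301 normalises to a single code point, while "x" followed by U+0301 has no precomposed form and stays as two code points in one cell, which is the case that makes offset stepping necessary even after normalisation.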