Re: UTF-8 testing
- Subject: Re: UTF-8 testing
- From: Henning Diedrich <hd2010@...>
- Date: 07 Jan 2011 05:15:56 +0100
Luiz,
is this appropriate source for Lua (e.g. the use of bit operators), or
would it cause problems somewhere?
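  unsigned char *c = (unsigned char *)getstr(rawtsvalue(rb));
  unsigned char *q = c + tsvalue(rb)->len;  /* one past the last byte */
  size_t count = 0;
  /* Lua always stores a '\0' after the string and && short-circuits,
     so a truncated sequence at the end reads at most q[0]. */
  while (c < q) {
    if (c[0] <= 0x7F) c++;                            /* 1 byte: ASCII */
    else if ((c[0] >= 0xC2) && (c[0] <= 0xDF)
             && ((c[1] & 0xC0) == 0x80)) c += 2;      /* 2 bytes */
    else if (((c[0] & 0xF0) == 0xE0)
             && ((c[1] & 0xC0) == 0x80)
             && ((c[2] & 0xC0) == 0x80)) c += 3;      /* 3 bytes */
    /* note: still accepts overlong E0 80..9F and surrogates ED A0..BF */
    else if ((c[0] >= 0xF0) && (c[0] <= 0xF3)         /* was (c[0] & 0xF8) == 0xF0, */
             && ((c[1] & 0xC0) == 0x80)               /* which also took F4..F7 and */
             && ((c[2] & 0xC0) == 0x80)               /* made the F4 branch dead */
             && ((c[3] & 0xC0) == 0x80)) c += 4;      /* 4 bytes, lead F0..F3 */
    else if ((c[0] == 0xF4)
             && ((c[1] & 0xF0) == 0x80)               /* 80..8F: max U+10FFFF */
             && ((c[2] & 0xC0) == 0x80)
             && ((c[3] & 0xC0) == 0x80)) c += 4;      /* 4 bytes, lead F4 */
    else { c++; continue; }      /* invalid byte: skip it, don't count */
    count++;
  }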
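For comparison, a stricter variant could look like the sketch below. It is
standalone C, not tested against the Lua sources, and the names utf8_count
and is_cont are made up for illustration. It accepts only the well-formed
byte ranges listed in RFC 3629 (no overlong forms, no surrogates, nothing
above U+10FFFF), skips invalid bytes without counting them as the loop above
does, and checks the remaining length itself, so it never reads past s + len.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: true if b is a continuation byte (10xxxxxx). */
static int is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Hypothetical helper: count well-formed UTF-8 characters in s[0..len).
   Bytes that cannot start a well-formed sequence are skipped, not counted. */
size_t utf8_count(const unsigned char *s, size_t len)
{
  const unsigned char *c, *q;
  size_t count = 0;
  assert(s != NULL);               /* pass "" with len 0 for empty input */
  c = s;
  q = s + len;
  while (c < q) {
    size_t left = (size_t)(q - c); /* bytes remaining, at least 1 */
    unsigned char b = c[0];
    size_t n = 0;                  /* sequence length at c, 0 = invalid */
    if (b <= 0x7F) n = 1;
    else if (b >= 0xC2 && b <= 0xDF) {
      if (left >= 2 && is_cont(c[1])) n = 2;
    }
    else if (b >= 0xE0 && b <= 0xEF) {
      unsigned char lo = (b == 0xE0) ? 0xA0 : 0x80; /* E0: no overlongs */
      unsigned char hi = (b == 0xED) ? 0x9F : 0xBF; /* ED: no surrogates */
      if (left >= 3 && c[1] >= lo && c[1] <= hi && is_cont(c[2])) n = 3;
    }
    else if (b >= 0xF0 && b <= 0xF4) {
      unsigned char lo = (b == 0xF0) ? 0x90 : 0x80; /* F0: no overlongs */
      unsigned char hi = (b == 0xF4) ? 0x8F : 0xBF; /* F4: max U+10FFFF */
      if (left >= 4 && c[1] >= lo && c[1] <= hi
          && is_cont(c[2]) && is_cont(c[3])) n = 4;
    }
    if (n == 0) { c++; continue; } /* invalid byte: skip, don't count */
    c += n;
    count++;
  }
  return count;
}
```

Inside Lua it could be called with the same macros as above, e.g.
utf8_count((const unsigned char *)getstr(rawtsvalue(rb)), tsvalue(rb)->len).
Since it takes an explicit length, embedded zeros are counted as ordinary
one-byte characters.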
Cheers,
Henning