Re: UTF-8 testing


Luiz,

Is this appropriate source for Lua? In particular, are the bit operators acceptable, or do they pose a problem somewhere?

unsigned char *c = (unsigned char *)getstr(rawtsvalue(rb));
unsigned char *q = c + tsvalue(rb)->len;
size_t count = 0;

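while (c < q) {
  /* 1 byte: ASCII */
  if (c[0] <= 0x7F) c++;
  /* 2 bytes: lead C2..DF (C0 and C1 would be overlong) */
  else if ((c[0] >= 0xC2) && (c[0] <= 0xDF) && ((c[1] & 0xC0) == 0x80)) c += 2;
  /* 3 bytes: lead E0..EF */
  else if (((c[0] & 0xF0) == 0xE0) && ((c[1] & 0xC0) == 0x80) && ((c[2] & 0xC0) == 0x80)) c += 3;
  /* 4 bytes: lead F0..F3; F4 is handled separately, F5..F7 are invalid */
  else if ((c[0] >= 0xF0) && (c[0] <= 0xF3) && ((c[1] & 0xC0) == 0x80) && ((c[2] & 0xC0) == 0x80) && ((c[3] & 0xC0) == 0x80)) c += 4;
  /* 4 bytes: lead F4, second byte 80..8F, so the result stays <= U+10FFFF */
  else if ((c[0] == 0xF4) && ((c[1] & 0xF0) == 0x80) && ((c[2] & 0xC0) == 0x80) && ((c[3] & 0xC0) == 0x80)) c += 4;
  else { c++; continue; } /* invalid sequence, don't count */
  count++;
}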
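
For testing outside the Lua core, the same counting idea can be sketched as a standalone function. This is only a sketch under assumptions: utf8_count and is_cont are hypothetical names, not part of Lua, and the function takes a pointer and a length instead of a TString. It checks the remaining length explicitly rather than relying on a terminating '\0', and it also rejects overlong 3- and 4-byte forms and UTF-16 surrogates.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if b is a continuation byte (10xxxxxx). */
static int is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Hypothetical standalone counter, not part of Lua.
   Counts well-formed UTF-8 sequences in s[0..len); an invalid byte is
   skipped one byte at a time and not counted. */
size_t utf8_count(const char *s, size_t len) {
  const unsigned char *c = (const unsigned char *)s;
  const unsigned char *q = c + len;
  size_t count = 0;

  while (c < q) {
    size_t left = (size_t)(q - c);  /* bytes remaining, at least 1 */
    size_t n = 0;                   /* length of valid sequence at c, 0 if none */

    if (c[0] <= 0x7F) {
      n = 1;                                        /* ASCII */
    } else if (c[0] >= 0xC2 && c[0] <= 0xDF) {
      if (left >= 2 && is_cont(c[1])) n = 2;
    } else if (c[0] >= 0xE0 && c[0] <= 0xEF) {
      if (left >= 3 && is_cont(c[1]) && is_cont(c[2])
          && !(c[0] == 0xE0 && c[1] < 0xA0)         /* overlong form */
          && !(c[0] == 0xED && c[1] > 0x9F))        /* UTF-16 surrogate */
        n = 3;
    } else if (c[0] >= 0xF0 && c[0] <= 0xF4) {
      if (left >= 4 && is_cont(c[1]) && is_cont(c[2]) && is_cont(c[3])
          && !(c[0] == 0xF0 && c[1] < 0x90)         /* overlong form */
          && !(c[0] == 0xF4 && c[1] > 0x8F))        /* above U+10FFFF */
        n = 4;
    }

    if (n == 0) { c++; continue; }  /* invalid byte, don't count */
    c += n;
    count++;
  }
  return count;
}
```

Computing the sequence length first and counting in one place avoids adjusting an unsigned counter in the error branch.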

Cheers,
Henning
