On Thu, Feb 09, 2017 at 02:30:39PM +0200, Dirk Laurie wrote:
> 2017-02-09 14:05 GMT+02:00 云风 Cloud Wu <cloudwu@gmail.com>:
>
> > But there is another problem.
> >
> > local s = "\xE4\xBA"
> > assert(utf8.len(s, 1, 2) == utf8.len(s .. "\x91",1,2)) -- failed
>
> Why is this a problem? It should fail. s is not a valid UTF8 codepoint
> ("\xE4" promises three bytes, but there are only two). When you
> supply the extra byte, there is one valid codepoint, starting between
> characters 1 and 2.

The manual says:

> Returns the number of UTF-8 characters in string s that **start**
> between positions i and j (both inclusive).

Extra emphasis on **start**. The 3-byte sequence does start within the
range given.

-- 
Zash
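
For reference, here is a minimal sketch of the behaviour discussed above. It assumes Lua 5.3 or later (for the utf8 library and the "\xXX" escapes); the variable names are only illustrative and do not come from the thread.

```lua
-- Assumption: Lua 5.3+ (utf8 library, "\xXX" string escapes).
local truncated = "\xE4\xBA"           -- lead byte promises 3 bytes, only 2 present
local complete  = truncated .. "\x91"  -- "\xE4\xBA\x91" encodes U+4E91

-- Invalid input: utf8.len returns a false value (nil in the reference
-- implementation) plus the position of the first invalid byte.
local n, bad_pos = utf8.len(truncated, 1, 2)
print(n, bad_pos)

-- Valid input: the single character starts at byte 1, which lies inside
-- [1, 2], so it is counted even though its last byte is at position 3.
print(utf8.len(complete, 1, 2))
print(utf8.len(complete, 1, 1))
```

So the two calls in the original assert compare a failure result against a count of one, which is why the assertion does not hold.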