Re: Changes in the validation of UTF-8
- Subject: Re: Changes in the validation of UTF-8
- From: Andrew Gierth <andrew@...>
- Date: 17 Mar 2019 21:01:40 +0000
>>>>> "Dirk" == Dirk Laurie <dirk.laurie@gmail.com> writes:
Dirk> Lua in no way even comes close to validating against the current
Dirk> UTF-8 standard. We've been through this before. Marc Balmer in
Dirk> particular has been quite trenchant on this point.
Other than the fact that it fails to reject encoded surrogates, what
invalid sequence does the code in Lua 5.3.5 accept?
Dirk> All that Lua does is to verify that a string satisfies the basic
Dirk> UTF-8 encoding: ASCII or a starting byte whose introductory
Dirk> string of 1's says how many bytes in total are being encoded,
Dirk> followed by the right number of 10... bytes.
That's ... not what the 5.3.5 utf8_decode does. Did you read it? Test
it?
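
To make the disagreement concrete, here is a minimal sketch of the difference between a shape-only check (a lead byte followed by the right number of 10xxxxxx bytes, as Dirk describes) and a check that also rejects overlong forms and values above U+10FFFF. It is an illustration only, not the Lua source: the names decode_one, check, MIN_FOR_COUNT and MAX_UNICODE are hypothetical, and lead bytes are assumed to announce at most four bytes in total.

```python
MAX_UNICODE = 0x10FFFF
# Smallest code point that legitimately needs 1, 2 or 3 continuation bytes.
MIN_FOR_COUNT = {1: 0x80, 2: 0x800, 3: 0x10000}


def decode_one(data: bytes, i: int):
    """Decode the sequence starting at data[i] by shape alone.

    Returns (code_point, continuation_count), or None if the lead byte
    is not followed by the announced number of 10xxxxxx bytes.
    """
    b = data[i]
    if b < 0x80:                      # ASCII
        return b, 0
    if 0xC0 <= b <= 0xDF:
        count, cp = 1, b & 0x1F
    elif 0xE0 <= b <= 0xEF:
        count, cp = 2, b & 0x0F
    elif 0xF0 <= b <= 0xF7:
        count, cp = 3, b & 0x07
    else:                             # stray continuation byte or 0xF8..0xFF
        return None
    if i + count >= len(data):        # sequence is truncated
        return None
    for k in range(1, count + 1):
        cc = data[i + k]
        if cc & 0xC0 != 0x80:         # not a continuation byte
            return None
        cp = (cp << 6) | (cc & 0x3F)
    return cp, count


def check(data: bytes, ranges: bool = False, surrogates: bool = False) -> bool:
    """Validate data; the flags enable the checks beyond byte shape."""
    i = 0
    while i < len(data):
        decoded = decode_one(data, i)
        if decoded is None:
            return False
        cp, count = decoded
        # Reject overlong encodings and values beyond the Unicode range.
        if ranges and count > 0 and (cp < MIN_FOR_COUNT[count] or cp > MAX_UNICODE):
            return False
        # Reject encoded UTF-16 surrogates (U+D800..U+DFFF).
        if surrogates and 0xD800 <= cp <= 0xDFFF:
            return False
        i += count + 1
    return True


overlong_nul = b"\xC0\x80"           # two-byte encoding of U+0000
surrogate = b"\xED\xA0\x80"          # encoded U+D800
too_large = b"\xF4\x90\x80\x80"      # would decode to 0x110000

for sample in (overlong_nul, surrogate, too_large):
    print(sample, check(sample), check(sample, ranges=True),
          check(sample, ranges=True, surrogates=True))
```

All three samples pass the shape-only check. With ranges=True, only the encoded surrogate still gets through, which is the single gap conceded above for 5.3.5; surrogates=True closes it.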
--
Andrew.