Commit 0109aa6

Simplify decoding filter for UTF-8
When decoding a 3-byte UTF-8 sequence, redundant checks for an overlong encoding and for illegal code points in U+D800-DFFF were included. Both of these conditions are already caught by the line which reads:

    if ((c2 & 0xC0) != 0x80 || (c == 0xE0 && c2 < 0xA0) || (c == 0xED && c2 >= 0xA0)) {

As such, there is no reason to check for the same error conditions again.

Likewise, when decoding a 4-byte UTF-8 sequence, there was a redundant check for an overlong encoding. That was already caught by the line which reads:

    if ((c2 & 0xC0) != 0x80 || (c == 0xF0 && c2 < 0x90) || (c == 0xF4 && c2 >= 0x90)) {
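The redundancy claim can be checked by brute force. The following is a minimal sketch, not code from the commit: it enumerates every lead byte and continuation byte, applies the second-byte guard, and counts how often the removed conditions would still have fired. The helper names are hypothetical. The 3-byte guard is assumed to use the 0xE0/0xA0 and 0xED/0xA0 boundaries (the ones that exclude overlong forms and surrogates), and the 4-byte lead byte range is assumed to be 0xF0 to 0xF4.

```c
#include <assert.h>
#include <stdint.h>

/* Second-byte guard for a 3-byte sequence; nonzero means "reject".
 * Assumed form, see the lead-in. */
static int reject_c2_3byte(unsigned char c, unsigned char c2) {
    return (c2 & 0xC0) != 0x80 || (c == 0xE0 && c2 < 0xA0) || (c == 0xED && c2 >= 0xA0);
}

/* Second-byte guard for a 4-byte sequence, as quoted in the commit message. */
static int reject_c2_4byte(unsigned char c, unsigned char c2) {
    return (c2 & 0xC0) != 0x80 || (c == 0xF0 && c2 < 0x90) || (c == 0xF4 && c2 >= 0x90);
}

/* Count accepted 3-byte sequences for which the removed check
 * (overlong or surrogate) would still have been true. */
static unsigned count_3byte_violations(void) {
    unsigned bad = 0;
    for (unsigned c = 0xE0; c <= 0xEF; c++) {
        for (unsigned c2 = 0; c2 <= 0xFF; c2++) {
            if (reject_c2_3byte((unsigned char)c, (unsigned char)c2)) continue;
            for (unsigned c3 = 0x80; c3 <= 0xBF; c3++) { /* valid continuation bytes */
                uint32_t decoded = ((c & 0xF) << 12) | ((c2 & 0x3F) << 6) | (c3 & 0x3F);
                if (decoded < 0x800 || (decoded >= 0xD800 && decoded <= 0xDFFF)) bad++;
            }
        }
    }
    return bad;
}

/* Count accepted 4-byte sequences for which the removed overlong check
 * would still have been true. */
static unsigned count_4byte_violations(void) {
    unsigned bad = 0;
    for (unsigned c = 0xF0; c <= 0xF4; c++) {
        for (unsigned c2 = 0; c2 <= 0xFF; c2++) {
            if (reject_c2_4byte((unsigned char)c, (unsigned char)c2)) continue;
            for (unsigned c3 = 0x80; c3 <= 0xBF; c3++) {
                for (unsigned c4 = 0x80; c4 <= 0xBF; c4++) {
                    uint32_t decoded = ((c & 0x7) << 18) | ((c2 & 0x3F) << 12)
                                     | ((c3 & 0x3F) << 6) | (c4 & 0x3F);
                    if (decoded < 0x10000) bad++;
                }
            }
        }
    }
    return bad;
}
```

Calling `count_3byte_violations()` and `count_4byte_violations()` from a small `main` and asserting that both return 0 confirms that, under these assumptions, the removed branches were unreachable.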
1 parent 50e3201 commit 0109aa6

1 file changed (+5 -6)

ext/mbstring/libmbfl/filters/mbfilter_utf8.c

Lines changed: 5 additions & 6 deletions
@@ -249,11 +249,9 @@ static size_t mb_utf8_to_wchar(unsigned char **in, size_t *in_len, uint32_t *buf
     p--;
   } else {
     uint32_t decoded = ((c & 0xF) << 12) | ((c2 & 0x3F) << 6) | (c3 & 0x3F);
-    if (decoded < 0x800 || (decoded >= 0xD800 && decoded <= 0xDFFF)) {
-      *out++ = MBFL_BAD_INPUT;
-    } else {
-      *out++ = decoded;
-    }
+    ZEND_ASSERT(decoded >= 0x800); /* Not an overlong code unit */
+    ZEND_ASSERT(decoded < 0xD800 || decoded > 0xDFFF); /* U+D800-DFFF are reserved, illegal code points */
+    *out++ = decoded;
   }
 } else {
   *out++ = MBFL_BAD_INPUT;
@@ -283,7 +281,8 @@ static size_t mb_utf8_to_wchar(unsigned char **in, size_t *in_len, uint32_t *buf
     p--;
   } else {
     uint32_t decoded = ((c & 0x7) << 18) | ((c2 & 0x3F) << 12) | ((c3 & 0x3F) << 6) | (c4 & 0x3F);
-    *out++ = (decoded < 0x10000) ? MBFL_BAD_INPUT : decoded;
+    ZEND_ASSERT(decoded >= 0x10000); /* Not an overlong code unit */
+    *out++ = decoded;
   }
 } else {
   *out++ = MBFL_BAD_INPUT;
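
To see the shape of the 3-byte branch after this change in isolation, here is a simplified standalone sketch. It is not the libmbfl code: `decode_3byte` and `BAD_INPUT` are hypothetical names, plain `assert` stands in for `ZEND_ASSERT`, the second-byte guard is assumed to use the 0xE0/0xA0 and 0xED/0xA0 boundaries, and the pointer rewinding and buffer handling of the real filter are left out.

```c
#include <assert.h>
#include <stdint.h>

#define BAD_INPUT UINT32_MAX /* hypothetical stand-in for MBFL_BAD_INPUT */

/* Decode one 3-byte sequence at p. The caller guarantees three readable
 * bytes and a lead byte in 0xE0..0xEF. Returns the code point, or
 * BAD_INPUT for an invalid sequence. */
static uint32_t decode_3byte(const unsigned char *p) {
    unsigned char c = p[0], c2 = p[1], c3 = p[2];

    /* Rejects non-continuation bytes, overlong forms (E0 80..9F)
     * and surrogates (ED A0..BF) in one test. */
    if ((c2 & 0xC0) != 0x80 || (c == 0xE0 && c2 < 0xA0) || (c == 0xED && c2 >= 0xA0)) {
        return BAD_INPUT;
    }
    if ((c3 & 0xC0) != 0x80) {
        return BAD_INPUT;
    }

    uint32_t decoded = ((uint32_t)(c & 0xF) << 12) | ((uint32_t)(c2 & 0x3F) << 6) | (c3 & 0x3F);
    /* The guard above already rules these out, so they are invariants
     * to assert rather than errors to handle. */
    assert(decoded >= 0x800);
    assert(decoded < 0xD800 || decoded > 0xDFFF);
    return decoded;
}
```

For example, the bytes E2 82 AC decode to U+20AC, while E0 80 80 (overlong) and ED A0 80 (surrogate) return `BAD_INPUT` before the assertions are ever reached.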
