deleted 1 character in body

edited Nov 7, 2019 at 20:35

As noted, the zero detection trick also detects any 0x80 byte as if it were a zero terminator. Such bytes are common: 0x80 is € in CP1252, and many UTF-8 characters contain 0x80 as a continuation byte, for example Hiragana mu: む = "\xE3\x82\x80". There are slightly more expensive "contains zero byte" checks that avoid this, for example (sprinkle with parentheses as desired):

i = i - LOW_MASK & ~i & NOT_HIGH_MASK;

That replaces 3 operations from the original, so it's not that costly, and additionally it could be used as fallback test after the simpler test thinks it has found a zero (though that is not favourable for strings with many 0x80 in them). It's not a straight upgrade, so it's for you to weigh the trade-off.
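As a rough sketch of that fallback arrangement: the function names below are hypothetical, LOW_MASK = 0x01010101 is an assumption, and the shape of the cheaper test is only a guess at what the question uses (mask with HIGH_MASK first, then subtract). The second function is the check from this answer with explicit parentheses.

```c
#include <stdint.h>
#include <stdbool.h>

/* NOT_HIGH_MASK and HIGH_MASK use the values quoted in this answer;
   LOW_MASK = 0x01010101 is an assumption. */
#define LOW_MASK      0x01010101u
#define HIGH_MASK     0x7f7f7f7fu
#define NOT_HIGH_MASK 0x80808080u

/* Cheaper test (assumed form): nonzero if any byte is 0x00 or 0x80. */
static inline uint32_t maybe_zero_byte(uint32_t i)
{
    return ((i & HIGH_MASK) - LOW_MASK) & NOT_HIGH_MASK;
}

/* Check from this answer: nonzero only if some byte is 0x00. */
static inline uint32_t has_zero_byte(uint32_t i)
{
    return (i - LOW_MASK) & ~i & NOT_HIGH_MASK;
}

/* Two-stage variant: run the exact check only when the cheap one fires. */
static inline bool word_has_terminator(uint32_t i)
{
    return maybe_zero_byte(i) != 0 && has_zero_byte(i) != 0;
}
```

With a word such as 0x418082E3 (the bytes of む followed by 'A', loaded little-endian), the cheap test fires but the exact check rejects it, so the two-stage version pays for both tests on every such word.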

This uses the definition #define NOT_HIGH_MASK 0x80808080 as used in this question, not HIGH_MASK = 0x80808080 as may be expected.

This trick operates on the same basic principle as the trick in the question: subtracting 1 from 0 sets the high bit of that byte, because it can borrow all the way through, but any set bit would stop the borrow from reaching the top. However, it fixes the problem of "what if the top bit was already set" by ANDing with ~i afterwards, rather than by ANDing with HIGH_MASK = 0x7f7f7f7f beforehand (which also turns 0x80 into zero).
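To make the difference concrete, here is a single-byte sketch of the two orderings. The function names are hypothetical, and the "mask beforehand" form is an assumed reduction of the question's test to one byte, not a quote from it.

```c
#include <stdint.h>

/* AND with ~b afterwards: the high bit survives only when b == 0x00,
   because a byte that already had its high bit set is cleared by ~b. */
static inline uint8_t flag_and_after(uint8_t b)
{
    return (uint8_t)((b - 1) & ~b & 0x80);
}

/* AND with 0x7f beforehand (assumed form): 0x80 is turned into 0x00
   before the subtraction, so it is flagged just like a real zero. */
static inline uint8_t flag_mask_before(uint8_t b)
{
    return (uint8_t)(((b & 0x7f) - 1) & 0x80);
}
```

Both functions return 0x80 for b = 0x00 and 0 for b = 0x01, but only flag_mask_before also returns 0x80 for b = 0x80. In the full-word version a borrow can propagate into higher bytes, so only the lowest flagged byte is reliable; that does not affect the question of whether the word contains a zero byte at all.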

added 335 characters in body

edited Nov 7, 2019 at 20:15

user555045

added 140 characters in body

edited Nov 7, 2019 at 18:43

user555045

answered Nov 7, 2019 at 17:30

user555045
