Do not decode utf8 while looking for next regex match start candidate

If the first byte of a multi-byte utf8 sequence does not match,
it means the "other" character is not set, so none of the remaining
bytes of the sequence will match either (they all have the MSB set).
This tightens the critical loop, which ends up running faster in
most cases.
Maxime Coste 2024-02-11 12:17:21 +11:00
parent 8d60e19484
commit bd91a255e4

@@ -530,7 +530,7 @@ private:
                 const unsigned char c = *start;
                 if (start_desc.map[(c < StartDesc::count) ? c : StartDesc::other])
                     return;
-                utf8::to_next(start, config.end);
+                ++start;
             }
             else
             {
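
The reasoning in the commit message can be checked with a small standalone sketch. It is not the actual regex implementation: `StartMap`, its layout (128 ASCII slots plus one shared "other" slot for bytes with the MSB set), and the three functions are hypothetical names chosen for illustration, and the input is assumed to be valid utf8 starting on a codepoint boundary. Under those assumptions, advancing one byte at a time should find the same start candidate as skipping whole sequences.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the start descriptor (assumed layout): one flag
// per ASCII byte, plus a shared "other" slot for every byte with the MSB set.
struct StartMap
{
    static constexpr std::size_t count = 128;
    static constexpr std::size_t other = 128;
    std::array<bool, count + 1> map{};

    bool matches(unsigned char c) const { return map[c < count ? c : other]; }
};

// Previous approach: on a miss, skip the rest of the utf8 sequence.
inline std::size_t find_start_by_codepoint(const StartMap& desc, std::string_view text)
{
    std::size_t pos = 0;
    while (pos < text.size())
    {
        if (desc.matches(static_cast<unsigned char>(text[pos])))
            return pos;
        ++pos;
        // Continuation bytes have the form 10xxxxxx.
        while (pos < text.size() &&
               (static_cast<unsigned char>(text[pos]) & 0xC0) == 0x80)
            ++pos;
    }
    return pos;
}

// Approach from this commit: on a miss, advance a single byte. A lead byte
// that misses implies "other" is unset, so its continuation bytes miss too.
inline std::size_t find_start_by_byte(const StartMap& desc, std::string_view text)
{
    std::size_t pos = 0;
    while (pos < text.size() &&
           !desc.matches(static_cast<unsigned char>(text[pos])))
        ++pos;
    return pos;
}

// Asserts that both scans agree on the given input.
inline void check_same_candidate(const StartMap& desc, std::string_view text)
{
    assert(find_start_by_byte(desc, text) == find_start_by_codepoint(desc, text));
}
```

The byte-wise loop has no inner loop and no bounds-dependent decoding, which is presumably where the speedup comes from; the sketch only demonstrates that the result is the same, not the performance difference.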