Do not decode utf8 while looking for next regex match start candidate
If the first byte of a multi-byte utf8 sequence does not match, the "other" character is not set, so none of the remaining bytes in the sequence will match either (they all have the MSB set). Skipping the decode tightens the critical loop, which ends up running faster in most cases.
parent 8d60e19484
commit bd91a255e4
@@ -530,7 +530,7 @@ private:
     const unsigned char c = *start;
     if (start_desc.map[(c < StartDesc::count) ? c : StartDesc::other])
         return;
-    utf8::to_next(start, config.end);
+    ++start;
 }
 else
 {
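The reasoning in the commit message can be checked with a small standalone sketch. It is not the project's code: the `StartDesc` layout, the value `count = 128`, and the helper names `can_start`, `find_start_decoding` and `find_start_bytewise` are all assumptions made for illustration, and the inner continuation-byte loop only approximates what `utf8::to_next` presumably does. The sketch compares a scan that skips whole utf8 sequences against one that simply advances byte by byte.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the StartDesc used in the diff: one flag per
// ASCII byte, plus a single "other" slot shared by every byte >= count.
struct StartDesc
{
    static constexpr std::size_t count = 128;   // assumed value
    static constexpr std::size_t other = count; // assumed layout
    std::array<bool, count + 1> map{};
};

// Same lookup expression as in the diff.
inline bool can_start(const StartDesc& desc, unsigned char c)
{
    return desc.map[(c < StartDesc::count) ? c : StartDesc::other];
}

// Old approach (approximated): on a miss, step over the whole utf8 sequence
// by skipping its continuation bytes (bit pattern 10xxxxxx).
inline std::size_t find_start_decoding(const StartDesc& desc, std::string_view text)
{
    std::size_t i = 0;
    while (i < text.size())
    {
        if (can_start(desc, static_cast<unsigned char>(text[i])))
            return i;
        ++i;
        while (i < text.size() &&
               (static_cast<unsigned char>(text[i]) & 0xC0) == 0x80)
            ++i;
    }
    return text.size();
}

// New approach: advance one byte at a time. Continuation bytes all map to
// the "other" slot, which is known to be unset once the lead byte missed.
inline std::size_t find_start_bytewise(const StartDesc& desc, std::string_view text)
{
    for (std::size_t i = 0; i < text.size(); ++i)
        if (can_start(desc, static_cast<unsigned char>(text[i])))
            return i;
    return text.size();
}
```

Under these assumptions both functions return the same offset for valid utf8 input. If "other" is set, both stop on the lead byte of the first non-ASCII character; if it is not, every byte of that character is rejected by the same table slot, so the byte-wise loop loses nothing by visiting them.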