We only grow when the ring buffer is full, which allows for a nice
simplification of the code.
Tell grow_ifn if we pushed in current or next so that we can
distinguish between filled by next or filled by current when
m_current == m_next_begin
Instead of two stacks growing from the two ends of a buffer, use
a ring buffer growing from the same mid spot.
This avoids the costly memory copy every step when we set next
threads as the current ones.
Instead of storing regexes in each regions, move them to the core
highlighter in a hash map so that shared regexes between different
regions are only applied once per update instead of once per region
Also change iteration logic to apply all regex together to each
changed lines to improve memory locality on big buffers.
For the big_markdown.md file described in #4685 this reduces
initial display time from 3.55s to 2.41s on my machine.
The ThreadedRegexVM implementation does not execute split opcodes as
expected: on split the pending thread is pushed on top of the thread
stack, which means that when multiple splits are executed in a row
(such as with a disjunction with 3 or more branches) the last split
target gets on top of the thread stack and gets executed next (when the
thread from the first split target would be the expected one)
Fixing this in the ThreadedRegexVM would have a performance impact as
we would not be able to use a plain stack for current threads, so the
best solution at the moment is to reverse the order of splits generated
by a disjunction.
Fixes#4519
Merge all lookarounds into the same instruction, merge splits, merge
literal ignore case with literal...
Besides reducing the amount of almost duplicated code, this improves
performance by reducing pressure on the (often failing) branch target
prediction for instruction dispatching by moving branches into the
instruction code themselves where they are more likely to be well
predicted.
As we do at most one push_next per step_thread, and we pop_current
before step_thread, we can avoid a branch there at the expense of
sometimes growing unecessarily (once).
That means every Optimized regex had the Backwards version
compiled as well, which doubled the time it took to compile them
and doubled the memory usage of regex.
This should improve #2152
Instead of using two vectors, we can hold both current and next
threads in a single buffer, with stacks growing on each end.
Benchmarking shows this to be slightly faster, and should use less memory.
Remove ExecState to store threads inside the ThreadedRegexVM so that
memory buffers can be reused between executions. Extract an ExecConfig
struct with all the data thats execution specific to avoid storing
it needlessly inside the ThreadedRegexVM.
To allow more general look arounds out of the actual search range,
pass a second range (the actual subject). This allows us to remove
various flags such as PrevAvailable or NotBeginOfSubject, which are
now easy to check from the subject range.
Fixes#1902