Commit Graph

114 Commits

Author SHA1 Message Date
Maxime Coste
2b74cd4b59 Remove instructions from ExecConfig
We can just compute whenever we reset last_step, which does not happen
often and we know `forward` at compile time anyway
2023-02-19 11:46:17 +11:00
Maxime Coste
f115af7a57 Optimize Regex CharacterClass matching
Take advantage of ranges sorting to early out, make the logic
inline.
2023-02-19 11:16:14 +11:00
Maxime Coste
85ceef29bd Fix broken corner cases in DualThreadStack::grow_ifn
We only grow when the ring buffer is full, which allows for a nice
simplification of the code.

Tell grow_ifn if we pushed in current or next so that we can
distinguish between filled by next or filled by current when
m_current == m_next_begin
2023-02-14 17:13:31 +11:00
Maxime Coste
d708b77186 Refactor DualThreadStack as a RingBuffer
Instead of two stacks growing from the two ends of a buffer, use
a ring buffer growing from the same mid spot.

This avoids the costly memory copy every step when we set next
threads as the current ones.
2023-02-14 07:04:54 +11:00
Maxime Coste
762064dc68 Remove scheduled optimization from ThreadedRegexVM
This does not seem to actually speed up execution as threads will
be dropped on next step anyway
2023-02-13 21:15:55 +11:00
Maxime Coste
f5d5274c5f Fix incorrect use of subject end/begin in regex execution
This could lead to reading past subject string end in certain
conditions

Fixes #4794
2023-01-23 17:38:02 +11:00
Maxime Coste
0c1d4808fa Slight code style tweak 2022-08-20 11:03:03 +02:00
Maxime Coste
21047db4a0 Remove unnecessary utf8 decoding when looking for EOL in regex 2022-08-20 11:03:03 +02:00
Maxime Coste
c8c8051bd0 Refactor RegionsHighlighter to share regexes
Instead of storing regexes in each regions, move them to the core
highlighter in a hash map so that shared regexes between different
regions are only applied once per update instead of once per region

Also change iteration logic to apply all regex together to each
changed lines to improve memory locality on big buffers.

For the big_markdown.md file described in #4685 this reduces
initial display time from 3.55s to 2.41s on my machine.
2022-08-20 11:02:59 +02:00
Maxime Coste
ca71d8997d Reuse existing character classes when possible in regex 2022-08-05 20:31:39 +10:00
Maxime Coste
33e81af0f3 Fix regex alternation execution priority
The ThreadedRegexVM implementation does not execute split opcodes as
expected: on split the pending thread is pushed on top of the thread
stack, which means that when multiple splits are executed in a row
(such as with a disjunction with 3 or more branches) the last split
target gets on top of the thread stack and gets executed next (when the
thread from the first split target would be the expected one)

Fixing this in the ThreadedRegexVM would have a performance impact as
we would not be able to use a plain stack for current threads, so the
best solution at the moment is to reverse the order of splits generated
by a disjunction.

Fixes #4519
2022-02-02 14:51:17 +11:00
Maxime Coste
ba379cba52 Micro-optimize regex character class/type matching
Also force-inline step_thread as function call overhead has a
mesurable impact.
2021-11-21 09:44:22 +11:00
Maxime Coste
8566ae14a0 Reduce the amount of Regex VM Instruction code
Merge all lookarounds into the same instruction, merge splits, merge
literal ignore case with literal...

Besides reducing the amount of almost duplicated code, this improves
performance by reducing pressure on the (often failing) branch target
prediction for instruction dispatching by moving branches into the
instruction code themselves where they are more likely to be well
predicted.
2021-11-21 09:44:18 +11:00
Maxime Coste
da80a8cf6a Raise ThreadedVM initial thread capacity to 16
Threads are 4 bytes, an initial capacity of 4 led to allocating 16
bytes, raising that to 64 bytes seems quite reasonable.
2021-03-03 20:51:24 +11:00
Maxime Coste
d539e8fb89 Do not decode utf-8 when looking for regex next start
There is no need to decode as we know any non-ascii characters will
be treated as Other in the StartDesc.
2019-12-04 22:33:11 +11:00
Jason Felice
d26bb0ce2b Add static or const where useful 2019-11-09 12:53:45 -05:00
Maxime Coste
d9d2140ea2 Fix regex not always selecting the leftmost longest match
(Actually the rightmost longest match when searching backwards)

Fixes #2710
2019-02-04 17:33:29 +11:00
Maxime Coste
77b1216ace Add a peephole optimization pass to the regex compiler 2019-01-20 22:59:28 +11:00
Maxime Coste
0364a99827 Refactor regex find next start not to be an instruction anymore
The same logic can be hard coded, avoiding one thread and 3
instructions, improving the regex matching speed.
2019-01-20 22:59:28 +11:00
Maxime Coste
fd043435e5 Split compile time regex flags from runtime ones 2019-01-20 22:59:28 +11:00
Maxime Coste
328c497be2 Add support for named captures to the regex impl and regex highlighter
ECMAScript is adding support for it, and it is a pretty isolated
change to do.

Fixes #2293
2019-01-03 22:55:50 +11:00
Maxime Coste
ef3419edbf Do not pass thread to failed/consumed, capture it implicitely 2018-12-19 19:16:14 +11:00
Maxime Coste
0b9f782691 Take iterators by const-ref in ThreadedRegexVM::exec 2018-12-19 19:14:42 +11:00
Maxime Coste
021ba55b38 Small code tweak in DualThreadStack::swap_next 2018-11-14 17:50:17 +11:00
Maxime Coste
8c2c3d27ad Fix memory leak in DualThreadStack
Fixes #2556
2018-11-07 12:28:41 +11:00
Maxime Coste
7f83c41256 align ThreadedRegexVM::Thread to permit fused copy optimization
Aligning makes gcc able to copy a Thread object with a single
32bit mov instruction instead of two 16bits one.
2018-11-06 20:13:09 +11:00
Maxime Coste
05a9eb62f4 Never grow the DualThreadStack in push_next
As we do at most one push_next per step_thread, and we pop_current
before step_thread, we can avoid a branch there at the expense of
sometimes growing unecessarily (once).
2018-11-06 07:32:47 +11:00
Maxime Coste
7fbde0d44e Various micro performance tweaks in ThreadedRegexVM 2018-11-05 21:54:29 +11:00
Maxime Coste
7959c7f731 Refactor ThreadedRegexVM::exec_program to avoid branching
Moving logic into step_thread instead of returning an enum to
select what to run avoids the switch logic and improves run time.
2018-11-05 19:46:53 +11:00
Maxime Coste
7463a0d449 Remove use of utf8::iterator in regex execution
This avoids having two copies of the subject string bounds, one
in the ExecConfig and one in the utf8 iterator.
2018-11-05 08:17:50 +11:00
Maxime Coste
4ac7df3842 Remove most regex impl special casing for backwards matching 2018-11-03 13:52:40 +11:00
Maxime Coste
ee74c2c2df Use custom code instead of reverse_iterator in Regex VM 2018-11-02 08:23:39 +11:00
Maxime Coste
6fce8050ee Use BufferCoord sentinel type for regex matching on BufferIterators
BufferIterators are large-ish, and need to check the buffer pointer
on comparison. Checking against a coord is just a 64 bit comparison.
2018-11-01 21:51:10 +11:00
Maxime Coste
4cd7583bbc Improve regex vm to next start performance by avoiding iterator copies 2018-11-01 08:22:43 +11:00
Maxime Coste
d652ec9ce1 Cleanup regex lookarounds implementation and reject incompatible regex
Fixes #2487
2018-10-10 22:47:59 +11:00
Maxime Coste
9024d41d64 Fix integer overflow leading to bad memory access in regex execution
Fixes #2481
Fixes #2480
2018-10-08 12:43:12 +11:00
Maxime Coste
7cf3cbde8e Cleanup some trailing whitespaces and double semicolon 2018-07-26 21:56:34 +10:00
Maxime Coste
0d6e04257b Fix memory leak in regex execution 2018-07-25 20:57:11 +10:00
Maxime Coste
7ed5d53fe6 Fix RegexCompileFlags::Backwards having the same value as Optimize
That means every Optimized regex had the Backwards version
compiled as well, which doubled the time it took to compile them
and doubled the memory usage of regex.

This should improve #2152
2018-07-19 18:34:40 +10:00
Olivier Perret
67655de947 Use a dedicated vm op for dot when match-newline is false 2018-06-24 12:41:50 +02:00
Maxime Coste
787ca7f19b Regex: small code style tweak 2018-04-29 19:58:18 +10:00
Maxime Coste
1e8026f143 Regex: Use only 128 characters in start desc and encode others as 0
Using 257 was using lots of memory for no good reason, as > 127
codepoint are not common enough to be treated specially.
2018-04-29 19:58:18 +10:00
Maxime Coste
528ecb7417 Regex: Use a custom 'DualThreadStack' structure to hold thread info
Instead of using two vectors, we can hold both current and next
threads in a single buffer, with stacks growing on each end.

Benchmarking shows this to be slightly faster, and should use less memory.
2018-04-29 19:58:18 +10:00
Maxime Coste
8438b33175 Add a debug regex command to dump regex instructions 2018-04-27 08:35:09 +10:00
Maxime Coste
f10eb9faa3 Use indices instead of pointers for saves/instruction in ThreadedRegexVM
Performance seems unaffacted, but memory usage should be lowered
as the Thread struct is 4 bytes instead of 16.
2018-04-27 08:35:09 +10:00
Maxime Coste
fa17c46653 Regex: Refactor ThreadedRegexVM state handling
Remove ExecState to store threads inside the ThreadedRegexVM so that
memory buffers can be reused between executions. Extract an ExecConfig
struct with all the data thats execution specific to avoid storing
it needlessly inside the ThreadedRegexVM.
2018-04-25 21:19:04 +10:00
Maxime Coste
fb65fa60f8 Regex: take the full subject range as a parameter
To allow more general look arounds out of the actual search range,
pass a second range (the actual subject). This allows us to remove
various flags such as PrevAvailable or NotBeginOfSubject, which are
now easy to check from the subject range.

Fixes #1902
2018-03-05 05:48:10 +11:00
Maxime Coste
d9e44dfacf Regex: Remove helper functions from regex_impl.hh
They were close duplicates from the ones in regex.hh and not used
anywhere else.
2018-03-05 03:10:47 +11:00
Maxime Coste
933ac4d3d5 Regex: Improve comments and constify some variables
Reword various comments to make some tricky parts of the regex
engine easier to understand.
2018-02-24 17:40:08 +11:00
Maxime Coste
af21d4ca1e regex: track CompiledRegex::StartDesc in the Regex memory domain 2018-02-24 16:29:24 +11:00