Maxime Coste
d9d2140ea2
Fix regex not always selecting the leftmost longest match
...
(Actually the rightmost longest match when searching backwards)
Fixes #2710
2019-02-04 17:33:29 +11:00
Maxime Coste
77b1216ace
Add a peephole optimization pass to the regex compiler
2019-01-20 22:59:28 +11:00
Maxime Coste
0364a99827
Refactor regex find next start not to be an instruction anymore
...
The same logic can be hard coded, avoiding one thread and 3
instructions, improving the regex matching speed.
2019-01-20 22:59:28 +11:00
Maxime Coste
fd043435e5
Split compile time regex flags from runtime ones
2019-01-20 22:59:28 +11:00
Maxime Coste
328c497be2
Add support for named captures to the regex impl and regex highlighter
...
ECMAScript is adding support for it, and it is a pretty isolated
change to do.
Fixes #2293
2019-01-03 22:55:50 +11:00
Maxime Coste
ef3419edbf
Do not pass thread to failed/consumed, capture it implicitly
2018-12-19 19:16:14 +11:00
Maxime Coste
0b9f782691
Take iterators by const-ref in ThreadedRegexVM::exec
2018-12-19 19:14:42 +11:00
Maxime Coste
021ba55b38
Small code tweak in DualThreadStack::swap_next
2018-11-14 17:50:17 +11:00
Maxime Coste
8c2c3d27ad
Fix memory leak in DualThreadStack
...
Fixes #2556
2018-11-07 12:28:41 +11:00
Maxime Coste
7f83c41256
align ThreadedRegexVM::Thread to permit fused copy optimization
...
Aligning makes gcc able to copy a Thread object with a single
32-bit mov instruction instead of two 16-bit ones.
2018-11-06 20:13:09 +11:00
Maxime Coste
05a9eb62f4
Never grow the DualThreadStack in push_next
...
As we do at most one push_next per step_thread, and we pop_current
before step_thread, we can avoid a branch there at the expense of
sometimes growing unnecessarily (once).
2018-11-06 07:32:47 +11:00
Maxime Coste
7fbde0d44e
Various micro performance tweaks in ThreadedRegexVM
2018-11-05 21:54:29 +11:00
Maxime Coste
7959c7f731
Refactor ThreadedRegexVM::exec_program to avoid branching
...
Moving logic into step_thread instead of returning an enum to
select what to run avoids the switch logic and improves run time.
2018-11-05 19:46:53 +11:00
Maxime Coste
7463a0d449
Remove use of utf8::iterator in regex execution
...
This avoids having two copies of the subject string bounds, one
in the ExecConfig and one in the utf8 iterator.
2018-11-05 08:17:50 +11:00
Maxime Coste
4ac7df3842
Remove most regex impl special casing for backwards matching
2018-11-03 13:52:40 +11:00
Maxime Coste
ee74c2c2df
Use custom code instead of reverse_iterator in Regex VM
2018-11-02 08:23:39 +11:00
Maxime Coste
6fce8050ee
Use BufferCoord sentinel type for regex matching on BufferIterators
...
BufferIterators are large-ish, and need to check the buffer pointer
on comparison. Checking against a coord is just a 64 bit comparison.
2018-11-01 21:51:10 +11:00
Maxime Coste
4cd7583bbc
Improve regex vm to next start performance by avoiding iterator copies
2018-11-01 08:22:43 +11:00
Maxime Coste
d652ec9ce1
Cleanup regex lookarounds implementation and reject incompatible regex
...
Fixes #2487
2018-10-10 22:47:59 +11:00
Maxime Coste
9024d41d64
Fix integer overflow leading to bad memory access in regex execution
...
Fixes #2481
Fixes #2480
2018-10-08 12:43:12 +11:00
Maxime Coste
7cf3cbde8e
Cleanup some trailing whitespaces and double semicolon
2018-07-26 21:56:34 +10:00
Maxime Coste
0d6e04257b
Fix memory leak in regex execution
2018-07-25 20:57:11 +10:00
Maxime Coste
7ed5d53fe6
Fix RegexCompileFlags::Backwards having the same value as Optimize
...
That means every Optimized regex had the Backwards version
compiled as well, which doubled the time it took to compile them
and doubled the memory usage of regex.
This should improve #2152
2018-07-19 18:34:40 +10:00
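
The collision described above is what happens when two flags share a bit. A minimal sketch of a flag enum with one distinct bit per flag, assuming hypothetical names and values (CompileFlags and has_flag are illustrative, not the actual declarations in the source tree):

```cpp
#include <cstdint>

// Hypothetical flag set: each flag owns exactly one bit.
enum class CompileFlags : std::uint32_t
{
    None      = 0,
    NoSubs    = 1u << 0,
    Optimize  = 1u << 1,
    Backward  = 1u << 2,
    NoForward = 1u << 3,
};

constexpr CompileFlags operator|(CompileFlags lhs, CompileFlags rhs)
{
    return static_cast<CompileFlags>(static_cast<std::uint32_t>(lhs) |
                                     static_cast<std::uint32_t>(rhs));
}

// True if any bit of `wanted` is set in `flags`.
constexpr bool has_flag(CompileFlags flags, CompileFlags wanted)
{
    return (static_cast<std::uint32_t>(flags) &
            static_cast<std::uint32_t>(wanted)) != 0;
}

// With distinct bits, asking for Optimize does not imply Backward.
// If both enumerators had the same value, this assertion would fail.
static_assert(!has_flag(CompileFlags::Optimize, CompileFlags::Backward),
              "Optimize and Backward must not share a bit");
```

A compile-time check like the static_assert is one way to catch this kind of overlap early; whether the project uses one is not stated in the commit.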
Olivier Perret
67655de947
Use a dedicated vm op for dot when match-newline is false
2018-06-24 12:41:50 +02:00
Maxime Coste
787ca7f19b
Regex: small code style tweak
2018-04-29 19:58:18 +10:00
Maxime Coste
1e8026f143
Regex: Use only 128 characters in start desc and encode others as 0
...
Using 257 characters took a lot of memory for no good reason, as
codepoints > 127 are not common enough to be treated specially.
2018-04-29 19:58:18 +10:00
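
One way to read "encode others as 0" is that every codepoint above 127 shares slot 0 of the table. A minimal sketch under that assumption (StartTable, slot, allow and may_start are hypothetical names, not the actual StartDesc layout):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical start table: one entry per codepoint below 128,
// with every other codepoint folded onto slot 0.
struct StartTable
{
    static constexpr std::size_t count = 128;
    std::array<bool, count> can_start{};

    static std::size_t slot(std::uint32_t codepoint)
    {
        return codepoint < count ? codepoint : 0;
    }

    // Record that a match may begin with this codepoint.
    void allow(std::uint32_t codepoint) { can_start[slot(codepoint)] = true; }

    // May report true for a codepoint that cannot actually start a match
    // (anything sharing slot 0); it never reports false for one that can.
    bool may_start(std::uint32_t codepoint) const
    {
        return can_start[slot(codepoint)];
    }
};
```

In this sketch the table only acts as a filter: a false positive costs an extra attempt by the matcher, while a false negative would skip a real match, so folding onto a shared slot stays safe.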
Maxime Coste
528ecb7417
Regex: Use a custom 'DualThreadStack' structure to hold thread info
...
Instead of using two vectors, we can hold both current and next
threads in a single buffer, with stacks growing on each end.
Benchmarking shows this to be slightly faster, and should use less memory.
2018-04-29 19:58:18 +10:00
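
The single-buffer layout described above can be sketched as follows. This is an illustration only, assuming a hypothetical TwoEndedStack class and a simplified Thread type; it is not the DualThreadStack code itself:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for the VM thread state (assumed layout).
struct Thread { std::uint16_t inst; std::int16_t saves; };

// Hypothetical two-ended stack: current threads grow up from index 0,
// next threads grow down from the end of the same buffer.
class TwoEndedStack
{
public:
    explicit TwoEndedStack(std::size_t capacity)
        : m_data(capacity), m_current(0), m_next(capacity) {}

    bool current_empty() const { return m_current == 0; }
    bool next_empty() const { return m_next == m_data.size(); }

    void push_current(Thread t) { grow_if_full(); m_data[m_current++] = t; }
    Thread pop_current() { return m_data[--m_current]; }
    void push_next(Thread t) { grow_if_full(); m_data[--m_next] = t; }

    // Turn the next threads into the current ones.
    // Expects the current stack to be empty.
    void swap_next()
    {
        while (m_next != m_data.size())
            m_data[m_current++] = m_data[m_next++];
    }

private:
    // The two stacks meet when the buffer is full; reallocate and keep
    // each stack anchored to its own end of the new buffer.
    void grow_if_full()
    {
        if (m_current != m_next)
            return;
        const std::size_t old_size = m_data.size();
        const std::size_t new_size = old_size ? old_size * 2 : 4;
        const std::size_t next_count = old_size - m_next;
        std::vector<Thread> grown(new_size);
        for (std::size_t i = 0; i < m_current; ++i)
            grown[i] = m_data[i];
        for (std::size_t i = 0; i < next_count; ++i)
            grown[new_size - next_count + i] = m_data[m_next + i];
        m_next = new_size - next_count;
        m_data.swap(grown);
    }

    std::vector<Thread> m_data;
    std::size_t m_current; // one past the top of the current stack
    std::size_t m_next;    // index of the top of the next stack
};
```

Both stacks share one allocation, so the buffer only needs to grow when their combined size exceeds the capacity.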
Maxime Coste
8438b33175
Add a debug regex command to dump regex instructions
2018-04-27 08:35:09 +10:00
Maxime Coste
f10eb9faa3
Use indices instead of pointers for saves/instruction in ThreadedRegexVM
...
Performance seems unaffected, but memory usage should be lowered
as the Thread struct is 4 bytes instead of 16.
2018-04-27 08:35:09 +10:00
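
The size difference mentioned above can be illustrated by comparing a pointer-based thread with an index-based one. The types below are assumptions made for the sketch (the field names and the 16-bit widths are hypothetical), and the 16-byte figure holds on typical 64-bit targets:

```cpp
#include <cstdint>

// Placeholder types standing in for the real instruction and capture data.
struct Instruction {};
struct Saves {};

// Two pointers: 16 bytes on a typical 64-bit platform.
struct PointerThread
{
    const Instruction* inst;
    Saves* saves;
};

// Two 16-bit indices into the program and into a saves array: 4 bytes.
struct IndexThread
{
    std::uint16_t inst;  // index of the next instruction to run
    std::int16_t saves;  // index of the capture set, negative if none
};

static_assert(sizeof(IndexThread) == 4, "two 16-bit fields pack into 4 bytes");
```

The trade-off in such a layout is that 16-bit indices cap the number of addressable instructions and capture sets.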
Maxime Coste
fa17c46653
Regex: Refactor ThreadedRegexVM state handling
...
Remove ExecState to store threads inside the ThreadedRegexVM so that
memory buffers can be reused between executions. Extract an ExecConfig
struct with all the data that is execution specific to avoid storing
it needlessly inside the ThreadedRegexVM.
2018-04-25 21:19:04 +10:00
Maxime Coste
fb65fa60f8
Regex: take the full subject range as a parameter
...
To allow more general lookarounds outside of the actual search range,
pass a second range (the actual subject). This allows us to remove
various flags such as PrevAvailable or NotBeginOfSubject, which are
now easy to check from the subject range.
Fixes #1902
2018-03-05 05:48:10 +11:00
Maxime Coste
d9e44dfacf
Regex: Remove helper functions from regex_impl.hh
...
They were close duplicates of the ones in regex.hh and not used
anywhere else.
2018-03-05 03:10:47 +11:00
Maxime Coste
933ac4d3d5
Regex: Improve comments and constify some variables
...
Reword various comments to make some tricky parts of the regex
engine easier to understand.
2018-02-24 17:40:08 +11:00
Maxime Coste
af21d4ca1e
regex: track CompiledRegex::StartDesc in the Regex memory domain
2018-02-24 16:29:24 +11:00
Maxime Coste
6851604546
Regex: Add a RegexExecFlags::NotEndOfSubject flag
2017-12-29 09:55:38 +11:00
Maxime Coste
413f880e9e
Regex: Support forward and backward matching code in the same CompiledRegex
...
No need to have two separate regexes to handle forward and backward
matching, just passing RegexCompileFlags::Backward will add support
for backward matching to the regex. For backward only regex, pass
RegexCompileFlags::NoForward as well to disable generation of
forward matching code.
2017-12-01 19:57:02 +08:00
Maxime Coste
8d892eeb62
Regex: use StartDesc to early out when not searching
...
Early out as well if we do not find any potential start position.
2017-12-01 15:03:03 +08:00
Maxime Coste
65b057f261
Regex: rename StartChars to StartDesc
...
It only contains chars for now, but it still describes more generally
where matches can start.
2017-12-01 14:46:18 +08:00
Maxime Coste
a52da6fe34
Regex: Tweak is_ctype implementation style
2017-11-28 00:13:42 +08:00
Maxime Coste
8b40f57145
Regex: Replace generic 'Matchers' with specialized functionality
...
Introduce CharacterClass and CharacterType Regex Op, and optimize
their evaluation.
2017-11-25 18:14:15 +08:00
Maxime Coste
5cfccad39c
Regex: Use MemoryDomain::Regex for captures and MatchResults contents
2017-11-12 12:30:21 +08:00
Maxime Coste
c9b43d3634
Regex: directly store instruction pointer in Thread struct
2017-11-11 15:15:13 +08:00
Maxime Coste
c74becc6af
Regex: fix RegexCompileFlags not being an enum class
2017-11-01 14:05:15 +08:00
Maxime Coste
2d901dc76f
Regex: slight readability improvement and work around a potential gcc bug
2017-11-01 14:05:15 +08:00
Maxime Coste
9e15207d2a
Regex: put the other char boolean inside the general start char map
2017-11-01 14:05:15 +08:00
Maxime Coste
e9e9a08e7b
Regex: refactor handling of Saves slightly, do not create them until really needed
2017-11-01 14:05:15 +08:00
Maxime Coste
d9b4076e3c
Regex: Go back to instruction based search of next start
...
The previous method, which was a bit faster in the general use case,
can hit some cases where we get quadratic behaviour and very slow
matching.
By using an instruction, we can guarantee our complexity of O(N*M)
as we will never have more than N threads (N being the instruction
count) and we run the threads once per codepoint in the subject
string.
That slows down the general case slightly, but ensures we don't have
pathological cases.
This new version is much faster than the previous instruction based
search because it does not use a plain `.*` searcher, but a specific,
smarter instruction specialized for finding the next start if we are
in the correct conditions.
2017-11-01 14:05:15 +08:00
Maxime Coste
c423b47109
Regex: compute if codepoints outside of the start chars map can start
2017-11-01 14:05:15 +08:00
Maxime Coste
87eec79d07
Regex: comment the mutables in CompiledRegex::Instruction and fix their init
2017-11-01 14:05:14 +08:00
Maxime Coste
8b2297f5ca
Regex: Introduce a Regex memory domain to track usage separately
2017-11-01 14:05:14 +08:00