Maxime Coste
9ec175f2f8
Regex: use binary search to for character class ranges check
2017-11-01 14:05:14 +08:00
Maxime Coste
6e65589a34
Regex: compute start chars from matchers, do not compute it from lookarounds
...
Computing potential start characters from lookarounds is more complex
than expected, and not worth the complexity.
2017-11-01 14:05:14 +08:00
Maxime Coste
df16fea82d
Regex: rename "flags" with the more common "modifiers"
2017-11-01 14:05:14 +08:00
Maxime Coste
52d443f764
Regex: Correctly handle ignore case mode for start chars computation
2017-11-01 14:05:14 +08:00
Maxime Coste
b8495f0953
Regex: Rework parsing, treat lookarounds as assertions, and flags separately
2017-11-01 14:05:14 +08:00
Maxime Coste
b0233262b8
Regex: Limit programs to std::numeric_limits<uint16_t>::max() instructions
2017-11-01 14:05:14 +08:00
Maxime Coste
8c8dcb3a84
Regex: Fix reverse searching behaviour, again
2017-11-01 14:05:14 +08:00
Maxime Coste
9753bcd0ad
Regex: limit explicit quantifiers value (too 1000 for now)
...
Fixes #1628
2017-11-01 14:05:14 +08:00
Maxime Coste
2b97e4e124
Regex: Fix handling of ^ and $ in backward matching mode
2017-11-01 14:05:14 +08:00
Maxime Coste
5bf4be645a
Regex: Fix support for ignore case in lookarounds
2017-11-01 14:05:14 +08:00
Maxime Coste
23b3a221eb
Regex: support more than two children in alternations
...
Avoid deep nested alternations, parse them flattened.
2017-11-01 14:05:14 +08:00
Maxime Coste
fb5243f710
Regex: print instruction index in dump_regex
2017-11-01 14:05:14 +08:00
Maxime Coste
74ed102cab
Regex: Tweak definition of character class and control escape tables
2017-11-01 14:05:14 +08:00
Maxime Coste
df73b71dfc
Regex: fix lookarounds handling when computing starting chars
2017-11-01 14:05:14 +08:00
Maxime Coste
785cd34b4b
Regex: Make boost checking disableable at compile time
2017-11-01 14:05:14 +08:00
Maxime Coste
065bbc8f59
Regex: switch to custom impl, use boost for checking
2017-11-01 14:05:14 +08:00
Maxime Coste
9305fa1369
Regex: Fix lookaround use in moon.kak
...
(?=[A-Z]\w*) is strictly the same as (?=[A-Z]) as \w* will always
at least match an empty string.
2017-11-01 14:05:14 +08:00
Maxime Coste
cca730193c
Regex: Support any char and character classes in lookarounds
...
Lookarounds still need to be fixed size, but accept character classes
as well as plain literals.
2017-11-01 14:05:14 +08:00
Maxime Coste
db06acdfab
Regex: Fix computation of potential starts for lookaheads
2017-11-01 14:05:14 +08:00
Maxime Coste
34b1f1ccb6
Regex: detect when all characters can start and avoid allocating
2017-11-01 14:05:14 +08:00
Maxime Coste
bf3b50a543
Regex: Fix wrong size of character_class_escapes array
2017-11-01 14:05:14 +08:00
Maxime Coste
9ec376135b
Regex: Introduce RegexExecFlags::PrevAvailable
...
Rework assertion code as well.
2017-11-01 14:05:14 +08:00
Maxime Coste
30dacdade2
Regex: deallocate Saves memory on ThreadedRegexVM destruction
2017-11-01 14:05:14 +08:00
Maxime Coste
578640c8a4
Regex: Fix handling of control escapes inside character classes
2017-11-01 14:05:14 +08:00
Maxime Coste
f3736a4b48
Regex: tag instructions as scheduled as well instead of searching
...
And a few more code cleanup in the ThreadedRegexVM
2017-11-01 14:05:14 +08:00
Maxime Coste
4ff655cc09
Regex: store the processed flag directly in CompiledRegex instructions
2017-11-01 14:05:14 +08:00
Maxime Coste
732b8bc2a4
Regex: abandon bytecode and just use a simple list of instructions
...
Makes the code simpler.
2017-11-01 14:05:14 +08:00
Maxime Coste
11abd544c6
Regex: avoid infinite loops
2017-11-01 14:05:14 +08:00
Maxime Coste
c47cdc06a7
Regex: Add support for backward matching
...
Regex can be compiled for backward matching instead of forward matching
and the ThreadedRegexVM is able to iterate in reverse on the subject
string to find the last match instead of the first.
2017-11-01 14:05:14 +08:00
Maxime Coste
071b897e00
Regex: Remove static RegexCompiler::compile
2017-11-01 14:05:14 +08:00
Maxime Coste
52ee62172a
Regex: remove use of buffer_utils.hh from regex_impl.cc
2017-11-01 14:05:14 +08:00
Maxime Coste
c375268c2d
Regex: Use memcpy to write/read offsets from bytecode
...
reinterpret_cast was undefined behaviour as we do not guarantee
that offsets are going to be stored properly aligned.
2017-11-01 14:05:14 +08:00
Maxime Coste
b53227d62c
Regex: slight cleanup of the unit tests
2017-11-01 14:05:14 +08:00
Maxime Coste
337e58d4f9
Regex: Cleanup character class parsing a bit
2017-11-01 14:05:14 +08:00
Maxime Coste
236751cb84
Regex: Make ThreadedRegexVM a proper class, define a proper interface
2017-11-01 14:05:14 +08:00
Maxime Coste
3b69dda04e
Regex: Find potential start position using a map of valid start chars
...
With this optimization we get close to performance parity with boost
regex on the common use cases in Kakoune.
2017-11-01 14:05:14 +08:00
Maxime Coste
741772aef9
Regex: Optimize single char character classes as literals
2017-11-01 14:05:14 +08:00
Maxime Coste
fabeab1ee1
Regex: reorder lookaround ops, group by direction
2017-11-01 14:05:14 +08:00
Maxime Coste
f1b4931824
Regex: Fix handling of non capturing groups (?:...)
...
We were wrongly keeping the `:` as a literal content of the group
2017-11-01 14:05:14 +08:00
Maxime Coste
dbb175841b
Regex: do not write the search prefix inside the program bytecode
...
Its faster to have specialized code in the VM directly
2017-11-01 14:05:14 +08:00
Maxime Coste
e0fac20f6c
Regex: Use a custom allocated buffer for Saves instead of a Vector
2017-11-01 14:05:14 +08:00
Maxime Coste
51ad8b4c85
Regex: fix handling of negative escaped character classes
2017-11-01 14:05:14 +08:00
Maxime Coste
f007794d9c
Regex: introduce RegexExecFlags to control various behaviours
2017-11-01 14:05:14 +08:00
Maxime Coste
630d078b6d
Regex: Fix use of not-yet-constructed CompiledRegex in TestVM impl
2017-11-01 14:05:14 +08:00
Maxime Coste
b4f923b7fc
Regex: min/max quantifiers can be non greedy as well
2017-11-01 14:05:14 +08:00
Maxime Coste
f02b2645da
Regex: validate that our custom impl gets the same results as boost regex
...
In addition to running boost regex, run our custom regex and compare
the results to ensure the two regex engine agree.
2017-11-01 14:05:14 +08:00
Maxime Coste
76dcfd5c52
Regex: support escaping characters in character classes
2017-11-01 14:05:14 +08:00
Maxime Coste
3d2262bebf
Regex: add support for case insensitive matching, controlled by (?i)
2017-11-01 14:05:14 +08:00
Maxime Coste
7673781751
Regex: use \A \z for subject start/end
...
This is the most common syntax in various regex variants.
2017-11-01 14:05:14 +08:00
Maxime Coste
0bdfdac5c5
Regex: Implement lookarounds for fixed literal strings
...
We do not support anything else than a plain literal string for
lookarounds.
2017-11-01 14:05:14 +08:00