Maxime Coste
236751cb84
Regex: Make ThreadedRegexVM a proper class, define a proper interface
2017-11-01 14:05:14 +08:00
Maxime Coste
3b69dda04e
Regex: Find potential start position using a map of valid start chars
...
With this optimization we get close to performance parity with boost
regex on the common use cases in Kakoune.
2017-11-01 14:05:14 +08:00
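A minimal sketch of the start-character idea described in this commit, not the actual Kakoune code: a 256-entry table marks which bytes can begin a match, so the search skips positions that cannot match before running the VM. The names `StartChars`, `make_start_chars` and `find_start` are hypothetical, and the sketch works on raw bytes for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical type: map[b] is true when byte b may begin a match.
using StartChars = std::array<bool, 256>;

// Build the table from the set of bytes that can start a match.
inline StartChars make_start_chars(std::string_view first_bytes)
{
    StartChars map{};  // value-initialized: every entry starts as false
    for (unsigned char c : first_bytes)
        map[c] = true;
    return map;
}

// Return the index of the first byte at or after `from` that may start
// a match, or subject.size() when no such byte exists.
inline std::size_t find_start(std::string_view subject, const StartChars& map,
                              std::size_t from = 0)
{
    assert(from <= subject.size());
    while (from < subject.size() &&
           !map[static_cast<unsigned char>(subject[from])])
        ++from;
    return from;
}
```

A caller would run the full matcher only from the position `find_start` returns, then resume scanning after a failed attempt.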
Maxime Coste
741772aef9
Regex: Optimize single char character classes as literals
2017-11-01 14:05:14 +08:00
Maxime Coste
fabeab1ee1
Regex: reorder lookaround ops, group by direction
2017-11-01 14:05:14 +08:00
Maxime Coste
854144c535
Regex: Fix handling of Save instruction in ThreadedRegexVM
...
When not saving, we were not fully reading the instruction stream,
leading to an out of sync instruction pointer.
2017-11-01 14:05:14 +08:00
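To illustrate the class of bug fixed here, the sketch below uses a toy bytecode, not the real ThreadedRegexVM instruction set. The opcodes `Char`, `Save` and `Match`, the function `run`, and the `no_saves` parameter are all assumptions made for this example. The point is that the `Save` operand is consumed whether or not the position is recorded, so the instruction pointer stays aligned with the next opcode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Toy instruction set (hypothetical): Char and Save each take a one-byte operand.
enum Op : std::uint8_t { Char, Save, Match };

// Returns true if `subject` begins with a match of `program`.
// Assumes a well-formed program (every operand is present).
inline bool run(const std::vector<std::uint8_t>& program, std::string_view subject,
                bool no_saves, std::vector<std::size_t>& saves)
{
    std::size_t ip = 0, pos = 0;
    while (ip < program.size())
    {
        switch (program[ip++])
        {
        case Char:
        {
            const char expected = static_cast<char>(program[ip++]);
            if (pos == subject.size() || subject[pos] != expected)
                return false;
            ++pos;
            break;
        }
        case Save:
        {
            // Always read the operand, even when saving is disabled;
            // skipping it would leave ip pointing at the operand byte.
            const std::uint8_t index = program[ip++];
            if (!no_saves)
            {
                assert(index < saves.size());
                saves[index] = pos;
            }
            break;
        }
        case Match:
            return true;
        default:
            return false;
        }
    }
    return false;
}
```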
Maxime Coste
f1b4931824
Regex: Fix handling of non capturing groups (?:...)
...
We were wrongly keeping the `:` as literal content of the group.
2017-11-01 14:05:14 +08:00
Maxime Coste
5f6e71c4dc
Regex: More code tweaks and cleanups in ThreadedRegexVM
2017-11-01 14:05:14 +08:00
Maxime Coste
5f54e0de0e
Regex: Code cleanup and refactor for Saves handling
2017-11-01 14:05:14 +08:00
Maxime Coste
dbb175841b
Regex: do not write the search prefix inside the program bytecode
...
It's faster to have specialized code in the VM directly.
2017-11-01 14:05:14 +08:00
Maxime Coste
cf5055f68b
Regex: small code tweak
2017-11-01 14:05:14 +08:00
Maxime Coste
e0fac20f6c
Regex: Use a custom allocated buffer for Saves instead of a Vector
2017-11-01 14:05:14 +08:00
Maxime Coste
1399563e40
Regex: make m_current_threads and m_next_threads local variables of exec
2017-11-01 14:05:14 +08:00
Maxime Coste
54da8098ae
Regex: Add a NoSaves RegexExecFlags to disable saving positions
2017-11-01 14:05:14 +08:00
Maxime Coste
119bc38254
Regex: small refactor of ThreadedRegexVM::clone_saves
2017-11-01 14:05:14 +08:00
Maxime Coste
9fbafba4cb
Regex: Refactor thread handling in ThreadedRegexVM
2017-11-01 14:05:14 +08:00
Maxime Coste
589cde67f0
Regex: store saves in a copy on write structure
2017-11-01 14:05:14 +08:00
Maxime Coste
11b9c996ea
Regex: small code style tweak
2017-11-01 14:05:14 +08:00
Maxime Coste
51ad8b4c85
Regex: fix handling of negative escaped character classes
2017-11-01 14:05:14 +08:00
Maxime Coste
adcd02b7d2
Regex: Replace boost regex_iterator impl with our own
...
Ensure we check the results from our own regex impl in all uses of
regexes in Kakoune.
2017-11-01 14:05:14 +08:00
Maxime Coste
f007794d9c
Regex: introduce RegexExecFlags to control various behaviours
2017-11-01 14:05:14 +08:00
Maxime Coste
73b14b11be
Regex: small code tweak in ThreadedRegexVM
2017-11-01 14:05:14 +08:00
Maxime Coste
630d078b6d
Regex: Fix use of not-yet-constructed CompiledRegex in TestVM impl
2017-11-01 14:05:14 +08:00
Maxime Coste
5b0c2cbdc2
Regex: Ensure we don't have a thread explosion in ThreadedRegexVM
...
Always remove threads with lower priority that end up on the same
instruction as a higher priority thread (as we know they will behave
the same from now on)
2017-11-01 14:05:14 +08:00
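The pruning rule described above can be sketched as follows. This is an illustration under stated assumptions, not the actual ThreadedRegexVM code: `ThreadList` and its members are hypothetical names, and a thread is reduced to the index of the instruction it sits on. Threads are added from highest to lowest priority, so any later thread that reaches an instruction already taken is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the threads scheduled for one step.
struct ThreadList
{
    explicit ThreadList(std::size_t instruction_count)
        : scheduled(instruction_count, false) {}

    // Threads must be added from highest to lowest priority.
    // Returns false when a higher priority thread already sits on `inst`,
    // in which case the new thread is discarded.
    bool add(std::size_t inst)
    {
        assert(inst < scheduled.size());
        if (scheduled[inst])
            return false;
        scheduled[inst] = true;
        threads.push_back(inst);
        return true;
    }

    std::vector<std::size_t> threads;  // instruction index per thread, in priority order
    std::vector<bool> scheduled;       // scheduled[i]: some thread is already on instruction i
};
```

With this rule the number of live threads can never exceed the number of instructions in the program, which is what bounds the work done per input character.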
Maxime Coste
b4f923b7fc
Regex: min/max quantifiers can be non greedy as well
2017-11-01 14:05:14 +08:00
Maxime Coste
f02b2645da
Regex: validate that our custom impl gets the same results as boost regex
...
In addition to running boost regex, run our custom regex and compare
the results to ensure the two regex engines agree.
2017-11-01 14:05:14 +08:00
Maxime Coste
76dcfd5c52
Regex: support escaping characters in character classes
2017-11-01 14:05:14 +08:00
Maxime Coste
3d2262bebf
Regex: add support for case insensitive matching, controlled by (?i)
2017-11-01 14:05:14 +08:00
Maxime Coste
7673781751
Regex: use \A \z for subject start/end
...
This is the most common syntax in various regex variants.
2017-11-01 14:05:14 +08:00
Maxime Coste
0bdfdac5c5
Regex: Implement lookarounds for fixed literal strings
...
We do not support anything other than a plain literal string for
lookarounds.
2017-11-01 14:05:14 +08:00
Maxime Coste
e96cd29f0e
Regex: Support non greedy quantifiers
2017-11-01 14:05:14 +08:00
Maxime Coste
e4004a7b7f
Regex: Add support for \h and \H "horizontal blank" character classes
2017-11-01 14:05:14 +08:00
Maxime Coste
4ac0d35d1e
Regex: Add support for `\K` that resets the start capture
2017-11-01 14:05:14 +08:00
Maxime Coste
2f450e0080
Regex: Add support for \Q...\E quoted parts
2017-11-01 14:05:14 +08:00
Maxime Coste
7a313ddafe
Regex: small error message improvement
2017-11-01 14:05:14 +08:00
Maxime Coste
c282b699d7
Regex: fix support for `-` at end of a character class
2017-11-01 14:05:14 +08:00
Maxime Coste
e41d228af8
Regex: Disable dumping regex instructions by default in unit tests
2017-11-01 14:05:14 +08:00
Maxime Coste
d5048281a6
Regex: slight cleanup of the unit tests
2017-11-01 14:05:14 +08:00
Maxime Coste
f7468b576e
Regex: Refactor regex compilation to a regular RegexCompiler class
2017-11-01 14:05:14 +08:00
Maxime Coste
d5717edc9d
Regex: improve regex parse error reporting
...
Display the place where parsing failed, refactor code to make
RegexParser a regular object.
2017-11-01 14:05:14 +08:00
Maxime Coste
080160553c
Regex: support escaped character classes
2017-11-01 14:05:14 +08:00
Maxime Coste
1a8ad3759f
Regex: fix handling of strict quantifiers {N}
...
Previous behaviour was treating {N} as {N,}
2017-11-01 14:05:14 +08:00
Maxime Coste
be157453ad
Regex: Use a std::function based "Matcher" op to implement character classes
...
This is more extensible and should allow easier support for non-range
classes.
2017-11-01 14:05:14 +08:00
Maxime Coste
eb1015cdfb
Regex: whenever Kakoune compiles a regex, pass it to the custom impl as well
...
That way we can see which features are missing.
2017-11-01 14:05:14 +08:00
Maxime Coste
002aba562f
Regex: work on unicode codepoints instead of raw bytes
2017-11-01 14:05:14 +08:00
Maxime Coste
75608ea223
Regex: when in full match mode, do not accept trailing data
2017-11-01 14:05:14 +08:00
Maxime Coste
490c130e41
Regex: Implement leftmost matching
...
Ensure threads are maintained in "priority" order, by having two
split instructions (prioritizing parent or child).
2017-11-01 14:05:14 +08:00
Maxime Coste
182b70cb0a
Regex: Add initial support for character ranges
2017-11-01 14:05:14 +08:00
Maxime Coste
52678fafa1
Regex: Add support for searching
...
Always compile a `.*` as the first instructions of the regex bytecode.
Depending on the match or search mode, the RegexVM will either execute
this prefix or skip it and start directly at the matching bytecode.
2017-11-01 14:05:14 +08:00
Maxime Coste
f7b8c1c79d
Regex: cleanup and reorganize regex code and improve capture support
...
Introduce the CompiledRegex class, rename ThreadedExecutor to
ThreadedRegexVM, remove the RegexProgram namespace.
2017-11-01 14:05:14 +08:00
Maxime Coste
023511deff
Regex: WIP support for saving captures
2017-11-01 14:05:14 +08:00