Commit Graph

9469 Commits

Author SHA1 Message Date
Maxime Coste
74ed102cab Regex: Tweak definition of character class and control escape tables 2017-11-01 14:05:14 +08:00
Maxime Coste
df73b71dfc Regex: fix lookarounds handling when computing starting chars 2017-11-01 14:05:14 +08:00
Maxime Coste
1c95074657 Make use of custom regex backward searching support for reverse search 2017-11-01 14:05:14 +08:00
Maxime Coste
785cd34b4b Regex: Make boost checking disableable at compile time 2017-11-01 14:05:14 +08:00
Maxime Coste
065bbc8f59 Regex: switch to custom impl, use boost for checking 2017-11-01 14:05:14 +08:00
Maxime Coste
9305fa1369 Regex: Fix lookaround use in moon.kak
(?=[A-Z]\w*) is strictly the same as (?=[A-Z]) as \w* will always
at least match an empty string.
2017-11-01 14:05:14 +08:00
Maxime Coste
cca730193c Regex: Support any char and character classes in lookarounds
Lookarounds still need to be fixed size, but accept character classes
as well as plain literals.
2017-11-01 14:05:14 +08:00
Maxime Coste
b8cb65160a Regex: use std::conditional instead of custom template class to choose Utf8It 2017-11-01 14:05:14 +08:00
Maxime Coste
db06acdfab Regex: Fix computation of potential starts for lookaheads 2017-11-01 14:05:14 +08:00
Maxime Coste
34b1f1ccb6 Regex: detect when all characters can start and avoid allocating 2017-11-01 14:05:14 +08:00
Maxime Coste
ea85f79384 Regex: add elided braces to fix compilation on older gcc 2017-11-01 14:05:14 +08:00
Maxime Coste
bf3b50a543 Regex: Fix wrong size of character_class_escapes array 2017-11-01 14:05:14 +08:00
Maxime Coste
08ea68dc1f Regex: Fix handling of match_prev_avail for boost regex
We were passing around iterators that were not allowed to
go before the begin iterator.
2017-11-01 14:05:14 +08:00
Maxime Coste
9ec376135b Regex: Introduce RegexExecFlags::PrevAvailable
Rework assertion code as well.
2017-11-01 14:05:14 +08:00
Maxime Coste
73e177ec59 Regex: Do not use sized deallocation to support more compilers 2017-11-01 14:05:14 +08:00
Maxime Coste
30dacdade2 Regex: deallocate Saves memory on ThreadedRegexVM destruction 2017-11-01 14:05:14 +08:00
Maxime Coste
578640c8a4 Regex: Fix handling of control escapes inside character classes 2017-11-01 14:05:14 +08:00
Maxime Coste
f3736a4b48 Regex: tag instructions as scheduled as well instead of searching
And a few more code cleanups in the ThreadedRegexVM
2017-11-01 14:05:14 +08:00
Maxime Coste
6bc5823745 Regex: refactor ThreadedRegexVM::exec_from code 2017-11-01 14:05:14 +08:00
Maxime Coste
4ff655cc09 Regex: store the processed flag directly in CompiledRegex instructions 2017-11-01 14:05:14 +08:00
Maxime Coste
732b8bc2a4 Regex: abandon bytecode and just use a simple list of instructions
Makes the code simpler.
2017-11-01 14:05:14 +08:00
Maxime Coste
6434bca325 Regex: Add some comments, remove spurious semicolons 2017-11-01 14:05:14 +08:00
Maxime Coste
911a893225 Regex: fix get_base(std::reverse_iterator<...>) returning a ref to temporary 2017-11-01 14:05:14 +08:00
Maxime Coste
11abd544c6 Regex: avoid infinite loops 2017-11-01 14:05:14 +08:00
Maxime Coste
c47cdc06a7 Regex: Add support for backward matching
Regex can be compiled for backward matching instead of forward matching
and the ThreadedRegexVM is able to iterate in reverse on the subject
string to find the last match instead of the first.
2017-11-01 14:05:14 +08:00
Maxime Coste
071b897e00 Regex: Remove static RegexCompiler::compile 2017-11-01 14:05:14 +08:00
Maxime Coste
52ee62172a Regex: remove use of buffer_utils.hh from regex_impl.cc 2017-11-01 14:05:14 +08:00
Maxime Coste
c375268c2d Regex: Use memcpy to write/read offsets from bytecode
reinterpret_cast was undefined behaviour as we do not guarantee
that offsets are going to be stored properly aligned.
2017-11-01 14:05:14 +08:00
Maxime Coste
b53227d62c Regex: slight cleanup of the unit tests 2017-11-01 14:05:14 +08:00
Maxime Coste
337e58d4f9 Regex: Cleanup character class parsing a bit 2017-11-01 14:05:14 +08:00
Maxime Coste
236751cb84 Regex: Make ThreadedRegexVM a proper class, define a proper interface 2017-11-01 14:05:14 +08:00
Maxime Coste
3b69dda04e Regex: Find potential start position using a map of valid start chars
With this optimization we get close to performance parity with boost
regex on the common use cases in Kakoune.
2017-11-01 14:05:14 +08:00
Maxime Coste
741772aef9 Regex: Optimize single char character classes as literals 2017-11-01 14:05:14 +08:00
Maxime Coste
fabeab1ee1 Regex: reorder lookaround ops, group by direction 2017-11-01 14:05:14 +08:00
Maxime Coste
854144c535 Regex: Fix handling of Save instruction in ThreadedRegexVM
When not saving, we were not fully reading the instruction stream,
leading to an out of sync instruction pointer.
2017-11-01 14:05:14 +08:00
Maxime Coste
f1b4931824 Regex: Fix handling of non capturing groups (?:...)
We were wrongly keeping the `:` as literal content of the group
2017-11-01 14:05:14 +08:00
Maxime Coste
5f6e71c4dc Regex: More code tweaks and cleanups in ThreadedRegexVM 2017-11-01 14:05:14 +08:00
Maxime Coste
5f54e0de0e Regex: Code cleanup and refactor for Saves handling 2017-11-01 14:05:14 +08:00
Maxime Coste
dbb175841b Regex: do not write the search prefix inside the program bytecode
It's faster to have specialized code in the VM directly
2017-11-01 14:05:14 +08:00
Maxime Coste
cf5055f68b Regex: small code tweak 2017-11-01 14:05:14 +08:00
Maxime Coste
e0fac20f6c Regex: Use a custom allocated buffer for Saves instead of a Vector 2017-11-01 14:05:14 +08:00
Maxime Coste
1399563e40 Regex: make m_current_threads and m_next_threads local variables of exec 2017-11-01 14:05:14 +08:00
Maxime Coste
54da8098ae Regex: Add a NoSaves RegexExecFlags to disable saving positions 2017-11-01 14:05:14 +08:00
Maxime Coste
119bc38254 Regex: small refactor of ThreadedRegexVM::clone_saves 2017-11-01 14:05:14 +08:00
Maxime Coste
9fbafba4cb Regex: Refactor thread handling in ThreadedRegexVM 2017-11-01 14:05:14 +08:00
Maxime Coste
589cde67f0 Regex: store saves in a copy on write structure 2017-11-01 14:05:14 +08:00
Maxime Coste
11b9c996ea Regex: small code style tweak 2017-11-01 14:05:14 +08:00
Maxime Coste
51ad8b4c85 Regex: fix handling of negative escaped character classes 2017-11-01 14:05:14 +08:00
Maxime Coste
adcd02b7d2 Regex: Replace boost regex_iterator impl with our own
Ensure we check the results from our own regex impl in all uses of
regexes in Kakoune.
2017-11-01 14:05:14 +08:00
Maxime Coste
f007794d9c Regex: introduce RegexExecFlags to control various behaviours 2017-11-01 14:05:14 +08:00
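Commit 9305fa1369 relies on `(?=[A-Z]\w*)` and `(?=[A-Z])` accepting exactly the same positions, because `\w*` can always match the empty string. The sketch below checks this with the standard library's ECMAScript `std::regex` rather than Kakoune's own engine; `match_positions` is a hypothetical helper.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Collect the start offset of every non-overlapping match of `pattern` in `subject`.
std::vector<std::size_t> match_positions(const std::string& pattern, const std::string& subject)
{
    const std::regex re(pattern); // ECMAScript grammar by default, which supports (?=...)
    std::vector<std::size_t> positions;
    for (auto it = std::sregex_iterator(subject.begin(), subject.end(), re);
         it != std::sregex_iterator(); ++it)
        positions.push_back(static_cast<std::size_t>(it->position()));
    return positions;
}
```

On `"fooBar bazQux"`, both `[a-z](?=[A-Z]\w*)` and `[a-z](?=[A-Z])` report offsets 2 and 9: the trailing `\w*` never changes whether the lookahead succeeds.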
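Commits 08ea68dc1f and 9ec376135b deal with the "previous character available" case: when a search starts in the middle of a buffer, assertions such as `^` or `\b` may need to look at the character just before the start. The C++ standard library expresses this with `std::regex_constants::match_prev_avail`; the sketch below uses that flag with `std::regex` (not Kakoune's `RegexExecFlags::PrevAvailable`) to show the effect on `^`. The function name is hypothetical, and the expected results are what libstdc++ produces; other standard library implementations may differ in edge cases.

```cpp
#include <regex>
#include <string>

// Search for "^b" starting at offset 1 of "ab", with or without telling the engine
// that the character before the start ('a') may be inspected by assertions.
bool caret_b_matches_from_offset_1(bool prev_available)
{
    static const std::string subject = "ab";
    static const std::regex re("^b");
    const auto flags = prev_available ? std::regex_constants::match_prev_avail
                                      : std::regex_constants::match_default;
    // Without the flag, subject.begin() + 1 is treated as the beginning of a line.
    // With it, libstdc++ treats the preceding 'a' as real input, so '^' cannot match there.
    return std::regex_search(subject.begin() + 1, subject.end(), re, flags);
}
```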
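Commit f3736a4b48 (together with 4ff655cc09) replaces a search through the thread list with a flag stored on each instruction. A minimal sketch of that idea, assuming a hypothetical `Instruction` struct and a thread represented only by its instruction index:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical instruction record carrying a per-step "scheduled" mark, so checking
// whether a thread already exists for an instruction is O(1) instead of a list search.
struct Instruction
{
    int op;
    bool scheduled = false;
};

void schedule(std::vector<Instruction>& program, std::vector<std::size_t>& next_threads, std::size_t pc)
{
    if (program[pc].scheduled)
        return; // a thread for this instruction is already queued for the next step
    program[pc].scheduled = true;
    next_threads.push_back(pc);
}

// Clear the marks once the step's threads have been consumed.
void clear_scheduled(std::vector<Instruction>& program, const std::vector<std::size_t>& threads)
{
    for (std::size_t pc : threads)
        program[pc].scheduled = false;
}
```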
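Commit c47cdc06a7 finds the last match by iterating over the subject in reverse. For a plain literal, the same idea can be sketched with `std::search` over reverse iterators and a conversion back to forward positions with `std::reverse_iterator::base()` (compare `get_base(std::reverse_iterator<...>)` in 911a893225). `find_last` is hypothetical and handles literals only, not regexes.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>

// Find the last occurrence of `needle` by scanning `subject` backwards: searching the
// reversed subject for the reversed needle yields the rightmost match first.
std::optional<std::size_t> find_last(const std::string& subject, const std::string& needle)
{
    auto it = std::search(subject.rbegin(), subject.rend(), needle.rbegin(), needle.rend());
    if (it == subject.rend())
        return std::nullopt;
    // `it` refers to the last character of the match; it.base() is one past it in forward order.
    return static_cast<std::size_t>(it.base() - subject.begin()) - needle.size();
}
```

For example, `find_last("abcabc", "bc")` gives 4, the offset of the second `bc`.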
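Commit c375268c2d replaces `reinterpret_cast` reads of bytecode offsets with `memcpy`, since an offset stored at an arbitrary byte position may be misaligned and dereferencing such a pointer is undefined behaviour. A minimal sketch of the memcpy approach, assuming a hypothetical 32-bit `Offset` type and a `std::vector<char>` bytecode buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Offset = std::int32_t; // hypothetical offset type

// Store an offset at an arbitrary, possibly unaligned, byte position.
void write_offset(std::vector<char>& bytecode, std::size_t pos, Offset offset)
{
    assert(pos + sizeof(offset) <= bytecode.size());
    std::memcpy(bytecode.data() + pos, &offset, sizeof(offset));
}

// Read it back by copying bytes, instead of dereferencing a misaligned Offset*.
Offset read_offset(const std::vector<char>& bytecode, std::size_t pos)
{
    assert(pos + sizeof(Offset) <= bytecode.size());
    Offset offset;
    std::memcpy(&offset, bytecode.data() + pos, sizeof(offset));
    return offset;
}
```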
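Commit 3b69dda04e skips ahead to positions whose character could start a match. A sketch of that filter, assuming a hypothetical 256-entry table indexed by byte value (Kakoune's real representation, and its handling of non-ASCII codepoints, may differ):

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical start-character map: map[c] is true if a match may begin with byte c.
struct StartChars
{
    std::array<bool, 256> map{};
};

// Skip to the first position whose character can begin a match, so the full matcher
// only runs where it has a chance to succeed. Returns subject.size() if there is none.
std::size_t find_potential_start(const StartChars& start_chars, const std::string& subject, std::size_t from)
{
    for (std::size_t i = from; i < subject.size(); ++i)
        if (start_chars.map[static_cast<unsigned char>(subject[i])])
            return i;
    return subject.size();
}
```

When every character can start a match, the table filters nothing, which is presumably why 34b1f1ccb6 detects that case and avoids allocating the map.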
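Commit 854144c535 fixes a Save instruction whose operand was not read when saving was disabled, which left the instruction pointer out of sync. The hypothetical mini-VM below (opcodes, operand layout and `run` are all assumptions, not Kakoune's code) shows the shape of the fix: the operand is always consumed, and only the write is conditional.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical opcodes for a straight-line byte program; OpLiteral and OpSave take one operand byte.
enum Op : char { OpLiteral, OpSave, OpMatch };

bool run(const std::vector<char>& program, const std::string& subject,
         bool saving, std::vector<std::size_t>& saves)
{
    std::size_t ip = 0, pos = 0;
    while (ip < program.size())
    {
        switch (program[ip++])
        {
        case OpLiteral:
            if (pos >= subject.size() || subject[pos] != program[ip])
                return false;
            ++ip; // consume the literal operand
            ++pos;
            break;
        case OpSave:
        {
            // Always consume the save index, even when not saving; otherwise ip would
            // point at the operand and the next opcode would be read from the wrong byte.
            const auto index = static_cast<unsigned char>(program[ip++]);
            if (saving)
            {
                if (saves.size() <= index)
                    saves.resize(index + 1);
                saves[index] = pos;
            }
            break;
        }
        case OpMatch:
            return true;
        default:
            return false;
        }
    }
    return false;
}
```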
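Commit f1b4931824 fixes `(?:...)` so that `:` is no longer treated as literal group content. Using `std::regex` as a reference (not Kakoune's engine), a non-capturing group creates no submatch and does not require a colon in the subject; `whole_match` is a hypothetical convenience wrapper.

```cpp
#include <regex>
#include <string>

// A non-capturing group groups "ab" for the + quantifier without creating a submatch.
const std::regex non_capturing("(?:ab)+");
const std::regex capturing("(ab)+");

// True if the whole of `s` matches `re`.
bool whole_match(const std::regex& re, const std::string& s)
{
    return std::regex_match(s, re);
}
```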
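Commit 589cde67f0 stores capture positions in a copy-on-write structure, so threads forked from one another share saves until one of them writes. A minimal sketch, swapping in `std::shared_ptr` for whatever reference counting the real implementation uses; the `Saves` class here is hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Copy-on-write capture positions: copies share one buffer until one of them writes,
// at which point the writer takes a private copy.
class Saves
{
public:
    explicit Saves(std::size_t count)
        : m_positions(std::make_shared<std::vector<std::size_t>>(count, 0)) {}

    std::size_t get(std::size_t index) const { return (*m_positions)[index]; }

    void set(std::size_t index, std::size_t position)
    {
        if (m_positions.use_count() > 1) // shared with another copy: duplicate before writing
            m_positions = std::make_shared<std::vector<std::size_t>>(*m_positions);
        (*m_positions)[index] = position;
    }

    bool shares_storage_with(const Saves& other) const { return m_positions == other.m_positions; }

private:
    std::shared_ptr<std::vector<std::size_t>> m_positions;
};
```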
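Commits 51ad8b4c85, bf3b50a543 and 74ed102cab touch the table of character class escapes. One way to handle negated escapes such as `\D`, `\W` and `\S` is to pair each with the predicate of its lower-case counterpart plus a `negated` bit. This sketch is an assumption, not Kakoune's table; the ASCII predicates come from `<cctype>`.

```cpp
#include <cctype>
#include <optional>

// Hypothetical escape table: each upper-case escape reuses the predicate of its
// lower-case counterpart with the result inverted.
struct CharacterClassEscape
{
    char code;
    bool (*predicate)(char);
    bool negated;
};

bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool is_word(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_'; }
bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

const CharacterClassEscape character_class_escapes[] = {
    {'d', is_digit, false}, {'D', is_digit, true},
    {'w', is_word, false},  {'W', is_word, true},
    {'s', is_space, false}, {'S', is_space, true},
};

// Whether `c` matches the escape `\code`, or nullopt if `code` is not a class escape.
std::optional<bool> escape_matches(char code, char c)
{
    for (const auto& escape : character_class_escapes)
        if (escape.code == code)
            return escape.predicate(c) != escape.negated;
    return std::nullopt;
}
```

Iterating with a range-based `for` avoids keeping a separate element count in sync with the array, the kind of mismatch bf3b50a543 fixed.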
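Commits f007794d9c, 54da8098ae and 9ec376135b introduce `RegexExecFlags` and its `NoSaves` and `PrevAvailable` members. A common way to model such flags in C++ is a scoped enum with bitwise helpers; the enumerator values and the `has_flag` helper below are assumptions, not the real definitions.

```cpp
#include <type_traits>

// Hypothetical bit-flag enum using the flag names from the log; real values may differ.
enum class RegexExecFlags : unsigned
{
    None          = 0,
    PrevAvailable = 1 << 0, // the character before the search start may be inspected
    NoSaves       = 1 << 1, // do not record capture positions
};

constexpr RegexExecFlags operator|(RegexExecFlags lhs, RegexExecFlags rhs)
{
    using U = std::underlying_type_t<RegexExecFlags>;
    return static_cast<RegexExecFlags>(static_cast<U>(lhs) | static_cast<U>(rhs));
}

// True if every bit of `flag` is set in `flags` (for single-bit flags).
constexpr bool has_flag(RegexExecFlags flags, RegexExecFlags flag)
{
    using U = std::underlying_type_t<RegexExecFlags>;
    return (static_cast<U>(flags) & static_cast<U>(flag)) != 0;
}
```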