Commit Graph

4167 Commits

Author SHA1 Message Date
Maxime Coste
2c6c0be0c1 Regex: abort compilation as soon as we hit the instruction count limit 2017-11-01 14:05:15 +08:00
Maxime Coste
d44e160aa7 Regex: add a unit test for why lookaheads don't count for start chars anymore 2017-11-01 14:05:15 +08:00
Maxime Coste
87eec79d07 Regex: comment the mutables in CompiledRegex::Instruction and fix their init 2017-11-01 14:05:14 +08:00
Maxime Coste
8b2297f5ca Regex: Introduce a Regex memory domain to track usage separately 2017-11-01 14:05:14 +08:00
Maxime Coste
9ec175f2f8 Regex: use binary search for character class ranges check 2017-11-01 14:05:14 +08:00
Maxime Coste
6e65589a34 Regex: compute start chars from matchers, do not compute it from lookarounds
Computing potential start characters from lookarounds is more complex
than expected, and not worth the complexity.
2017-11-01 14:05:14 +08:00
Maxime Coste
621b0d3ab8 Regex: remove the need for a processed inst vector
Identify each step with a counter, and check if the instruction
was already processed this step. This makes the matching faster,
by removing the need to maintain a vector of instructions executed
this step.
2017-11-01 14:05:14 +08:00
Maxime Coste
cfc52d7e6a Regex: use intrusive linked list for the free saves instead of a Vector 2017-11-01 14:05:14 +08:00
Maxime Coste
df16fea82d Regex: rename "flags" with the more common "modifiers" 2017-11-01 14:05:14 +08:00
Maxime Coste
52d443f764 Regex: Correctly handle ignore case mode for start chars computation 2017-11-01 14:05:14 +08:00
Maxime Coste
b8495f0953 Regex: Rework parsing, treat lookarounds as assertions, and flags separately 2017-11-01 14:05:14 +08:00
Maxime Coste
b0233262b8 Regex: Limit programs to std::numeric_limits<uint16_t>::max() instructions 2017-11-01 14:05:14 +08:00
Maxime Coste
8c8dcb3a84 Regex: Fix reverse searching behaviour, again 2017-11-01 14:05:14 +08:00
Maxime Coste
9753bcd0ad Regex: limit explicit quantifiers value (to 1000 for now)
Fixes #1628
2017-11-01 14:05:14 +08:00
Maxime Coste
2b97e4e124 Regex: Fix handling of ^ and $ in backward matching mode 2017-11-01 14:05:14 +08:00
Maxime Coste
3c999aba37 Regex: Only reset processed and scheduled flags on relevant instructions
On big regexes, resetting all those flags on all instructions for each
character can become the dominant operation. Track the indices of the
instructions actually processed (the scheduled ones are already tracked
in the next_threads vector), and only reset these.
2017-11-01 14:05:14 +08:00
Maxime Coste
5bf4be645a Regex: Fix support for ignore case in lookarounds 2017-11-01 14:05:14 +08:00
Maxime Coste
80f6caee81 Regex: move try/catch blocks inside boost specific code 2017-11-01 14:05:14 +08:00
Maxime Coste
dd9e43e6f9 Regex: small code cleanup 2017-11-01 14:05:14 +08:00
Maxime Coste
23b3a221eb Regex: support more than two children in alternations
Avoid deep nested alternations, parse them flattened.
2017-11-01 14:05:14 +08:00
Maxime Coste
fb5243f710 Regex: print instruction index in dump_regex 2017-11-01 14:05:14 +08:00
Maxime Coste
c8966ca701 Regex: Assert that the regex direction matches the vm direction 2017-11-01 14:05:14 +08:00
Maxime Coste
74ed102cab Regex: Tweak definition of character class and control escape tables 2017-11-01 14:05:14 +08:00
Maxime Coste
df73b71dfc Regex: fix lookarounds handling when computing starting chars 2017-11-01 14:05:14 +08:00
Maxime Coste
1c95074657 Make use of custom regex backward searching support for reverse search 2017-11-01 14:05:14 +08:00
Maxime Coste
785cd34b4b Regex: Make boost checking disableable at compile time 2017-11-01 14:05:14 +08:00
Maxime Coste
065bbc8f59 Regex: switch to custom impl, use boost for checking 2017-11-01 14:05:14 +08:00
Maxime Coste
9305fa1369 Regex: Fix lookaround use in moon.kak
(?=[A-Z]\w*) is strictly the same as (?=[A-Z]) as \w* will always
at least match an empty string.
2017-11-01 14:05:14 +08:00
Maxime Coste
cca730193c Regex: Support any char and character classes in lookarounds
Lookarounds still need to be fixed size, but accept character classes
as well as plain literals.
2017-11-01 14:05:14 +08:00
Maxime Coste
b8cb65160a Regex: use std::conditional instead of custom template class to choose Utf8It 2017-11-01 14:05:14 +08:00
Maxime Coste
db06acdfab Regex: Fix computation of potential starts for lookaheads 2017-11-01 14:05:14 +08:00
Maxime Coste
34b1f1ccb6 Regex: detect when all characters can start and avoid allocating 2017-11-01 14:05:14 +08:00
Maxime Coste
ea85f79384 Regex: add elided braces to fix compilation on older gcc 2017-11-01 14:05:14 +08:00
Maxime Coste
bf3b50a543 Regex: Fix wrong size of character_class_escapes array 2017-11-01 14:05:14 +08:00
Maxime Coste
08ea68dc1f Regex: Fix handling of match_prev_avail for boost regex
We were passing around iterators that were not allowed to
go before the begin iterator.
2017-11-01 14:05:14 +08:00
Maxime Coste
9ec376135b Regex: Introduce RegexExecFlags::PrevAvailable
Rework assertion code as well.
2017-11-01 14:05:14 +08:00
Maxime Coste
73e177ec59 Regex: Do not use sized deallocation to support more compilers 2017-11-01 14:05:14 +08:00
Maxime Coste
30dacdade2 Regex: deallocate Saves memory on ThreadedRegexVM destruction 2017-11-01 14:05:14 +08:00
Maxime Coste
578640c8a4 Regex: Fix handling of control escapes inside character classes 2017-11-01 14:05:14 +08:00
Maxime Coste
f3736a4b48 Regex: tag instructions as scheduled as well instead of searching
And a few more code cleanup in the ThreadedRegexVM
2017-11-01 14:05:14 +08:00
Maxime Coste
6bc5823745 Regex: refactor ThreadedRegexVM::exec_from code 2017-11-01 14:05:14 +08:00
Maxime Coste
4ff655cc09 Regex: store the processed flag directly in CompiledRegex instructions 2017-11-01 14:05:14 +08:00
Maxime Coste
732b8bc2a4 Regex: abandon bytecode and just use a simple list of instructions
Makes the code simpler.
2017-11-01 14:05:14 +08:00
Maxime Coste
6434bca325 Regex: Add some comments, remove spurious semicolons 2017-11-01 14:05:14 +08:00
Maxime Coste
911a893225 Regex: fix get_base(std::reverse_iterator<...>) returning a ref to temporary 2017-11-01 14:05:14 +08:00
Maxime Coste
11abd544c6 Regex: avoid infinite loops 2017-11-01 14:05:14 +08:00
Maxime Coste
c47cdc06a7 Regex: Add support for backward matching
Regex can be compiled for backward matching instead of forward matching
and the ThreadedRegexVM is able to iterate in reverse on the subject
string to find the last match instead of the first.
2017-11-01 14:05:14 +08:00
Maxime Coste
071b897e00 Regex: Remove static RegexCompiler::compile 2017-11-01 14:05:14 +08:00
Maxime Coste
52ee62172a Regex: remove use of buffer_utils.hh from regex_impl.cc 2017-11-01 14:05:14 +08:00
Maxime Coste
c375268c2d Regex: Use memcpy to write/read offsets from bytecode
reinterpret_cast was undefined behaviour as we do not guarantee
that offsets are going to be stored properly aligned.
2017-11-01 14:05:14 +08:00
Maxime Coste
b53227d62c Regex: slight cleanup of the unit tests 2017-11-01 14:05:14 +08:00
Maxime Coste
337e58d4f9 Regex: Cleanup character class parsing a bit 2017-11-01 14:05:14 +08:00
Maxime Coste
236751cb84 Regex: Make ThreadedRegexVM a proper class, define a proper interface 2017-11-01 14:05:14 +08:00
Maxime Coste
3b69dda04e Regex: Find potential start position using a map of valid start chars
With this optimization we get close to performance parity with boost
regex on the common use cases in Kakoune.
2017-11-01 14:05:14 +08:00
Maxime Coste
741772aef9 Regex: Optimize single char character classes as literals 2017-11-01 14:05:14 +08:00
Maxime Coste
fabeab1ee1 Regex: reorder lookaround ops, group by direction 2017-11-01 14:05:14 +08:00
Maxime Coste
854144c535 Regex: Fix handling of Save instruction in ThreadedRegexVM
When not saving, we were not fully reading the instruction stream,
leading to an out of sync instruction pointer.
2017-11-01 14:05:14 +08:00
Maxime Coste
f1b4931824 Regex: Fix handling of non capturing groups (?:...)
We were wrongly keeping the `:` as a literal content of the group
2017-11-01 14:05:14 +08:00
Maxime Coste
5f6e71c4dc Regex: More code tweaks and cleanups in ThreadedRegexVM 2017-11-01 14:05:14 +08:00
Maxime Coste
5f54e0de0e Regex: Code cleanup and refactor for Saves handling 2017-11-01 14:05:14 +08:00
Maxime Coste
dbb175841b Regex: do not write the search prefix inside the program bytecode
It's faster to have specialized code in the VM directly
2017-11-01 14:05:14 +08:00
Maxime Coste
cf5055f68b Regex: small code tweak 2017-11-01 14:05:14 +08:00
Maxime Coste
e0fac20f6c Regex: Use a custom allocated buffer for Saves instead of a Vector 2017-11-01 14:05:14 +08:00
Maxime Coste
1399563e40 Regex: make m_current_threads and m_next_threads local variable of exec 2017-11-01 14:05:14 +08:00
Maxime Coste
54da8098ae Regex: Add a NoSaves RegexExecFlags to disable saving positions 2017-11-01 14:05:14 +08:00
Maxime Coste
119bc38254 Regex: small refactor of ThreadedRegexVM::clone_saves 2017-11-01 14:05:14 +08:00
Maxime Coste
9fbafba4cb Regex: Refactor thread handling in ThreadedRegexVM 2017-11-01 14:05:14 +08:00
Maxime Coste
589cde67f0 Regex: store saves in a copy on write structure 2017-11-01 14:05:14 +08:00
Maxime Coste
11b9c996ea Regex: small code style tweak 2017-11-01 14:05:14 +08:00
Maxime Coste
51ad8b4c85 Regex: fix handling of negative escaped character classes 2017-11-01 14:05:14 +08:00
Maxime Coste
adcd02b7d2 Regex: Replace boost regex_iterator impl with our own
Ensure we check the results from our own regex impl in all uses of
regexes in Kakoune.
2017-11-01 14:05:14 +08:00
Maxime Coste
f007794d9c Regex: introduce RegexExecFlags to control various behaviours 2017-11-01 14:05:14 +08:00
Maxime Coste
73b14b11be Regex: small code tweak in ThreadedRegexVM 2017-11-01 14:05:14 +08:00
Maxime Coste
630d078b6d Regex: Fix use of not-yet-constructed CompiledRegex in TestVM impl 2017-11-01 14:05:14 +08:00
Maxime Coste
5b0c2cbdc2 Regex: Ensure we don't have a thread explosion in ThreadedRegexVM
Always remove threads with lower priority that end up on the same
instruction as a higher priority thread (as we know they will behave
the same from now on)
2017-11-01 14:05:14 +08:00
Maxime Coste
b4f923b7fc Regex: min/max quantifiers can be non greedy as well 2017-11-01 14:05:14 +08:00
Maxime Coste
f02b2645da Regex: validate that our custom impl gets the same results as boost regex
In addition to running boost regex, run our custom regex and compare
the results to ensure the two regex engine agree.
2017-11-01 14:05:14 +08:00
Maxime Coste
76dcfd5c52 Regex: support escaping characters in character classes 2017-11-01 14:05:14 +08:00
Maxime Coste
3d2262bebf Regex: add support for case insensitive matching, controlled by (?i) 2017-11-01 14:05:14 +08:00
Maxime Coste
7673781751 Regex: use \A \z for subject start/end
This is the most common syntax in various regex variants.
2017-11-01 14:05:14 +08:00
Maxime Coste
0bdfdac5c5 Regex: Implement lookarounds for fixed literal strings
We do not support anything else than a plain literal string for
lookarounds.
2017-11-01 14:05:14 +08:00
Maxime Coste
e96cd29f0e Regex: Support non greedy quantifiers 2017-11-01 14:05:14 +08:00
Maxime Coste
e4004a7b7f Regex: Add support for \h and \H "horizontal blank" character classes 2017-11-01 14:05:14 +08:00
Maxime Coste
4ac0d35d1e Regex: Add support for \K that reset the start capture 2017-11-01 14:05:14 +08:00
Maxime Coste
2f450e0080 Regex: Add support for \Q...\E quoted parts 2017-11-01 14:05:14 +08:00
Maxime Coste
7a313ddafe Regex: small error message improvement 2017-11-01 14:05:14 +08:00
Maxime Coste
c282b699d7 Regex: fix support for - at end of a character class 2017-11-01 14:05:14 +08:00
Maxime Coste
e41d228af8 Regex: Disable dumping regex instructions by default in unit tests 2017-11-01 14:05:14 +08:00
Maxime Coste
d5048281a6 Regex: slight cleanup of the unit tests 2017-11-01 14:05:14 +08:00
Maxime Coste
f7468b576e Regex: Refactor regex compilation to a regular RegexCompiler class 2017-11-01 14:05:14 +08:00
Maxime Coste
d5717edc9d Regex: improve regex parse error reporting
Display the place where parsing failed, refactor code to make
RegexParser a regular object.
2017-11-01 14:05:14 +08:00
Maxime Coste
080160553c Regex: support escaped character classes 2017-11-01 14:05:14 +08:00
Maxime Coste
1a8ad3759f Regex: fix handling of strict quantifiers {N}
Previous behaviour was treating {N} as {N,}
2017-11-01 14:05:14 +08:00
Maxime Coste
be157453ad Regex: Use a std::function based "Matcher" op to implement character classes
This is more extensible and should allow easier support for non-range
classes.
2017-11-01 14:05:14 +08:00
Maxime Coste
eb1015cdfb Regex: whenever Kakoune compiles a regex, pass it to the custom impl as well
That way we can see which features are missing.
2017-11-01 14:05:14 +08:00
Maxime Coste
002aba562f Regex: work on unicode codepoints instead of raw bytes 2017-11-01 14:05:14 +08:00
Maxime Coste
75608ea223 Regex: when in full match mode, do not accept trailing data 2017-11-01 14:05:14 +08:00
Maxime Coste
490c130e41 Regex: Implement leftmost matching
Ensure threads are maintained in "priority" order, by having two
split instructions (prioritizing parent or child).
2017-11-01 14:05:14 +08:00
Maxime Coste
182b70cb0a Regex: Add initial support for character ranges 2017-11-01 14:05:14 +08:00
Maxime Coste
52678fafa1 Regex: Add support for searching
Always compile a `.*` as the first instructions in a regex bytecode.
Depending on the match or search mode, the RegexVM will either execute
this or skip it and start directly at the matching bytecode.
2017-11-01 14:05:14 +08:00
Maxime Coste
f7b8c1c79d Regex: cleanup and reorganize regex code and improve capture support
Introduce the CompiledRegex class, rename ThreadedExecutor to
ThreadedRegexVM, remove the RegexProgram namespace.
2017-11-01 14:05:14 +08:00
Maxime Coste
023511deff Regex: WIP support for saving captures 2017-11-01 14:05:14 +08:00
Maxime Coste
ad546e516a Regex: Small comment tweaks 2017-11-01 14:05:14 +08:00
Maxime Coste
46a113e10a Regex: Add support for curly braces count expressions 2017-11-01 14:05:14 +08:00
Maxime Coste
d04c60b911 Regex: Add support for subject begin/end assertion (\` and \') 2017-11-01 14:05:14 +08:00
Maxime Coste
9c5d539616 Regex: Add word boundary assertion support 2017-11-01 14:05:14 +08:00
Maxime Coste
a9a04e81b0 Regex: Ensure we only ever have a single thread on a given instruction 2017-11-01 14:05:14 +08:00
Maxime Coste
ee42c6b0ba Regex: add unit test to check the ".*" construct 2017-11-01 14:05:14 +08:00
Maxime Coste
4010c44fc0 Regex: Make the Split op only take a single offset parameter
Split now creates a new thread and keeps the current one running, as
all of its uses are compatible with this behaviour, which enables
more compact compiled code.
2017-11-01 14:05:14 +08:00
Maxime Coste
f9dc6774b9 Regex: Introduce RegexProgram::ThreadedExecutor and add line end/begin impl 2017-11-01 14:05:14 +08:00
Maxime Coste
a448e1e222 Regex: Code cleanup in the regex impl 2017-11-01 14:05:14 +08:00
Maxime Coste
8c9976ea72 Regex: Add initial, exploratory work on a custom regex engine 2017-11-01 14:05:14 +08:00
Maxime Coste
797a0cb062 Add another assert to try to catch #1506 2017-11-01 14:04:42 +08:00
Maxime Coste
94a0c9bb45 Highlighters does not need to inherit from HighlighterGroup
Just compose, to avoid coupling Highlighters with the Highlighter
interface. And yeah, that naming is a bit confusing.
2017-10-31 13:53:08 +08:00
Maxime Coste
6272847ace Prompt: display the fallback text every time the prompt is empty 2017-10-31 12:54:21 +11:00
Maxime Coste
6d78b06405 Do not auto apply the fallback regex when in regex prompts
Fixes #1653
2017-10-30 18:58:47 +11:00
Maxime Coste
cd215ccee9 Do not allow opening files whose size we cannot express in an int 2017-10-30 18:58:47 +11:00
Maxime Coste
40eb598065 Makefile: Use pkg-config on Linux to get the ncurses compilation flags
Fixes #1659
2017-10-30 17:35:51 +11:00
Maxime Coste
43d470f286 Slight cleanup of select_surrounding implementation 2017-10-28 13:43:04 +08:00
Maxime Coste
7064e890f5 Update breaking changes message 2017-10-28 13:43:04 +08:00
Maxime Coste
d49555fc75 Move highlighters into Scopes
That means we can now have highlighters active at global, buffer, and
window scope. The add-highlighter and remove-highlighter syntax changed
to take the parent path (scope/group/...) as a mandatory argument,
superseding the previous -group switch.
2017-10-28 13:43:04 +08:00
Maxime Coste
9a449a3344 Display the fallback value in prompts
Fixes #1654
2017-10-28 10:07:28 +08:00
Maxime Coste
7062022187 HashMap: Tolerate reserving for 0 elements
Fixes #1652
2017-10-27 11:03:43 +08:00
Maxime Coste
75767f5cb5 Fix infinite loop in shell_complete
Fixes #1648
2017-10-25 11:26:03 +08:00
Maxime Coste
ab9283bc37 Merge remote-tracking branch 'net/master' 2017-10-25 11:13:42 +08:00
Net
74202fab45 Rename br* colors to bright-* 2017-10-24 23:08:22 -04:00
Maxime Coste
654e3fcb46 Fix regions highlighter infinite loops when regex matches empty ranges 2017-10-25 10:39:35 +08:00
Delapouite
d5b6669a83 Add distinct w (curr buf) / W (all buf) word completion for <c-x> 2017-10-24 22:47:43 +02:00
Net
2b44e93f79 Support bright named colors 2017-10-22 14:30:49 -04:00
Maxime Coste
600ba45189 Add missing include to meta.hh 2017-10-21 05:30:43 +08:00
Maxime Coste
d6cb10d693 Disable constexpr keymap as it breaks compilation with gcc 5 2017-10-20 19:12:21 +08:00
Maxime Coste
723bb2b175 Merge remote-tracking branch 'fsub/master' 2017-10-20 17:28:06 +08:00
Maxime Coste
7c06667bdf Make the normal mode keymap a compile time hash map
This hash map is now fully constexpr, and ends up stored in the read
only data segment instead of being recomputed at each startup.
2017-10-20 12:21:22 +08:00
Maxime Coste
d486ea84e5 Constexprify various hash functions 2017-10-20 12:21:22 +08:00
Maxime Coste
ddff35e5ab Move keymap as an implementation detail of the normal mode keys
Only expose a free function that tries to get the NormalCmd from a
key.
2017-10-20 12:21:22 +08:00
fsub
a70128a4cf Avoid some warnings in optimized builds 2017-10-19 22:20:44 +02:00
Maxime Coste
ddc307b8e9 Optimize CommandManager::execute handling of tokens
Instead of walking a list of tokens and inserting eventual new
ones in the middle, use a stack of token and push new ones on top.
2017-10-17 10:25:20 +08:00
Maxime Coste
145cf843dd Add a fail command to explicitly raise an error 2017-10-17 10:25:16 +08:00
Maxime Coste
89f016d871 Refactor column highlighter to make it more robust
Support arbitrary orders for column highlighters (it was previously
failing when column highlighters were not applied in column order).

Fix show_matching tab handling at the same time (horizontal scrolling,
tab characters and show_matching were behaving badly).

Window highlighting now runs user highlighters, then built-ins for each
phase, instead of running all phases for user highlighters, then all
phases for built-ins.

We now consider unprintable characters to be 1 column wide as we know
we will display them as "�".

Fixes #1615
Fixes #1023
2017-10-12 14:46:15 +08:00
Maxime Coste
78d7d512cb Fix utf8::to_previous that could go before the begin iterator 2017-10-10 10:53:24 +08:00
Maxime Coste
079cfbc6ac Remove unused forward declaration 2017-10-10 10:52:58 +08:00
Maxime Coste
6ada6e6d77 Move all non-core string code to string_utils.{hh,cc} 2017-10-10 10:52:32 +08:00
Maxime Coste
d1b9c24afc Make Server outlive buffer manager
Fixes crashes when trying to access the server to get the session
on hooks run during destruction of other managers.

Fixes #1622
2017-10-10 10:49:30 +08:00
Maxime Coste
80d2506c34 Make utf8_iterator traits clear about it returning non-references 2017-10-07 21:54:59 +08:00
Maxime Coste
e18836aea7 Add is_upper and is_lower helper unicode functions 2017-10-07 21:54:55 +08:00
Maxime Coste
ca17fbbeb9 Merge remote-tracking branch 'Delapouite/docs-scroll' 2017-10-07 21:51:37 +08:00
Maxime Coste
a5ae21d70d Move HookManager::Hook definition in the cpp
This avoids including regex.hh in the header.
2017-10-06 13:58:04 +08:00
Maxime Coste
18705a0097 Add missing operator+= and -= on utf8_iterator
Fix operator== and != that were non-const as well.
2017-10-06 13:57:54 +08:00
Maxime Coste
cbb6e9ea0f Merge remote-tracking branch 'Delapouite/client_list' 2017-10-06 13:53:55 +08:00
Maxime Coste
8900d06646 Merge remote-tracking branch 'Delapouite/complete-line' 2017-10-06 13:50:42 +08:00