kakoune

Author	SHA1	Message	Date
Maxime Coste	ef3419edbf	Do not pass thread to failed/consumed, capture it implicitly	2018-12-19 19:16:14 +11:00
Maxime Coste	0b9f782691	Take iterators by const-ref in ThreadedRegexVM::exec	2018-12-19 19:14:42 +11:00
Maxime Coste	021ba55b38	Small code tweak in DualThreadStack::swap_next	2018-11-14 17:50:17 +11:00
Maxime Coste	8c2c3d27ad	Fix memory leak in DualThreadStack Fixes #2556	2018-11-07 12:28:41 +11:00
Maxime Coste	7f83c41256	align ThreadedRegexVM::Thread to permit fused copy optimization Aligning makes gcc able to copy a Thread object with a single 32-bit mov instruction instead of two 16-bit ones.	2018-11-06 20:13:09 +11:00
Maxime Coste	05a9eb62f4	Never grow the DualThreadStack in push_next As we do at most one push_next per step_thread, and we pop_current before step_thread, we can avoid a branch there at the expense of sometimes growing unnecessarily (once).	2018-11-06 07:32:47 +11:00
Maxime Coste	7fbde0d44e	Various micro performance tweaks in ThreadedRegexVM	2018-11-05 21:54:29 +11:00
Maxime Coste	7959c7f731	Refactor ThreadedRegexVM::exec_program to avoid branching Moving logic into step_thread instead of returning an enum to select what to run avoids the switch logic and improves run time.	2018-11-05 19:46:53 +11:00
Maxime Coste	7463a0d449	Remove use of utf8::iterator in regex execution This avoids having two copies of the subject string bounds, one in the ExecConfig and one in the utf8 iterator.	2018-11-05 08:17:50 +11:00
Maxime Coste	4ac7df3842	Remove most regex impl special casing for backwards matching	2018-11-03 13:52:40 +11:00
Maxime Coste	ee74c2c2df	Use custom code instead of reverse_iterator in Regex VM	2018-11-02 08:23:39 +11:00
Maxime Coste	6fce8050ee	Use BufferCoord sentinel type for regex matching on BufferIterators BufferIterators are large-ish, and need to check the buffer pointer on comparison. Checking against a coord is just a 64 bit comparison.	2018-11-01 21:51:10 +11:00
Maxime Coste	4cd7583bbc	Improve regex vm to next start performance by avoiding iterator copies	2018-11-01 08:22:43 +11:00
Maxime Coste	d652ec9ce1	Cleanup regex lookarounds implementation and reject incompatible regex Fixes #2487	2018-10-10 22:47:59 +11:00
Maxime Coste	9024d41d64	Fix integer overflow leading to bad memory access in regex execution Fixes #2481 Fixes #2480	2018-10-08 12:43:12 +11:00
Maxime Coste	7cf3cbde8e	Cleanup some trailing whitespaces and double semicolon	2018-07-26 21:56:34 +10:00
Maxime Coste	0d6e04257b	Fix memory leak in regex execution	2018-07-25 20:57:11 +10:00
Maxime Coste	7ed5d53fe6	Fix RegexCompileFlags::Backwards having the same value as Optimize That means every Optimized regex had the Backwards version compiled as well, which doubled the time it took to compile them and doubled the memory usage of regex. This should improve #2152	2018-07-19 18:34:40 +10:00
Olivier Perret	67655de947	Use a dedicated vm op for dot when match-newline is false	2018-06-24 12:41:50 +02:00
Maxime Coste	787ca7f19b	Regex: small code style tweak	2018-04-29 19:58:18 +10:00
Maxime Coste	1e8026f143	Regex: Use only 128 characters in start desc and encode others as 0 Using 257 was using lots of memory for no good reason, as codepoints > 127 are not common enough to be treated specially.	2018-04-29 19:58:18 +10:00
Maxime Coste	528ecb7417	Regex: Use a custom 'DualThreadStack' structure to hold thread info Instead of using two vectors, we can hold both current and next threads in a single buffer, with stacks growing on each end. Benchmarking shows this to be slightly faster, and should use less memory.	2018-04-29 19:58:18 +10:00
Maxime Coste	8438b33175	Add a debug regex command to dump regex instructions	2018-04-27 08:35:09 +10:00
Maxime Coste	f10eb9faa3	Use indices instead of pointers for saves/instruction in ThreadedRegexVM Performance seems unaffected, but memory usage should be lowered as the Thread struct is 4 bytes instead of 16.	2018-04-27 08:35:09 +10:00
Maxime Coste	fa17c46653	Regex: Refactor ThreadedRegexVM state handling Remove ExecState to store threads inside the ThreadedRegexVM so that memory buffers can be reused between executions. Extract an ExecConfig struct with all the data that's execution specific to avoid storing it needlessly inside the ThreadedRegexVM.	2018-04-25 21:19:04 +10:00
Maxime Coste	fb65fa60f8	Regex: take the full subject range as a parameter To allow more general look arounds out of the actual search range, pass a second range (the actual subject). This allows us to remove various flags such as PrevAvailable or NotBeginOfSubject, which are now easy to check from the subject range. Fixes #1902	2018-03-05 05:48:10 +11:00
Maxime Coste	d9e44dfacf	Regex: Remove helper functions from regex_impl.hh They were close duplicates from the ones in regex.hh and not used anywhere else.	2018-03-05 03:10:47 +11:00
Maxime Coste	933ac4d3d5	Regex: Improve comments and constify some variables Reword various comments to make some tricky parts of the regex engine easier to understand.	2018-02-24 17:40:08 +11:00
Maxime Coste	af21d4ca1e	regex: track CompiledRegex::StartDesc in the Regex memory domain	2018-02-24 16:29:24 +11:00
Maxime Coste	6851604546	Regex: Add a RegexExecFlags::NotEndOfSubject flag	2017-12-29 09:55:38 +11:00
Maxime Coste	413f880e9e	Regex: Support forward and backward matching code in the same CompiledRegex No need to have two separate regexes to handle forward and backward matching, just passing RegexCompileFlags::Backward will add support for backward matching to the regex. For backward only regex, pass RegexCompileFlags::NoForward as well to disable generation of forward matching code.	2017-12-01 19:57:02 +08:00
Maxime Coste	8d892eeb62	Regex: use StartDesc to early out when not searching Early out as well if we do not find any potential start position.	2017-12-01 15:03:03 +08:00
Maxime Coste	65b057f261	Regex: rename StartChars to StartDesc It only contains chars for now, but it's still more generally describing where matches can start.	2017-12-01 14:46:18 +08:00
Maxime Coste	a52da6fe34	Regex: Tweak is_ctype implementation style	2017-11-28 00:13:42 +08:00
Maxime Coste	8b40f57145	Regex: Replace generic 'Matchers' with specialized functionality Introduce CharacterClass and CharacterType Regex Op, and optimize their evaluation.	2017-11-25 18:14:15 +08:00
Maxime Coste	5cfccad39c	Regex: Use MemoryDomain::Regex for captures and MatchResults contents	2017-11-12 12:30:21 +08:00
Maxime Coste	c9b43d3634	Regex: directly store instruction pointer in Thread struct	2017-11-11 15:15:13 +08:00
Maxime Coste	c74becc6af	Regex: fix RegexCompileFlags not being an enum class	2017-11-01 14:05:15 +08:00
Maxime Coste	2d901dc76f	Regex: slight readability improvement and workaround a potential gcc bug	2017-11-01 14:05:15 +08:00
Maxime Coste	9e15207d2a	Regex: put the other char boolean inside the general start char map	2017-11-01 14:05:15 +08:00
Maxime Coste	e9e9a08e7b	Regex: refactor handling of Saves slightly, do not create them until really needed	2017-11-01 14:05:15 +08:00
Maxime Coste	d9b4076e3c	Regex: Go back to instruction based search of next start The previous method, which was a bit faster in the general use case, can hit some cases where we get quadratic behaviour and very slow matching. By using an instruction, we can guarantee our complexity of O(NM) as we will never have more than N threads (N being the instruction count) and we run the threads once per codepoint in the subject string. That slows down the general case slightly, but ensures we don't have pathological cases. This new version is much faster than the previous instruction based search because it does not use a plain `.` searcher, but a specific, smarter instruction specialized for finding the next start if we are in the correct conditions.	2017-11-01 14:05:15 +08:00
Maxime Coste	c423b47109	Regex: compute if codepoints outside of the start chars map can start	2017-11-01 14:05:15 +08:00
Maxime Coste	87eec79d07	Regex: comment the mutables in CompiledRegex::Instruction and fix their init	2017-11-01 14:05:14 +08:00
Maxime Coste	8b2297f5ca	Regex: Introduce a Regex memory domain to track usage separately	2017-11-01 14:05:14 +08:00
Maxime Coste	621b0d3ab8	Regex: remove the need for a processed inst vector Identify each step with a counter, and check if the instruction was already processed this step. This makes the matching faster, by removing the need to maintain a vector of instructions executed this step.	2017-11-01 14:05:14 +08:00
Maxime Coste	cfc52d7e6a	Regex: use intrusive linked list for the free saves instead of a Vector	2017-11-01 14:05:14 +08:00
Maxime Coste	b0233262b8	Regex: Limit programs to std::numeric_limits<uint16_t>::max() instructions	2017-11-01 14:05:14 +08:00
Maxime Coste	2b97e4e124	Regex: Fix handling of ^ and $ in backward matching mode	2017-11-01 14:05:14 +08:00
Maxime Coste	3c999aba37	Regex: Only reset processed and scheduled flags on relevant instructions On big regex, resetting all those flags on all instructions for each character can become the dominant operation. Track the actual instruction indices processed (the scheduled are already tracked in the next_threads vector), and only reset these.	2017-11-01 14:05:14 +08:00

93 Commits