The ThreadedRegexVM implementation does not execute split opcodes as
expected: on split the pending thread is pushed on top of the thread
stack, which means that when multiple splits are executed in a row
(such as with a disjunction with 3 or more branches) the last split
target gets on top of the thread stack and gets executed next (when the
thread from the first split target would be the expected one)
Fixing this in the ThreadedRegexVM would have a performance impact as
we would not be able to use a plain stack for current threads, so the
best solution at the moment is to reverse the order of splits generated
by a disjunction.
Fixes#4519
Merge all lookarounds into the same instruction, merge splits, merge
literal ignore case with literal...
Besides reducing the amount of almost duplicated code, this improves
performance by reducing pressure on the (often failing) branch target
prediction for instruction dispatching by moving branches into the
instruction code themselves where they are more likely to be well
predicted.
file.cc:390:21: error: use of undeclared identifier 'rename'; did you mean 'devname'?
if (replace and rename(temp_filename, zfilename) != 0)
^~~~~~
devname
/usr/include/stdlib.h:277:7: note: 'devname' declared here
char *devname(__dev_t, __mode_t);
^
file.cc:390:28: error: cannot initialize a parameter of type '__dev_t' (aka 'unsigned long') with an lvalue of type 'char [1024]'
if (replace and rename(temp_filename, zfilename) != 0)
^~~~~~~~~~~~~
/usr/include/stdlib.h:277:22: note: passing argument to parameter here
char *devname(__dev_t, __mode_t);
^
2 errors generated.
---
highlighters.cc:1110:13: error: use of undeclared identifier 'snprintf'; did you mean 'vswprintf'?
snprintf(buffer, 16, format, std::abs(line_to_format));
^~~~~~~~
vswprintf
/usr/include/wchar.h:139:5: note: 'vswprintf' declared here
int vswprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
highlighters.cc:1110:22: error: cannot initialize a parameter of type 'wchar_t *' with an lvalue of type 'char [16]'
snprintf(buffer, 16, format, std::abs(line_to_format));
^~~~~~
/usr/include/wchar.h:139:35: note: passing argument to parameter here
int vswprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
2 errors generated.
---
json_ui.cc:60:13: error: use of undeclared identifier 'sprintf'; did you mean 'swprintf'?
sprintf(buf, "\\u%04x", *next);
^~~~~~~
swprintf
/usr/include/wchar.h:133:5: note: 'swprintf' declared here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
json_ui.cc:60:21: error: cannot initialize a parameter of type 'wchar_t *' with an lvalue of type 'char [7]'
sprintf(buf, "\\u%04x", *next);
^~~
/usr/include/wchar.h:133:34: note: passing argument to parameter here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
json_ui.cc:74:9: error: use of undeclared identifier 'sprintf'
sprintf(buffer, R"("#%02x%02x%02x")", color.r, color.g, color.b);
^
3 errors generated.
---
regex_impl.cc:1039:9: error: use of undeclared identifier 'sprintf'; did you mean 'swprintf'?
sprintf(buf, " %03d ", count++);
^~~~~~~
swprintf
/usr/include/wchar.h:133:5: note: 'swprintf' declared here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
regex_impl.cc:1039:17: error: cannot initialize a parameter of type 'wchar_t *' with an lvalue of type 'char [20]'
sprintf(buf, " %03d ", count++);
^~~
/usr/include/wchar.h:133:34: note: passing argument to parameter here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
regex_impl.cc:1197:17: error: use of undeclared identifier 'puts'
{ if (dump) puts(dump_regex(*this).c_str()); }
^
regex_impl.cc:1208:18: note: in instantiation of member function 'Kakoune::(anonymous namespace)::TestVM<Kakoune::RegexMode::Forward>::TestVM' requested here
TestVM<> vm{R"(a*b)"};
^
regex_impl.cc:1197:17: error: use of undeclared identifier 'puts'
{ if (dump) puts(dump_regex(*this).c_str()); }
^
regex_impl.cc:1283:56: note: in instantiation of member function 'Kakoune::(anonymous namespace)::TestVM<5>::TestVM' requested here
TestVM<RegexMode::Forward | RegexMode::Search> vm{R"(f.*a(.*o))"};
^
regex_impl.cc:1197:17: error: use of undeclared identifier 'puts'
{ if (dump) puts(dump_regex(*this).c_str()); }
^
regex_impl.cc:1423:57: note: in instantiation of member function 'Kakoune::(anonymous namespace)::TestVM<6>::TestVM' requested here
TestVM<RegexMode::Backward | RegexMode::Search> vm{R"(fo{1,})"};
^
5 errors generated.
---
remote.cc:829:9: error: use of undeclared identifier 'rename'; did you mean 'devname'?
if (rename(old_socket_file.c_str(), new_socket_file.c_str()) != 0)
^~~~~~
devname
/usr/include/stdlib.h:277:7: note: 'devname' declared here
char *devname(__dev_t, __mode_t);
^
remote.cc:829:16: error: cannot initialize a parameter of type '__dev_t' (aka 'unsigned long') with an rvalue of type 'const char *'
if (rename(old_socket_file.c_str(), new_socket_file.c_str()) != 0)
^~~~~~~~~~~~~~~~~~~~~~~
/usr/include/stdlib.h:277:22: note: passing argument to parameter here
char *devname(__dev_t, __mode_t);
^
2 errors generated.
---
string_utils.cc:126:20: error: use of undeclared identifier 'sprintf'; did you mean 'swprintf'?
res.m_length = sprintf(res.m_data, "%i", val);
^~~~~~~
swprintf
/usr/include/wchar.h:133:5: note: 'swprintf' declared here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
string_utils.cc:126:28: error: cannot initialize a parameter of type 'wchar_t *' with an lvalue of type 'char [15]'
res.m_length = sprintf(res.m_data, "%i", val);
^~~~~~~~~~
/usr/include/wchar.h:133:34: note: passing argument to parameter here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
string_utils.cc:133:20: error: use of undeclared identifier 'sprintf'; did you mean 'swprintf'?
res.m_length = sprintf(res.m_data, "%u", val);
^~~~~~~
swprintf
[...]
The current implementation is wrong as it crosses basic blocks
boundaries. Doing basic block decomposition of regex is probably
a tad too complex for this single optimization.
Fixes#2711
Letting any character to be escaped is error prone as it looks like
\l could mean [:lower:] (as it used to with boost) when it only means
literal l.
Fix the haskell.kak file as well.
Fixes#1945
To allow more general look arounds out of the actual search range,
pass a second range (the actual subject). This allows us to remove
various flags such as PrevAvailable or NotBeginOfSubject, which are
now easy to check from the subject range.
Fixes#1902
forward (which controls if we are compling for forward or backward
matching) is always statically known, and compilation will first
compile forward, then backward (if needed), so by having separate
compiled function we get rid of runtime branches.
No need to have two separate regexes to handle forward and backward
matching, just passing RegexCompileFlags::Backward will add support
for backward matching to the regex. For backward only regex, pass
RegexCompileFlags::NoForward as well to disable generation of
forward matching code.