Running %sYeti<ret>casdf on file
[example.journal.txt](https://github.com/mawww/kakoune/issues/4685#issuecomment-1193243588)
can cause noticeable lag. This is because we insert text at 6000
selections, which means we need to update highlighters in those lines.
The runtime for updating range highlighters is quadratic in the
number of selections: for each selection, we call on_new_range(),
which calls add_matches(), which calls std::rotate(), which needs
needs linear time.
Fix the quadratic runtime by calling std::inplace_merge() once instead
of repeatedly calling std::rotate(). This is works because ranges
are already sorted.
I used this script to benchmark the improvements.
(In hindsight I could have just used "-ui json" instead of tmux).
#!/bin/sh
set -ex
N=${1:-100}
kak=${2:-./kak.opt}
for i in $(seq "$N")
do
echo -n "\
2022-02-06 * Earth
expense:electronics:audio 116.7 USD
liability:card -116.7 USD
2022-02-06 * Blue Yeti USB Microphone
expense:electronics:audio 116.7 USD
liability:card -116.7 USD
"
done > big-journal.ledger
echo > .empty-tmux.conf 'set -sg escape-time 5'
test_tmux() {
tmux -S .tmux-socket -f .empty-tmux.conf "$@"
}
test_tmux new-session -d "$kak" big-journal.ledger
test_tmux send-keys '%sYeti' Enter c 1234567890
sleep .2
test_tmux send-keys Escape
while ! test_tmux capture-pane -p | grep 123
do
sleep .1
done
test_tmux send-keys ':wq' Enter
while test_tmux ls
do
sleep .1
done
rm -f .tmux-socket .empty-tmux.conf
This script's runtime used to grow super-linearly but now it grows
linearly:
kak.old kak.new
N=10000 1.142 0.897
N=20000 2.879 1.400
Detailed results:
$ hyperfine -w 1 './bench.sh 10000 ./kak.opt.'{old,new}
Benchmark 1: ./bench.sh 10000 ./kak.opt.old
Time (mean ± σ): 1.142 s ± 0.072 s [User: 0.252 s, System: 0.059 s]
Range (min … max): 1.060 s … 1.242 s 10 runs
Benchmark 2: ./bench.sh 10000 ./kak.opt.new
Time (mean ± σ): 897.2 ms ± 19.3 ms [User: 241.6 ms, System: 57.4 ms]
Range (min … max): 853.9 ms … 923.6 ms 10 runs
Summary
'./bench.sh 10000 ./kak.opt.new' ran
1.27 ± 0.09 times faster than './bench.sh 10000 ./kak.opt.old'
$ hyperfine -w 1 './bench.sh 20000 ./kak.opt.'{old,new}
Benchmark 1: ./bench.sh 20000 ./kak.opt.old
Time (mean ± σ): 2.879 s ± 0.065 s [User: 0.553 s, System: 0.126 s]
Range (min … max): 2.768 s … 2.963 s 10 runs
Benchmark 2: ./bench.sh 20000 ./kak.opt.new
Time (mean ± σ): 1.400 s ± 0.018 s [User: 0.428 s, System: 0.083 s]
Range (min … max): 1.374 s … 1.429 s 10 runs
Summary
'./bench.sh 20000 ./kak.opt.new' ran
2.06 ± 0.05 times faster than '../repro.sh 20000 ./kak.opt.old'
Instead of storing regexes in each regions, move them to the core
highlighter in a hash map so that shared regexes between different
regions are only applied once per update instead of once per region
Also change iteration logic to apply all regex together to each
changed lines to improve memory locality on big buffers.
For the big_markdown.md file described in #4685 this reduces
initial display time from 3.55s to 2.41s on my machine.
This means that typing
:add-highlighter g c 80
results in
:add-highlighter global/ column 80
Paths for add-highlighter do not get the menu behavior because we
want to be able to type "global/foo" even if "global/foobar" exists.
Instead of triming only buffer ranges, add a trim_from method to
display line to keep the initial N columns, we know how many columns
are used by non-trimable widgets in DisplaySetup::widget_columns so
we can just pass this.
Also restore the previous logic for face merging
Fixes#4670
Make the column highlighter faces final, and change final logic to
give precedence to the base face when both the base and new face are
final.
Fixes#4669
Always start with full buffer lines and trim the display buffer at
the very end, treat non-range display atoms as non-trimable in that
case and keep track of how many columns are occupied by "widgets"
such as line numbers or flags.
Fixes#4659
- Add the Narrow No-Break SPace (0x202F, NNBSP) to the list of handled
spaces in the show-whitespace highlighter.
- Do not add an aditional option, just handle it like NBSP, with the same highlight character.
Because no flags were set for regex matching, the regex engine was
assuming that the subject string past-the-end matched a end-of-line.
As the subject string already ended with a \n character, the regex
engine processing of the "past-the-end" position would match '^$'
as ^ matched past the existing \n and $ matched the assumed
end-of-line.
Fixes#3799
Ranges specified with a +<length> were inconsistent, with +0 meaning
an empty range, while +1 meant a two character long range (first character
+ the following one). Change that to mean a single character.
Fixes#3479
When a region calls the regex highlighter, it is incorrect to share
the regex cache as it means we can get matches that span multiple
regions.
Fixes#3041
First try to break at a whitespace, if that fails (likely because
that last WORD is too long for the wrapping width), then try to
wrap at a 'word' boundary (on a non alphanumeric character).
Fixes#3048
file.cc:390:21: error: use of undeclared identifier 'rename'; did you mean 'devname'?
if (replace and rename(temp_filename, zfilename) != 0)
^~~~~~
devname
/usr/include/stdlib.h:277:7: note: 'devname' declared here
char *devname(__dev_t, __mode_t);
^
file.cc:390:28: error: cannot initialize a parameter of type '__dev_t' (aka 'unsigned long') with an lvalue of type 'char [1024]'
if (replace and rename(temp_filename, zfilename) != 0)
^~~~~~~~~~~~~
/usr/include/stdlib.h:277:22: note: passing argument to parameter here
char *devname(__dev_t, __mode_t);
^
2 errors generated.
---
highlighters.cc:1110:13: error: use of undeclared identifier 'snprintf'; did you mean 'vswprintf'?
snprintf(buffer, 16, format, std::abs(line_to_format));
^~~~~~~~
vswprintf
/usr/include/wchar.h:139:5: note: 'vswprintf' declared here
int vswprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
highlighters.cc:1110:22: error: cannot initialize a parameter of type 'wchar_t *' with an lvalue of type 'char [16]'
snprintf(buffer, 16, format, std::abs(line_to_format));
^~~~~~
/usr/include/wchar.h:139:35: note: passing argument to parameter here
int vswprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
2 errors generated.
---
json_ui.cc:60:13: error: use of undeclared identifier 'sprintf'; did you mean 'swprintf'?
sprintf(buf, "\\u%04x", *next);
^~~~~~~
swprintf
/usr/include/wchar.h:133:5: note: 'swprintf' declared here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
json_ui.cc:60:21: error: cannot initialize a parameter of type 'wchar_t *' with an lvalue of type 'char [7]'
sprintf(buf, "\\u%04x", *next);
^~~
/usr/include/wchar.h:133:34: note: passing argument to parameter here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
json_ui.cc:74:9: error: use of undeclared identifier 'sprintf'
sprintf(buffer, R"("#%02x%02x%02x")", color.r, color.g, color.b);
^
3 errors generated.
---
regex_impl.cc:1039:9: error: use of undeclared identifier 'sprintf'; did you mean 'swprintf'?
sprintf(buf, " %03d ", count++);
^~~~~~~
swprintf
/usr/include/wchar.h:133:5: note: 'swprintf' declared here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
regex_impl.cc:1039:17: error: cannot initialize a parameter of type 'wchar_t *' with an lvalue of type 'char [20]'
sprintf(buf, " %03d ", count++);
^~~
/usr/include/wchar.h:133:34: note: passing argument to parameter here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
regex_impl.cc:1197:17: error: use of undeclared identifier 'puts'
{ if (dump) puts(dump_regex(*this).c_str()); }
^
regex_impl.cc:1208:18: note: in instantiation of member function 'Kakoune::(anonymous namespace)::TestVM<Kakoune::RegexMode::Forward>::TestVM' requested here
TestVM<> vm{R"(a*b)"};
^
regex_impl.cc:1197:17: error: use of undeclared identifier 'puts'
{ if (dump) puts(dump_regex(*this).c_str()); }
^
regex_impl.cc:1283:56: note: in instantiation of member function 'Kakoune::(anonymous namespace)::TestVM<5>::TestVM' requested here
TestVM<RegexMode::Forward | RegexMode::Search> vm{R"(f.*a(.*o))"};
^
regex_impl.cc:1197:17: error: use of undeclared identifier 'puts'
{ if (dump) puts(dump_regex(*this).c_str()); }
^
regex_impl.cc:1423:57: note: in instantiation of member function 'Kakoune::(anonymous namespace)::TestVM<6>::TestVM' requested here
TestVM<RegexMode::Backward | RegexMode::Search> vm{R"(fo{1,})"};
^
5 errors generated.
---
remote.cc:829:9: error: use of undeclared identifier 'rename'; did you mean 'devname'?
if (rename(old_socket_file.c_str(), new_socket_file.c_str()) != 0)
^~~~~~
devname
/usr/include/stdlib.h:277:7: note: 'devname' declared here
char *devname(__dev_t, __mode_t);
^
remote.cc:829:16: error: cannot initialize a parameter of type '__dev_t' (aka 'unsigned long') with an rvalue of type 'const char *'
if (rename(old_socket_file.c_str(), new_socket_file.c_str()) != 0)
^~~~~~~~~~~~~~~~~~~~~~~
/usr/include/stdlib.h:277:22: note: passing argument to parameter here
char *devname(__dev_t, __mode_t);
^
2 errors generated.
---
string_utils.cc:126:20: error: use of undeclared identifier 'sprintf'; did you mean 'swprintf'?
res.m_length = sprintf(res.m_data, "%i", val);
^~~~~~~
swprintf
/usr/include/wchar.h:133:5: note: 'swprintf' declared here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
string_utils.cc:126:28: error: cannot initialize a parameter of type 'wchar_t *' with an lvalue of type 'char [15]'
res.m_length = sprintf(res.m_data, "%i", val);
^~~~~~~~~~
/usr/include/wchar.h:133:34: note: passing argument to parameter here
int swprintf(wchar_t * __restrict, size_t n, const wchar_t * __restrict,
^
string_utils.cc:133:20: error: use of undeclared identifier 'sprintf'; did you mean 'swprintf'?
res.m_length = sprintf(res.m_data, "%u", val);
^~~~~~~
swprintf
[...]