Kakoune's balanced strings require that delimiter characters nested inside
them are also paired, so for example in %{ }, each nested { must occur
before a corresponding } to balance it out.
In general this will automatically be the case for code in common scripting
languages, but sometimes regular expressions used for syntax highlighting
do end up containing an unbalanced bracket of one type or another.
This problem is easily solved because there is a free choice of balanced
delimiter characters. However, it can also be worked around by adding
a comment which itself contains an unbalanced delimiter character, to
'balance out' the unpaired one in the regular expression.
These unbalanced comments are not ideal as the semantic role they perform
is easy for a casual reader to overlook. A good example is
catch %{
# indent after lines with an unclosed { or (
try %< execute-keys -draft [c[({],[)}] <ret> <a-k> \A[({][^\n]*\n[^\n]*\n?\z <ret> j<a- gt> >
# indent after a switch's case/default statements
try %[ execute-keys -draft kx <a-k> ^\h*(case|default).*:$ <ret> j<a-gt> ]
# deindent closing brace(s) when after cursor
try %[ execute-keys -draft x <a-k> ^\h*[})] <ret> gh / [})] <ret> m <a-S> 1<a-&> ]
}
in rc/filetype/go/kak. Here, it is not instantly obvious that the comment
containing an unmatched { is required for correctness. If you change the
comment, delete it or rearrange the contents of the catch block, go.kak
will fail to load, and if you cut-and-paste this code as the basis for
a new filetype, it is a loaded gun pointing at your feet.
Luckily, a careful audit of the standard kakoune library turned up only
three such instances, in go.kak, hare.kak and markdown.kak.
The examples in go.kak and hare.kak are easily made robust by replacing
a %{ } with %< > or %[ ] respectively. The example in markdown.kak is
least-intrusively fixed by rewriting the affected regular expression
slightly so it has balanced { and } anyway.
I can't think of a case where a URL would not start at a word boundary.
Let's add that to the regex. In addition to correctness, this also
slightly improves performance because matching can stop earlier.
$ HOME=$PWD hyperfine -w 1 'git checkout HEAD'{~,}' -- :/rc/filetype/markdown.kak && ./kak.opt big_markdown.md -e "hook global NormalIdle .* quit" -ui dummy'
Benchmark 1: git checkout HEAD~ -- :/rc/filetype/markdown.kak && ./kak.opt big_markdown.md -e "hook global NormalIdle .* quit" -ui dummy
Time (mean ± σ): 1.123 s ± 0.022 s [User: 1.100 s, System: 0.027 s]
Range (min … max): 1.093 s … 1.174 s 10 runs
Benchmark 2: git checkout HEAD -- :/rc/filetype/markdown.kak && ./kak.opt big_markdown.md -e "hook global NormalIdle .* quit" -ui dummy
Time (mean ± σ): 1.019 s ± 0.026 s [User: 1.001 s, System: 0.021 s]
Range (min … max): 0.984 s … 1.051 s 10 runs
Summary
'git checkout HEAD -- :/rc/filetype/markdown.kak && ./kak.opt big_markdown.md -e "hook global NormalIdle .* quit" -ui dummy' ran
1.10 ± 0.04 times faster than 'git checkout HEAD~ -- :/rc/filetype/markdown.kak && ./kak.opt big_markdown.md -e "hook global NormalIdle .* quit" -ui dummy'
There have been proposals to add more language aliases to markdown.kak
(#4592) and allow users to add their own aliases (#4489).
To recap: various markdown implementations allow specifying aliases
for languages. For example, here is a code block that should be
highlighted as filetype "haskell" but isn't:
```hs
-- highlight as haskell
```
There are lots of aliases out in the wild - "pygmentize -L" lists
some but I don't think there is a canonical list.
Today we have a hardcoded list of supported filetypes. This is hard
to mainta, extend, and it can impact performance.
This patch simply attempts to load the module "hs" and the shared
highlighter "hs". This means that users can use this (obvious?) snippet
to add their own aliases:
provide-module hs %{
require-module haskell
add-highlighter shared/hs ref haskell
}
Untrusted Markdown files can load arbitrary modules, but that was
already true before, and modules are assumed to be trusted anyway.
Since language highlighters are now loaded *after* the generic
code-block highlighter, we need to make sure the language highlighters
take precedence. Do this by making them sub-regions of the generic one.
Closes#4489
This improves performance on the [5MB Markdown
file](https://github.com/mawww/kakoune/issues/4685#issuecomment-1208129806).
$ HOME=$PWD hyperfine -w 1 'git checkout HEAD'{~,}' -- :/rc/filetype/markdown.kak && ./kak.opt big_markdown.md -e "hook global NormalIdle .* quit" -ui dummy'
Benchmark 1: git checkout HEAD~ -- :/rc/filetype/markdown.kak && ./kak.opt big_markdown.md -e "hook global NormalIdle .* quit" -ui dummy
Time (mean ± σ): 3.225 s ± 0.074 s [User: 3.199 s, System: 0.027 s]
Range (min … max): 3.099 s … 3.362 s 10 runs
Benchmark 2: git checkout HEAD -- :/rc/filetype/markdown.kak && ./kak.opt big_markdown.md -e "hook global NormalIdle .* quit" -ui dummy
Time (mean ± σ): 1.181 s ± 0.030 s [User: 1.162 s, System: 0.021 s]
Range (min … max): 1.149 s … 1.234 s 10 runs
Summary
'git checkout HEAD -- :/rc/filetype/markdown.kak && ./kak.opt big_markdown.md -e "hook global NormalIdle .* quit" -ui dummy' ran
2.73 ± 0.09 times faster than 'git checkout HEAD~ -- :/rc/filetype/markdown.kak && ./kak.opt big_markdown.md -e "hook global NormalIdle .* quit" -ui dummy'
(These numbers depend on another optimization.)
Filetypes markdown and restructuredtext reuse highlighters from other
filetypes to highlight code blocks. For example, to highlight a code
block of language foo they essentially do
require-module foo
add-highlighter [...] ref foo
This works great if the module name matches the shared
highlighter. This is the case almost all scripts in rc/filetype*.
The only exception is kakrc.kak: the highlighter is named "kakrc"
(just like the filetype) but the module is named "kak".
This requires weird hacks in markdown/restructuredtext. Ideally we
could remove this inconsistency by renaming both the filetype and the
highlighter to "kak" but that's a breaking change. Until we do that,
let's add an alias so we can treat filetypes uniformly. This helps
the following commits, which otherwise would need to add ugly extra
code for kakrc highlighters.
The following commit will generalize this approach, allowing users
to add arbitrary aliases.
After opening a markdown file
```b
```
```c
int main() {}
```
markdown-load-languages will run an "evaluate-commands -itersel".
The first selection makes us run "require-module b", which fails
because that module can't be found. Since -itersel only ignores the
"no selection remaining" error we fail to run "require-module c". Fix
this by ignoring errors.
`x` is often criticized as hard to predict due to its slightly complex
behaviour of selecting next line if the current one is fully selected.
Change `x` to use the previous `<a-x>` behaviour, and change `<a-x>` to
trim to fully selected lines as `<a-X>` did.
Adapt existing indentation script to the new behaviour
An indent hook automatically adds whitespace, so it seems prudent to
add the hook to remove unwanted whitespace again. This is what we do
in most languages already.
The closing ``` in the following example was not detected because the
indented code block highlighter was higher up in the hierarchy than the
fenced code block highlighter:
```
indented
```
The codeblock highlighter used to be inline so that it has an effect
inside listblocks. This commits adds a listblock/codeblock highlighter
as a replacement.
Fixes#4351
The invalid regex `)\b` currently matches anything, so this didn't cause
any errors.
It is still invalid though, so I fixed it by moving the `\b` to the end
of the non-raw_attribute language name (like the original regex). The
raw_attribute one shouldn't need this because the `}` marks the end of
the language name anyway.
Fixes#4025
Because the HTML highlighter was higher up in the hierarchy than the code
highlighter, it took precedence. I fixed it by making it an inline region.
Using my new knowledge of "inline" I was able to remove one line of code.
Fixes#4091
This highlighter (line 50 of markdown.kak) looks for the filetype
specified by the author at the top of the code fence, e.g.
``` python
print("hello")
```
and highlights the code within using Kakoune's relevant highlighter --
in this case Python.
Some flavours of markdown use curly braces and other characters in the
first line such as the following:
``` {=python}
print("hello")
```
Previously Kakoune recognised `{=python}` but not `{.python}`. The latter
is Pandoc's flavour of markdown. This patch adjusts the regex patterns
to recognise the dot notation as well.
This commit prevents the lines following the one that holds the bullet
from being highlighted with the `bullet` face when they're indented:
- The bullet is highlighted properly, so is this sentence
but this line the ones that would follow are not
Fixes#3582
This commit allows code blocks to be prefixed with tabulation
characters to be picked up and highlighted by the editor.
Indenting caused by the inclusion of an inline code block into a
list item is also taken into account. However, that might cause false
positives, for example with a hard wrapped list item indented with
an amount of spaces congruent to 4.
This commit addresses the following issues:
* highlight trailing space characters with the `meta` face, instead of
`PrimarySelection`
* make the regex more readable by using a capture group in stead of
`\K`
* specifically match space characters, not other horizontal whitespace
characters
* match two or more space characters
Reference[1]:
> When you do want to insert a <br /> break tag using Markdown, you
> end a line with two or more spaces, then type return.
[1] https://daringfireball.net/projects/markdown/syntax#p
Note that the original reproducer doesn't seem to work anymore,
probably because of changes made to how lists are highlighted.
Fixes#911
This commit removes declarations and mentions to the built-in `bold`
and `italic` faces.
While they could be a user-friendly way of customising how tokens
are emphasised in Markdown documents (similarly to the
`$LESS_TERMCAP_*` environment variables for `man` pagers), most other
markup languages do not have the concept of "strong" and "emphasis"
but refer directly to the font style/weight.
The faces were also not even set by default to highlight as their
names implied, so having markup language support scripts directly
use the +b and +i face attributes is more consistent.
This commit allows using the <semicolon> expansion in commands, instead
of `\;`.
It makes commands look more elegant, and prevents new-comers from
falling into the trap of using <a-;> without escaping the semicolon.