Commit Graph

34 Commits

Author SHA1 Message Date
Jean-Louis Fuchs
9d897a6092 Rank a word-boundary after a non-word-boundary 2019-09-07 22:53:29 +02:00
Maxime Coste
445da8d7bf Use an InvalidPolicy in utf8::dump and utf8::codepoint_size
Do not throw on invalid codepoints by default, ignore them.
Fixes #2686
2019-01-13 18:29:20 +11:00
Maxime Coste
4cfb46ff2e Support different type for iterators and sentinel in utf8 functions 2018-11-01 08:22:43 +11:00
Maxime Coste
78d7d512cb Fix utf8::to_previous that could go before the begin iterator 2017-10-10 10:53:24 +08:00
Maxime Coste
e264d189eb Add noexcept specifiers to unicode and utf8 functions 2017-04-23 12:47:26 +01:00
Maxime Coste
dbcddafbfd Change utf8::to_next/to_previous so that they are more symmetrical
The previous implementation could yield different positions when
iterating forward and backward, leading to confusion in boost regex.

This makes an existing problem a bit more visible: iterating with
to_next and with read_codepoint won't behave the same way, as
read_codepoint will put the iterator onto the byte following the
utf8 codepoint, whereas to_next will put it on the next utf8
character start byte, which might be different if the buffer content
is not valid utf8.

Fixes #1195
2017-04-20 16:18:49 +01:00
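The difference described in this commit message can be reproduced with two small index-based helpers. They are hypothetical and are not Kakoune's `utf8.hh` API. They model the two behaviours as the message describes them:

- `next_start` follows the `to_next` behaviour. It steps one byte and then skips continuation bytes until the next character start byte.
- `after_codepoint` follows the `read_codepoint` behaviour. It trusts the length announced by the lead byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; not Kakoune's utf8.hh API.

// A byte starts a character unless it is a continuation byte (10xxxxxx).
inline bool is_start_byte(unsigned char c) { return (c & 0xC0) != 0x80; }

// Length announced by a lead byte; an invalid lead byte counts as 1.
inline std::size_t announced_length(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

// to_next-like: step one byte, then skip continuation bytes until the next
// character start byte (or the end of the buffer).
inline std::size_t next_start(const std::string& s, std::size_t i)
{
    if (i < s.size())
        ++i;
    while (i < s.size() && !is_start_byte(static_cast<unsigned char>(s[i])))
        ++i;
    return i;
}

// read_codepoint-like: skip the number of bytes announced by the lead byte,
// clamped to the end of the buffer.
inline std::size_t after_codepoint(const std::string& s, std::size_t i)
{
    assert(i < s.size());
    return std::min(i + announced_length(static_cast<unsigned char>(s[i])), s.size());
}
```

Consider the invalid buffer `"\xE2\x82" "a"`, a truncated three-byte sequence followed by `a`. The two helpers disagree on it:

- `next_start(s, 0)` stops on the `a` (index 2).
- `after_codepoint(s, 0)` lands past it (index 3).

On valid input such as `"\xE2\x82\xAC"`, both return 3.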
Maxime Coste
249ec4835e Rename get_width to codepoint_width 2016-10-01 13:45:00 +01:00
Maxime Coste
35559b65dd Support codepoints of variable width
Add a ColumnCount type and use it in place of CharCount whenever
more appropriate, take column size of codepoints into account for
vertical movements and docstring wrapping.

Fixes #811
2016-10-01 13:45:00 +01:00
Maxime Coste
14f59d415d Avoid underlying iterator copies in utf8_iterator 2016-07-27 21:36:32 +01:00
Maxime Coste
1401c55531 Faster implementation of utf8::advance not copying iterators at each step 2016-07-15 20:26:33 +01:00
Maxime Coste
73fdc726fb Avoid postfix increment in utf8::distance 2016-07-15 20:07:47 +01:00
Maxime Coste
94cbd5a837 More string usage cleanup 2016-02-05 09:13:07 +00:00
Maxime Coste
4ea89def3b Avoid (*it++) pattern in utf8.hh 2015-09-25 13:19:21 +01:00
Maxime Coste
aa4b98af7c Add utf8::read_codepoint that both gets the codepoint and advances the iterator 2015-09-24 23:00:47 +01:00
Maxime Coste
e601bd5fe8 Minor additional cleanup in utf8.hh 2015-09-23 22:09:37 +01:00
Maxime Coste
ceafa5459a Avoid unneeded iterator copies in utf8.hh 2015-09-23 19:48:15 +01:00
Maxime Coste
eb0d03f437 Use Pass as default policy for invalid utf8 to avoid asserting on it 2014-10-13 21:07:23 +01:00
Maxime Coste
ed68d1ff28 utf8: use end of sequence iterators for more security 2014-07-05 12:10:06 +01:00
Maxime Coste
3f70d91f8c Use unsigned char rather than char in utf8 decoding to avoid sign extension 2014-07-05 12:10:06 +01:00
Maxime Coste
db423e4a88 utf8::is_character_start takes the char value directly 2014-05-14 19:49:03 +01:00
Maxime Coste
2d96f853f8 Add utf8::codepoint_size function 2013-05-30 18:49:50 +02:00
Maxime Coste
270e950cf1 sort include directives 2013-04-09 20:05:40 +02:00
Maxime Coste
5adee4a6a7 rename assert to kak_assert to avoid collisions 2013-04-09 20:04:11 +02:00
Maxime Coste
9f9ad58b39 utf8::dump uses a copy of the output iterator instead of a reference 2013-02-27 23:50:33 +01:00
Maxime Coste
7865223587 Add utf8::character_start function 2013-02-26 14:05:51 +01:00
Maxime Coste
ee882d9d02 utf8: use CharCount instead of size_t 2012-10-27 13:26:40 +02:00
Maxime Coste
df400f90ab utf8: replace InvalidBytePolicy::Throw with InvalidBytePolicy::Assert 2012-10-17 17:01:51 +02:00
Maxime Coste
dfafcdb6e6 utf8::codepoint: configurable invalid byte policy 2012-10-13 19:05:14 +02:00
Maxime Coste
0ce6bd9bf5 use ByteCount instead of CharCount when we are really counting bytes
(that is most of the time when we are not concerned with displaying)
2012-10-11 00:41:48 +02:00
Maxime Coste
571861bc7b Return something in utf8::distance, thanks again gcc for letting this work 2012-10-11 00:39:17 +02:00
Maxime Coste
ffba94fcde Actually return something in utf8::codepoint, thanks gcc for using rax 2012-10-10 19:14:18 +02:00
Maxime Coste
7a8366da2b add a unicode.hh header for Codepoint related functions, s/utf8::Codepoint/Codepoint/ 2012-10-09 19:15:05 +02:00
Maxime Coste
1af7465107 utf8: add dump(OutputIterator& it, Codepoint cp) 2012-10-09 14:29:37 +02:00
Maxime Coste
2db1d02329 add utf8 helpers in utf8.hh 2012-10-08 14:25:05 +02:00