Maxime Coste
dbcddafbfd
Change utf8::to_next/to_previous so that they are more symmetrical
The previous implementation could yield different positions when
iterating forward and backward, leading to confusion in boost regex.
This makes an existing problem a bit more visible: iterating with
to_next and with read_codepoint won't behave the same way, as
read_codepoint will put the iterator onto the byte following the
utf8 codepoint, whereas to_next will put it on the next utf8
character start byte, which might be different if the buffer content
is not valid utf8.
Fixes #1195
2017-04-20 16:18:49 +01:00
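The mismatch described in this commit message can be illustrated with a small sketch. This is not the code from utf8.hh: `sketch_to_next`, `sketch_after_read` and the index-based interface are hypothetical stand-ins. It assumes a to_next-style step stops on the next character start byte, while a read_codepoint-style step trusts the length announced by the lead byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: any byte that is not a continuation byte (10xxxxxx)
// is treated as the start of a character.
inline bool is_character_start(unsigned char c)
{
    return (c & 0xC0) != 0x80;
}

// Sketch of a to_next-style step: leave the current byte, then skip
// continuation bytes until the next character start byte.
inline std::size_t sketch_to_next(const std::string& s, std::size_t pos)
{
    if (pos < s.size())
        ++pos;
    while (pos < s.size() &&
           !is_character_start(static_cast<unsigned char>(s[pos])))
        ++pos;
    return pos;
}

// Sketch of where a read_codepoint-style step ends: advance by the length
// the lead byte announces, without checking the following bytes.
inline std::size_t sketch_after_read(const std::string& s, std::size_t pos)
{
    if (pos >= s.size())
        return pos;
    const unsigned char c = static_cast<unsigned char>(s[pos]);
    std::size_t len = 1;
    if ((c & 0xE0) == 0xC0)      len = 2;
    else if ((c & 0xF0) == 0xE0) len = 3;
    else if ((c & 0xF8) == 0xF0) len = 4;
    const std::size_t end = pos + len;
    return end < s.size() ? end : s.size();
}

inline void demo()
{
    // Valid UTF-8 ("é" followed by "b"): both steps land on the same byte.
    const std::string valid("\xC3\xA9" "b");
    assert(sketch_to_next(valid, 0) == sketch_after_read(valid, 0));

    // Invalid UTF-8: the lead byte 0xC3 announces two bytes, but the next
    // byte is already a start byte, so the two steps disagree.
    const std::string invalid("\xC3" "ab");
    assert(sketch_to_next(invalid, 0) != sketch_after_read(invalid, 0));
}
```

On the invalid buffer the to_next-style step stops at index 1, while the read_codepoint-style step ends at index 2.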
Maxime Coste
249ec4835e
Rename get_width to codepoint_width
2016-10-01 13:45:00 +01:00
Maxime Coste
35559b65dd
Support codepoints of variable width
Add a ColumnCount type and use it in place of CharCount whenever
more appropriate, take column size of codepoints into account for
vertical movements and docstring wrapping.
Fixes #811
2016-10-01 13:45:00 +01:00
Maxime Coste
14f59d415d
Avoid underlying iterator copies in utf8_iterator
2016-07-27 21:36:32 +01:00
Maxime Coste
1401c55531
Faster implementation of utf8::advance not copying iterators at each step
2016-07-15 20:26:33 +01:00
Maxime Coste
73fdc726fb
Avoid postfix increment in utf8::distance
2016-07-15 20:07:47 +01:00
Maxime Coste
94cbd5a837
More string usage cleanup
2016-02-05 09:13:07 +00:00
Maxime Coste
4ea89def3b
Avoid (*it++) pattern in utf8.hh
2015-09-25 13:19:21 +01:00
Maxime Coste
aa4b98af7c
Add utf8::read_codepoint that both gets the codepoint and advances the iterator
2015-09-24 23:00:47 +01:00
Maxime Coste
e601bd5fe8
Minor additional cleanup in utf8.hh
2015-09-23 22:09:37 +01:00
Maxime Coste
ceafa5459a
Avoid unneeded iterator copies in utf8.hh
2015-09-23 19:48:15 +01:00
Maxime Coste
eb0d03f437
Use Pass as default policy for invalid utf8 to avoid asserting on it
2014-10-13 21:07:23 +01:00
Maxime Coste
ed68d1ff28
utf8: use end of sequence iterators for more security
2014-07-05 12:10:06 +01:00
Maxime Coste
3f70d91f8c
Use unsigned char rather than char in utf8 decoding to avoid sign extension
2014-07-05 12:10:06 +01:00
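As an illustration of the sign-extension problem this commit refers to, here is a minimal sketch. The function names are hypothetical and the check shown is an assumed example, not the code from utf8.hh. It shows how widening a signed byte can misclassify a UTF-8 lead byte.

```cpp
#include <cassert>

// Hypothetical check: widening a signed byte sign-extends it, so every byte
// with the high bit set becomes negative and passes the "< 0x80" test.
inline bool is_ascii_via_signed(signed char byte)
{
    const int widened = byte;
    return widened < 0x80;
}

// The same check on an unsigned byte keeps the value in the range 0..255.
inline bool is_ascii_via_unsigned(unsigned char byte)
{
    const int widened = byte;
    return widened < 0x80;
}

inline void sign_extension_demo()
{
    // -61 as a signed char has the same bit pattern as 0xC3, a two-byte lead.
    const signed char as_signed = static_cast<signed char>(-61);
    const unsigned char as_unsigned = 0xC3;
    assert(is_ascii_via_signed(as_signed));      // wrongly treated as ASCII
    assert(!is_ascii_via_unsigned(as_unsigned)); // correctly rejected
}
```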
Maxime Coste
db423e4a88
utf8::is_character_start takes the char value directly
2014-05-14 19:49:03 +01:00
Maxime Coste
2d96f853f8
Add utf8::codepoint_size function
2013-05-30 18:49:50 +02:00
Maxime Coste
270e950cf1
sort include directives
2013-04-09 20:05:40 +02:00
Maxime Coste
5adee4a6a7
rename assert to kak_assert to avoid collisions
2013-04-09 20:04:11 +02:00
Maxime Coste
9f9ad58b39
utf8::dump uses a copy of the output iterator instead of a reference
2013-02-27 23:50:33 +01:00
Maxime Coste
7865223587
Add utf8::character_start function
2013-02-26 14:05:51 +01:00
Maxime Coste
ee882d9d02
utf8: use CharCount instead of size_t
2012-10-27 13:26:40 +02:00
Maxime Coste
df400f90ab
utf8: replace InvalidBytePolicy::Throw with InvalidBytePolicy::Assert
2012-10-17 17:01:51 +02:00
Maxime Coste
dfafcdb6e6
utf8::codepoint: configurable invalid byte policy
2012-10-13 19:05:14 +02:00
Maxime Coste
0ce6bd9bf5
use ByteCount instead of CharCount when we are really counting bytes
(that is most of the time when we are not concerned with displaying)
2012-10-11 00:41:48 +02:00
Maxime Coste
571861bc7b
Return something in utf8::distance, thanks again gcc for letting this work
2012-10-11 00:39:17 +02:00
Maxime Coste
ffba94fcde
Actually return something in utf8::codepoint, thanks gcc for using rax
2012-10-10 19:14:18 +02:00
Maxime Coste
7a8366da2b
add a unicode.hh header for Codepoint related functions, s/utf8::Codepoint/Codepoint/
2012-10-09 19:15:05 +02:00
Maxime Coste
1af7465107
utf8: add dump(OutputIterator& it, Codepoint cp)
2012-10-09 14:29:37 +02:00
Maxime Coste
2db1d02329
add utf8 helpers in utf8.hh
2012-10-08 14:25:05 +02:00