I'm looking for suggestions regarding Unicode-aware std::string library replacements. I have a bunch of code that uses std::string, its iterators, etc., and would like to now sup
I've written my own C++ UTF-8 library, which is a drop-in replacement for std::wstring/std::string. The data type exposed to the user is char32_t, but internally the wide characters are all packed into UTF-8 chars.
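To show what "packing a char32_t into UTF-8 chars" involves, here is a minimal sketch of the standard UTF-8 encoding step. The function name append_utf8 is hypothetical and not this library's API. It assumes the input is a valid Unicode scalar value (no surrogates, at most U+10FFFF) and does no validation.

```cpp
#include <string>

// Hypothetical helper (not the library's API): append one codepoint to a
// byte string using standard UTF-8 encoding (1 to 4 bytes).
// Assumes cp is a valid Unicode scalar value; nothing is validated.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

ASCII codepoints take a single byte, which is why mostly-ASCII text stays compact in this representation.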
The whole thing is quite fast, and it performs best on text that has many ASCII codepoints and only a few non-ASCII ones. All operations known from std::string are available in this class (except for substring find) and operate on codepoint indices rather than byte indices.
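Indexing by codepoint over UTF-8 storage means translating a codepoint index into a byte offset. The sketch below is one common way to do that and one plausible reason ASCII-heavy text is fast. It is not the author's implementation. The class name utf8_string and its all-ASCII flag are hypothetical assumptions. When every byte is ASCII, the codepoint index equals the byte index (O(1)). Otherwise the code walks lead bytes (O(n)).

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch, not the library's actual design: UTF-8 storage,
// accessors that take codepoint indices and return char32_t.
class utf8_string {
public:
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {
        for (unsigned char c : bytes_)
            if (c >= 0x80) { all_ascii_ = false; break; }
    }

    // Number of codepoints: count every byte that is not a continuation byte.
    std::size_t length() const {
        if (all_ascii_) return bytes_.size();
        std::size_t n = 0;
        for (unsigned char c : bytes_)
            if ((c & 0xC0) != 0x80) ++n;
        return n;
    }

    // Decode the codepoint at codepoint index i (assumes i < length()
    // and well-formed UTF-8).
    char32_t at(std::size_t i) const {
        std::size_t pos = all_ascii_ ? i : byte_offset(i);  // ASCII fast path
        unsigned char lead = static_cast<unsigned char>(bytes_[pos]);
        if (lead < 0x80) return lead;
        std::size_t len = lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        char32_t cp = lead & (0xFF >> (len + 1));  // payload bits of lead byte
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes_[pos + k]) & 0x3F);
        return cp;
    }

private:
    // Skip i codepoints by stepping over continuation bytes: O(n).
    std::size_t byte_offset(std::size_t i) const {
        std::size_t pos = 0;
        while (i > 0) {
            ++pos;
            while (pos < bytes_.size() &&
                   (static_cast<unsigned char>(bytes_[pos]) & 0xC0) == 0x80)
                ++pos;
            --i;
        }
        return pos;
    }

    std::string bytes_;
    bool all_ascii_ = true;
};
```

For example, utf8_string("h\xC3\xA9llo").at(1) yields U+00E9, although that codepoint starts at byte 1 and spans two bytes. A real library would likely cache offsets or counts rather than rescan on every access.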
As a bonus of defensive programming, the whole ANSI range (0-255) can be used without multibyte sequences :)
Hope this helps!