I want to iterate over the characters of a Unicode string, treating each surrogate pair and combining character sequence as a single unit (one grapheme).
ICU has a very old interface; Boost.Locale is much better:
#include <iostream>
#include <string_view>
#include <boost/locale.hpp>
using namespace std::string_view_literals;
int main()
{
boost::locale::generator gen;
auto string = "noël
Glib's ustring class gives you UTF-8 strings, if using UTF-8 is OK for you. It is designed to be similar to std::string. Since UTF-8 is native on Linux, your task is quite easy:
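#include <glibmm/ustring.h>
#include <iostream>

int main()
{
    // Use a narrow UTF-8 literal: an L"..." wide literal does not convert to Glib::ustring.
    Glib::ustring s = "नमस्ते";
    // size() counts Unicode characters (code points), not bytes.
    std::cout << s.size() << '\n';
}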
You can also iterate over the string's characters as usual with Glib::ustring::iterator.
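One caveat worth spelling out: a Glib::ustring iterator visits Unicode code points (gunichar values), so a combining sequence still takes several iterations. Here is a rough, dependency-free sketch of that kind of code-point iteration. The helper name decode_utf8 is made up for illustration, and the code assumes well-formed UTF-8 input; it is not how Glib implements it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 string into Unicode code points.
// Assumes well-formed input; malformed sequences are not detected.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        // The lead byte tells us how many bytes the sequence occupies.
        int len = b < 0x80 ? 1 : (b >> 5) == 0x6 ? 2 : (b >> 4) == 0xE ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? b : b & (0xFF >> (len + 1));
        // Each continuation byte contributes its low six bits.
        for (int k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, decode_utf8("noe\xCC\x88l") yields five code points (n, o, e, U+0308, l), although a reader sees four characters. Grouping the e and the combining diaeresis into one unit needs grapheme segmentation on top of this.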
You should be able to use the ICU BreakIterator for this (the character instance, assuming it is feature-equivalent to the Java version).