C++20 added char8_t and std::u8string for UTF-8. However, there is no UTF-8 version of std::cout, and OS APIs mostly expect char.
UTF-8 "support" in C++20 seems to be a bad joke.
The only UTF functionality in the STL is support for strings and string views (std::u8string, std::u8string_view, std::u16string, ...). That is all. There is no STL support for UTF encodings in regular expressions, formatting, file I/O, and so on.
In C++17 you can, at least, easily treat any UTF-8 data as char data, which makes it possible to use std::regex, std::fstream, std::cout, etc. without any loss of performance.
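A minimal sketch of this C++17-style approach, assuming UTF-8 bytes stored in a plain std::string: std::regex matches the bytes directly and any char stream writes them unchanged. The names kCities, mentions_koeln and round_trip_through_stream are hypothetical, and std::ostringstream stands in for std::cout or std::fstream. Because it avoids u8 literals, it compiles as C++17 or later.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// UTF-8 text kept in an ordinary std::string. The non-ASCII characters are
// spelled as raw UTF-8 bytes (u-umlaut = C3 BC, o-umlaut = C3 B6) so the
// example does not depend on the source-file encoding.
const std::string kCities = "M\xC3\xBCnchen und K\xC3\xB6ln";

// std::regex sees the UTF-8 bytes as plain chars, so a literal byte
// sequence matches without any conversion.
bool mentions_koeln(const std::string& text) {
    static const std::regex pattern("K\xC3\xB6ln");
    return std::regex_search(text, pattern);
}

// Any std::ostream (std::cout, std::ofstream, ...) writes the bytes unchanged.
std::string round_trip_through_stream(const std::string& text) {
    std::ostringstream out;
    out << text;
    return out.str();
}
```

Keep in mind that this is byte-level processing: kCities.size() is 18 (bytes) for 16 characters, and regex constructs such as . or \w see individual bytes, not code points.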
In C++20 things will change. For example, you can no longer write std::string text = u8"..."; because a u8 string literal now has type const char8_t[N] instead of const char[N].
It will be impossible to write something like
std::u8fstream file; std::u8string line; ... file << line;
since there is no std::u8fstream.
Even the new C++20 std::format does not support char8_t strings at all, because the necessary overloads are simply missing: it accepts only char and wchar_t format strings. You cannot write
std::u8string text = std::format(u8"...{}...", 42);
To make matters worse, there is no simple cast or conversion between std::string and std::u8string (or even between const char* and const char8_t*). So if you want to format UTF-8 data (using std::format) or read and write it (std::cin, std::cout, std::fstream, ...), you have to copy every string internally. That will be an unnecessary performance killer.
Finally, what use is UTF support without input, output, and formatting?