Convert between std::u8string and std::string

攒了一身酷 2021-02-12 10:01

C++20 added char8_t and std::u8string for UTF-8. However, there is no UTF-8 version of std::cout, and OS APIs mostly expect char.

2 Answers
  •  醉梦人生
    2021-02-12 10:58

    UTF-8 "support" in C++20 seems to be a bad joke.

    The only UTF functionality in the STL is support for strings and string_views (std::u8string, std::u8string_view, std::u16string, ...). That is all. There is no STL support for UTF encodings in regular expressions, formatting, file I/O, and so on.

    In C++17 you can, at least, easily treat any UTF-8 data as char data, which makes it possible to use std::regex, std::fstream, std::cout, etc. without loss of performance.
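To illustrate the C++17-style approach described above: if UTF-8 bytes live in a plain std::string, the char-based library works on them unchanged, because it only sees bytes. This sketch spells the UTF-8 encoding of "é" as hex escapes instead of a u8 literal, so it compiles the same way under C++17 and C++20; contains_cafe is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: search UTF-8 text stored as plain char data.
// "\xC3\xA9" is the UTF-8 encoding of U+00E9 (é). std::regex treats these as two
// ordinary chars, so a literal pattern containing the same bytes matches directly.
bool contains_cafe(const std::string& utf8_text) {
    static const std::regex pattern("caf\xC3\xA9");
    return std::regex_search(utf8_text, pattern);
}
```

The matching is byte-oriented: constructs such as . or [...] operate on single bytes, not on whole code points.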

    In C++20 things change. String literals with the u8 prefix now have type const char8_t[N] instead of const char[N], so, for example, std::string text = u8"..."; no longer compiles. It is also impossible to write something like

    std::u8fstream file;
    std::u8string line;
    ...
    file << line;
    

    since there is no std::u8fstream.

    Even the new C++20 std::format does not support UTF-8 strings at all: it has overloads only for char and wchar_t, none for char8_t. You cannot write

    std::u8string text = std::format(u8"...{}...", 42);
    

    To make matters worse, there is no simple way to convert between std::string and std::u8string, or even between const char* and const char8_t*: neither an implicit conversion nor a static_cast works. So if you want to format (using std::format) or read and write (std::cin, std::cout, std::fstream, ...) UTF-8 data, you end up copying strings internally, which is an unnecessary performance killer.

    Finally, what use is UTF-8 support without input, output, and formatting?
