Question
The following code works as expected. The source file, "file.txt", and "out.txt" are all encoded in UTF-8. But it stops working when I change wchar_t to char16_t on the first line of main(). I've tried both gcc 5.4 and clang 8.0 with -std=c++11. My goal is to replace wchar_t with char16_t, since wchar_t takes twice as much space in RAM. I thought these two types were equally well supported in C++11 and later standards. What am I missing here?
#include <iostream>
#include <fstream>
#include <locale>
#include <codecvt>
#include <string>

int main() {
    typedef wchar_t my_char;  // changing this to char16_t breaks the program
    std::locale::global(std::locale("en_US.UTF-8"));

    // Write a UTF-8 test file through a narrow stream.
    std::ofstream out("file.txt");
    out << "123正则表达式abc" << std::endl;
    out.close();

    // Read the first word back through a stream of my_char.
    std::basic_ifstream<my_char> win("file.txt");
    std::basic_string<my_char> wstr;
    win >> wstr;
    win.close();

    // Read the same word as raw bytes, then convert it explicitly.
    std::ifstream in("file.txt");
    std::string str;
    in >> str;
    in.close();
    std::wstring_convert<std::codecvt_utf8<my_char>, my_char> my_char_conv;
    std::basic_string<my_char> conv = my_char_conv.from_bytes(str);

    // Both routes should yield the same string.
    std::cout << (wstr == conv ? "true" : "false") << std::endl;

    // Write both strings through a stream of my_char.
    std::basic_ofstream<my_char> wout("out.txt");
    wout << wstr << std::endl << conv << std::endl;
    wout.close();
    return 0;
}
EDIT
The modified code does not compile with clang8.0. It compiles with gcc5.4 but crashes at run-time as shown by @Brian.
Answer 1:
The various stream classes need a set of definitions to be operational. The standard library requires the relevant definitions and objects only for char and wchar_t, but not for char16_t or char32_t. Off the top of my head, the following is needed to use std::basic_ifstream<cT> or std::basic_ofstream<cT>:
- std::char_traits<cT> to specify how the character type behaves. I think this template is specialized for char16_t and char32_t.
- The used std::locale needs to contain an instance of the std::num_put<cT> facet to format numeric types. This facet can just be instantiated and a new std::locale containing it can be created, but the standard doesn't mandate that it is present in a std::locale object.
- The used std::locale needs to contain an instance of the std::num_get<cT> facet to read numeric types. Again, this facet can be instantiated but isn't required to be present by default.
- The facet std::numpunct<cT> needs to be specialized and put into the used std::locale to deal with decimal points, thousands separators, and textual boolean values. Even if it isn't really used, it will be referenced from the numeric formatting and parsing functions. There is no ready specialization for char16_t or char32_t.
- The facet std::ctype<cT> needs to be specialized and put into the used std::locale to support widening, narrowing, and classification of the character type. There is no ready specialization for char16_t or char32_t.
- The facet std::codecvt<cT, char, std::mbstate_t> needs to be present in the used std::locale to convert between external byte sequences and internal "character" sequences. Unlike the facets above, C++11 does define specializations for char16_t and char32_t, converting between UTF-8 and UTF-16 or UTF-32 (they were deprecated in C++20).
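A minimal sketch of the byte-to-character conversion step from the last bullet, assuming the implementation ships the std::codecvt<char16_t, char, std::mbstate_t> specialization that C++11 lists among the standard facets (it is deprecated as of C++20, so newer compilers may warn). The helper name utf8_to_utf16 is hypothetical and not part of the answer; a file buffer for char16_t would call the facet's in() member in a similar way.

```cpp
#include <cstddef>
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the answer): decode UTF-8 bytes into UTF-16
// code units using the codecvt facet held by the given locale.
std::u16string utf8_to_utf16(const std::string& bytes,
                             const std::locale& loc = std::locale::classic()) {
    typedef std::codecvt<char16_t, char, std::mbstate_t> Facet;
    // use_facet throws std::bad_cast if the locale does not hold the facet.
    const Facet& cvt = std::use_facet<Facet>(loc);

    // UTF-16 never needs more code units than UTF-8 needs bytes.
    std::u16string out(bytes.size(), u'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    char16_t* to_next = nullptr;

    const std::codecvt_base::result r = cvt.in(
        state,
        bytes.data(), bytes.data() + bytes.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (r != std::codecvt_base::ok)
        throw std::runtime_error("input is not complete, valid UTF-8");

    // Trim the buffer to the number of code units actually written.
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Calling utf8_to_utf16 on the UTF-8 bytes of the question's test string should give the same text as a std::u16string. Characters outside the Basic Multilingual Plane come back as surrogate pairs.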
Most of the facets are reasonably easy to do: they just need to forward a simple conversion or do table look-ups. However, the std::codecvt facet tends to be rather tricky, especially because std::mbstate_t is an opaque type from the point of view of the standard C++ library.
All of that can be done. It has been a while since I last did a proof-of-concept implementation for a character type. It took me about a day's worth of work. Of course, I knew what I needed to do when I embarked on the work, having implemented the locales and IOStreams library before. Adding a reasonable amount of tests, rather than merely having a simple demo, would probably take me a week or so (assuming I can actually concentrate on this work).
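Given that effort, a lighter route to the asker's goal of storing text as char16_t may be to keep the streams narrow and convert in memory. The sketch below assumes the input is UTF-8. It reuses std::wstring_convert from the question, but with std::codecvt_utf8_utf16 so that the result is UTF-16 rather than UCS-2. Both are deprecated since C++17, and read_utf8_as_utf16 is a hypothetical helper name.

```cpp
#include <codecvt>
#include <istream>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: read everything left in a narrow (byte) stream and
// return it as UTF-16. No char16_t stream is involved, so the missing
// facets never come into play.
std::u16string read_utf8_as_utf16(std::istream& in) {
    std::ostringstream bytes;
    bytes << in.rdbuf();  // copy the raw bytes unchanged

    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(bytes.str());  // throws std::range_error on bad UTF-8
}
```

For the question's file, this could be called with std::ifstream in("file.txt", std::ios::binary). Unlike operator>>, it reads the whole stream, including the trailing newline.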
Source: https://stackoverflow.com/questions/41315675/why-does-stdbasic-ifstreamchar16-t-not-work-in-c11