A requirement for my software is that the encoding of a file which contains exported data shall be UTF8. But when I write the data to the file the encoding is always ANSI. (
On Windows in VC++2010 it is possible (not yet implemented in GCC, as far as i know) using localization facet std::codecvt_utf8_utf16 (i.e. in C++11). The sample code from cppreference.com has all basic information you would need to read/write UTF-8 file.
std::wstring wFromFile = _T("
On Windows, files don't have encodings. Each application will assume an encoding based on its own rules. The best you can do is put a byte-order mark at the front of the file and hope it's recognized.
AFAIK, fprintf()
does character conversions, so there is no guarantee that passing UTF-8 encoded data to it will actually write the UTF-8 to the file. Since you already converted the data yourself, use fwrite()
instead so you are writing the UTF-8 data as-is, eg:
DWORD dwCount = MultiByteToWideChar( CP_ACP, 0, line.c_str(), line.length(), NULL, 0 );
if (dwCount == 0) continue;
std::vector<WCHAR> utf16Text(dwCount);
MultiByteToWideChar( CP_ACP, 0, line.c_str(), line.length(), &utf16Text[0], dwCount );
dwCount = WideCharToMultiByte( CP_UTF8, 0, &utf16Text[0], utf16Text.size(), NULL, 0, NULL, NULL );
if (dwCount == 0) continue;
std::vector<CHAR> utf8Text(dwCount);
WideCharToMultiByte( CP_UTF8, 0, &utf16Text[0], utf16Text.size(), &utf8Text[0], dwCount, NULL, NULL );
fwrite(&utf8Text[0], sizeof(CHAR), dwCount, pOutputFile);
fprintf(pOutputFile, "\n");
The type char
has no clue of any encoding, all it can do is store 8 bits. Therefore any text file is just a sequence of bytes and the user must guess the underlying encoding. A file starting with a BOM indicates UTF 8, but using a BOM is not recommended any more. The type wchar_t
in contrast is in Windows always interpreted as UTF 16.
So let's say you have a file encoded in UTF 8 with just one line: "Confucius says: Smile. 孔子说:微笑!