How to set file encoding format to UTF-8 in C++

慢半拍i 2021-01-13 03:59

A requirement for my software is that a file containing exported data must be encoded in UTF-8. But when I write the data to the file, the encoding is always ANSI.

4 answers
  • 2021-01-13 04:14

    On Windows, VC++ 2010 makes this possible (it is not yet implemented in GCC, as far as I know) through the C++11 localization facet std::codecvt_utf8_utf16. The sample code on cppreference.com has all the basic information you need to read and write UTF-8 files.

    std::wstring wFromFile = _T("
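A minimal sketch of the facet approach, assuming a standard library that ships <codecvt> (std::codecvt_utf8_utf16 is C++11 and deprecated since C++17, but still available). The helper name WriteUtf8File is hypothetical. On Windows, wchar_t holds UTF-16 code units, which is what this facet expects; the test text sticks to Basic Multilingual Plane characters, where that distinction does not matter.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` encoded as UTF-8.
// The stream is imbued with std::codecvt_utf8_utf16 (C++11, deprecated in C++17),
// which converts the wide characters to UTF-8 bytes as they are written.
bool WriteUtf8File(const std::string& path, const std::wstring& text) {
    std::wofstream out(path, std::ios::binary);
    if (!out) return false;
    // The locale takes ownership of the facet and deletes it when no longer used.
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8_utf16<wchar_t>));
    out << text;
    out.close();
    return !out.fail();
}
```

Imbuing the stream right after opening, before anything is written, is what lets the file buffer apply the conversion from the first character on.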
  • 2021-01-13 04:16

    On Windows, files don't have encodings. Each application will assume an encoding based on its own rules. The best you can do is put a byte-order mark at the front of the file and hope it's recognized.
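A small sketch of writing the byte-order mark, assuming the text is already UTF-8 encoded. WriteUtf8WithBom is a hypothetical name; the BOM bytes EF BB BF are the UTF-8 encoding of U+FEFF.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes bytes that are already UTF-8 encoded to `path`,
// preceded by the UTF-8 byte-order mark EF BB BF. Binary mode keeps the
// stream from altering any bytes (e.g. newline translation on Windows).
bool WriteUtf8WithBom(const std::string& path, const std::string& utf8Text) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    static const char bom[] = {'\xEF', '\xBB', '\xBF'};
    out.write(bom, sizeof bom);
    out.write(utf8Text.data(), static_cast<std::streamsize>(utf8Text.size()));
    out.close();
    return !out.fail();
}
```

Whether a given application honors the BOM is still up to that application.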

  • 2021-01-13 04:18

    AFAIK, fprintf() performs character conversions, so there is no guarantee that passing UTF-8 encoded data to it will actually write UTF-8 to the file. Since you have already converted the data yourself, use fwrite() instead so that the UTF-8 data is written as-is, e.g.:

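    #include <windows.h>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Converts one line from the ANSI code page to UTF-8 and writes the bytes as-is.
    // Returns false (writing nothing) for an empty line or on conversion failure.
    bool WriteLineAsUtf8(const std::string& line, FILE* pOutputFile)
    {
        // ANSI -> UTF-16 (the first call only computes the required length)
        int count = MultiByteToWideChar(CP_ACP, 0, line.c_str(), (int)line.length(), NULL, 0);
        if (count == 0) return false;
        std::vector<WCHAR> utf16Text(count);
        MultiByteToWideChar(CP_ACP, 0, line.c_str(), (int)line.length(), &utf16Text[0], count);

        // UTF-16 -> UTF-8
        count = WideCharToMultiByte(CP_UTF8, 0, &utf16Text[0], (int)utf16Text.size(), NULL, 0, NULL, NULL);
        if (count == 0) return false;
        std::vector<CHAR> utf8Text(count);
        WideCharToMultiByte(CP_UTF8, 0, &utf16Text[0], (int)utf16Text.size(), &utf8Text[0], count, NULL, NULL);

        // fwrite() copies the UTF-8 bytes unchanged
        fwrite(&utf8Text[0], sizeof(CHAR), count, pOutputFile);
        fprintf(pOutputFile, "\n");
        return true;
    }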
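MultiByteToWideChar and WideCharToMultiByte are Windows-only. As a portable illustration of what that two-step conversion produces, here is a sketch that assumes the source text is ISO-8859-1 (Latin-1), where every byte equals its Unicode code point. That is an assumption: Windows-1252 and other ANSI code pages differ, for example in the range 0x80 to 0x9F. Latin1ToUtf8 and WriteLineUtf8 are hypothetical names.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) text to UTF-8.
// Assumption: the input really is Latin-1, so each byte is its own code point.
std::string Latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char ch : latin1) {
        if (ch < 0x80) {
            utf8 += static_cast<char>(ch);                 // ASCII: one byte
        } else {
            utf8 += static_cast<char>(0xC0 | (ch >> 6));   // lead byte 110xxxxx
            utf8 += static_cast<char>(0x80 | (ch & 0x3F)); // continuation 10xxxxxx
        }
    }
    return utf8;
}

// Writes the UTF-8 bytes unchanged with fwrite, then a newline.
bool WriteLineUtf8(const std::string& latin1Line, std::FILE* out) {
    const std::string utf8 = Latin1ToUtf8(latin1Line);
    if (std::fwrite(utf8.data(), 1, utf8.size(), out) != utf8.size()) return false;
    return std::fputc('\n', out) != EOF;
}
```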
    
  • 2021-01-13 04:20

    The type char knows nothing about encodings; all it can do is store 8 bits. Therefore any text file is just a sequence of bytes, and whoever reads it must guess the underlying encoding. A file that starts with the UTF-8 byte-order mark (EF BB BF) indicates UTF-8, but using a BOM with UTF-8 is no longer recommended. The type wchar_t, in contrast, is always interpreted as UTF-16 on Windows.
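Because a char only holds a byte, the one in-band hint a reader can check for is the optional BOM. A minimal sketch, with hypothetical names HasUtf8Bom and StripUtf8Bom; note that a missing BOM proves nothing, and the data may still be UTF-8.

```cpp
#include <string>

// True if the byte sequence starts with the UTF-8 byte-order mark EF BB BF.
bool HasUtf8Bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Returns the content without a leading UTF-8 BOM, if one is present.
std::string StripUtf8Bom(const std::string& bytes) {
    return HasUtf8Bom(bytes) ? bytes.substr(3) : bytes;
}
```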

    So let's say you have a file encoded in UTF-8 with just one line: "Confucius says: Smile. 孔子说:微笑!"
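In such a line, each ASCII letter takes one byte, while each of the Chinese characters (U+5B54, U+5B50, and so on) takes three, so std::string::size() reports bytes rather than characters. A sketch that counts code points by skipping UTF-8 continuation bytes; CountUtf8CodePoints is a hypothetical name, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (10xxxxxx). Assumes the input is valid UTF-8; no validation is done.
std::size_t CountUtf8CodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char ch : utf8) {
        if ((ch & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For example, "Smile. " followed by 孔子 is 13 bytes but only 9 code points.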
