STL and UTF-8 file input/output. How to do it?

醉酒成梦 2020-12-06 08:15

I use wchar_t for internal strings and UTF-8 for storage in files. I need to use STL to input/output text to screen and also do it by using full Lithua

5 answers
  • 2020-12-06 08:44

    Get a FILE* or integer file handle from a std::basic_*fstream?

    Answered elsewhere.

  • 2020-12-06 08:47

    You can't make the STL work directly with UTF-8. The basic reason is that the STL indirectly forbids multi-unit characters: each character has to fit in a single char or wchar_t.

    Microsoft actually breaks the standard with their UTF-16 encoding, so maybe you can get some inspiration there.

  • 2020-12-06 08:49

    Use the std::codecvt facet template to perform the conversion.

    You may use the standard std::codecvt_byname, or a non-standard codecvt facet implementation.

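    #include <iostream>
    #include <locale>
    using namespace std;
    typedef codecvt_byname<wchar_t, char, mbstate_t> Cvt;
    int main() {
        // The locale takes ownership of the facet; the named locale must be installed.
        locale utf8locale(locale(), new Cvt("en_US.UTF-8"));
        wcout.imbue(utf8locale);  // streams expose imbue(); pubimbue() belongs to the stream buffer
        wcout << L"Hello, wide to multibyte world!" << endl;
    }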
    

    Beware that on some platforms codecvt_byname can only perform conversions for locales that are installed on the system.
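
    To make that caveat concrete, here is a minimal sketch of probing for a named locale before imbuing a stream. The helper name make_named_codecvt_locale is hypothetical (it is not part of the standard library), and the sketch assumes the implementation reports an unknown locale name by throwing std::runtime_error from the codecvt_byname constructor.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>

// Hypothetical helper (not a standard function): tries to build a locale whose
// codecvt facet comes from the named system locale. Returns false instead of
// throwing when that locale is not available.
bool make_named_codecvt_locale(const char* name, std::locale& result) {
    typedef std::codecvt_byname<wchar_t, char, std::mbstate_t> NamedCvt;
    try {
        // The constructed locale takes ownership of the facet.
        result = std::locale(std::locale::classic(), new NamedCvt(name));
        return true;
    } catch (const std::runtime_error&) {
        // Assumption: an unknown or uninstalled locale name is reported this way.
        return false;
    }
}
```

    A caller could try "en_US.UTF-8" first and fall back to converting by hand when the function returns false, instead of letting the exception escape.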

  • 2020-12-06 08:59

    The easiest way would be to do the conversion to UTF-8 yourself before trying to output. You might get some inspiration from this question: UTF8 to/from wide char conversion in STL
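
    A rough sketch of what doing the conversion yourself could look like. The function names append_utf8 and wide_to_utf8 are made up for illustration and are not taken from the linked question. The sketch assumes the wide string holds UTF-16 (as on Windows) or UTF-32 code units, and it replaces anything invalid with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Append one Unicode code point to `out` as 1 to 4 UTF-8 bytes.
static void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a wide string to UTF-8. Surrogate pairs are combined, so the same
// code handles 16-bit wchar_t (UTF-16) and 32-bit wchar_t (UTF-32).
std::string wide_to_utf8(const std::wstring& ws) {
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(ws[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            unsigned long lo = static_cast<unsigned long>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                // Valid high/low surrogate pair: merge into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;  // lone surrogate or out-of-range value
        }
        append_utf8(out, cp);
    }
    return out;
}
```

    The resulting std::string can then be written through an ordinary std::ofstream opened in binary mode, so no locale or facet is involved on the output side.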

  • 2020-12-06 09:00

    Well, after some testing I figured out that a FILE* can be passed where the w*fstream constructor expects an _iobuf*. So, the following code does what I need.

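    #include <iostream>
    #include <fstream>
    #include <cstdio>
    #include <io.h>
    #include <fcntl.h>
    using namespace std;
    //For writing (constructing a stream from a FILE* is a Visual C++ extension)
    void write_test ()
    {
        FILE* fp;
        if (_wfopen_s (&fp, L"utf-8_out_test.txt", L"w") != 0) return;
        _setmode (_fileno (fp), _O_U8TEXT); //the CRT converts wide characters to UTF-8
        wofstream fs (fp);
        fs << L"ąfl";
        fs.flush (); //push pending output to the FILE before closing it
        fclose (fp);
    }
    //And reading
    void read_test ()
    {
        FILE* fp;
        if (_wfopen_s (&fp, L"utf-8_in_test.txt", L"r") != 0) return;
        _setmode (_fileno (fp), _O_U8TEXT); //the CRT converts UTF-8 to wide characters
        wifstream fs (fp);
        wchar_t array[6];
        fs.getline (array, 5);
        wcout << array << endl;//For debug
        fclose (fp);
    }
    int main ()
    {
        write_test ();
        read_test ();
    }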
    This sample reads and writes valid UTF-8 files (without a BOM) on Windows when compiled with Visual Studio 2008.

    Can someone give any comments about portability? Improvements?
