Read Unicode UTF-8 file into wstring

无人及你 2020-11-30 00:26

How can I read a Unicode (UTF-8) file into wstring(s) on the Windows platform?

6 answers
  • 2020-11-30 00:58

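    According to a comment by @Hans Passant, the simplest way is to use _wfopen_s and open the file with the mode string "rt, ccs=UTF-8", which makes the C runtime decode the UTF-8 text into wide characters as you read it.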

    Here is another pure C++ solution that works at least with VC++ 2010:

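    #include <locale>
    #include <codecvt>
    #include <string>
    #include <fstream>
    #include <cstdlib>
    
    int main() {
        // std::locale::empty() is an MSVC extension (see the note below).
        const std::locale empty_locale = std::locale::empty();
        typedef std::codecvt_utf8<wchar_t> converter_type;
        // The locale takes ownership of the facet and deletes it; do not delete it yourself.
        const converter_type* converter = new converter_type;
        const std::locale utf8_locale = std::locale(empty_locale, converter);
        std::wifstream stream(L"test.txt");   // wchar_t* file name: MSVC extension
        // Imbue before the first read so the bytes are decoded as UTF-8.
        stream.imbue(utf8_locale);
        std::wstring line;
        std::getline(stream, line);
        std::system("pause");
    }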
    

    Except for locale::empty() (std::locale::classic() or a default-constructed std::locale() should work here as well) and the wchar_t* overload of the basic_ifstream constructor, this should even be pretty standard-compliant (where “standard” means C++0x, of course).
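The program above reads only the first line, and on Windows std::codecvt_utf8<wchar_t> decodes to UCS-2, so characters outside the Basic Multilingual Plane cannot be decoded. A minimal sketch that reads every line, assuming a compiler that still ships the deprecated <codecvt> facets; readLines and Utf8Facet are hypothetical names. It swaps in two choices the answer does not make: std::codecvt_utf8_utf16 where wchar_t is 16 bits (so such characters become surrogate pairs), and std::consume_header, which skips a leading UTF-8 byte order mark.

```cpp
#include <codecvt>      // std::codecvt_utf8, std::codecvt_utf8_utf16 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical alias: choose a UTF-8 facet matching the width of wchar_t.
//  - 16-bit wchar_t (Windows): UTF-8 -> UTF-16, so non-BMP characters become surrogate pairs.
//  - 32-bit wchar_t (Linux, macOS): UTF-8 -> UTF-32 code points.
// std::consume_header makes the facet skip a leading UTF-8 BOM (EF BB BF).
using Utf8Facet = std::conditional_t<
    sizeof(wchar_t) == 2,
    std::codecvt_utf8_utf16<wchar_t, 0x10FFFF, std::consume_header>,
    std::codecvt_utf8<wchar_t, 0x10FFFF, std::consume_header>>;

// Hypothetical helper: reads every line of a UTF-8 file (without the newline).
// Returns an empty vector if the file cannot be opened.
std::vector<std::wstring> readLines(const char* path)
{
    std::wifstream stream(path);
    // Imbue before the first read; the locale takes ownership of the facet.
    stream.imbue(std::locale(std::locale(), new Utf8Facet));

    std::vector<std::wstring> lines;
    for (std::wstring line; std::getline(stream, line); )
        lines.push_back(line);
    return lines;
}
```

Usage: auto lines = readLines("test.txt"); each element then holds one decoded line.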

  • 2020-11-30 01:05

    With C++11 support, you can use the std::codecvt_utf8 facet (deprecated since C++17, but still shipped by the major standard libraries), which encapsulates conversion between a UTF-8 encoded byte string and a UCS-2 or UCS-4 character string, and which can be used to read and write UTF-8 files, both text and binary.

    To use a facet, you usually create a locale object, which encapsulates culture-specific information as a set of facets that collectively define a specific localized environment. Once you have a locale object, you can imbue your stream with it:

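    #include <codecvt>
    #include <fstream>
    #include <locale>
    #include <sstream>
    #include <string>
    
    std::wstring readFile(const char* filename)
    {
        std::wifstream wif(filename);
        // Replace the stream's codecvt facet with a UTF-8 decoder; the locale owns the facet.
        wif.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
        std::wstringstream wss;
        wss << wif.rdbuf();   // copy the whole decoded file
        return wss.str();
    }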
    

    which can be used like this:

    std::wstring wstr = readFile("a.txt");
    

    Alternatively, you can set the global C++ locale before you create your file streams. All later calls to the std::locale default constructor then return a copy of the global C++ locale, so streams constructed afterwards use it and you don't need to imbue them explicitly:

    std::locale::global(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
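A caveat with the global approach: std::locale::global only affects locales and streams constructed after the call, so std::wcout and any stream that already exists keep their old locale. A sketch of the pattern, where readFileWithGlobalLocale is a hypothetical helper that sets the global locale, creates the stream afterwards so it picks the locale up, and then restores the previous global locale (which std::locale::global returns):

```cpp
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: read a whole UTF-8 file through the global-locale approach.
std::wstring readFileWithGlobalLocale(const char* filename)
{
    // std::locale::global returns the previous global locale so it can be restored.
    const std::locale previous = std::locale::global(
        std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));

    std::wstring contents;
    {
        // Constructed after the call, so the stream copies the UTF-8 global locale;
        // no explicit imbue is needed.
        std::wifstream wif(filename);
        std::wstringstream wss;
        wss << wif.rdbuf();
        contents = wss.str();
    }   // the stream is destroyed here, before the old locale comes back

    std::locale::global(previous);
    return contents;
}
```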
    
  • 2020-11-30 01:10

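    This question was addressed in Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI. In sum, on Windows wchar_t is 16 bits wide, so a std::wstring holds UTF-16 code units (on Linux wchar_t is typically 32 bits). UTF-16's predecessor, UCS-2, was a strictly two-byte encoding; UTF-16 keeps the two-byte unit but uses surrogate pairs (two units) for characters outside the Basic Multilingual Plane. Arabic lies inside the BMP, so each Arabic character fits in a single wchar_t.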
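The UCS-2 versus UTF-16 difference can be observed directly. A small sketch, assuming any C++11 compiler; wcharUnitsFor is a hypothetical helper that returns how many wchar_t units one code point occupies on the current platform:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: how many wchar_t code units one Unicode code point
// occupies in a std::wstring on the current platform.
std::size_t wcharUnitsFor(char32_t codePoint)
{
    if (sizeof(wchar_t) >= 4)
        return 1;                           // UTF-32 (e.g. Linux): one unit per code point
    return codePoint > 0xFFFF ? 2 : 1;      // UTF-16 (Windows): surrogate pair above the BMP
}

// Arabic (U+0600..U+06FF) is inside the BMP, so each letter is one wchar_t everywhere;
// an emoji such as U+1F600 needs two units where wchar_t is 16 bits.
const std::wstring kArabicAin = L"\u0639";
const std::wstring kGrinningFace = L"\U0001F600";
```

On Windows, kGrinningFace.size() is 2, while kArabicAin.size() is 1 on every platform.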

  • 2020-11-30 01:15

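    This is a bit raw, but you can read the file as plain old bytes and reinterpret the byte buffer as wchar_t. Be aware that this does not decode UTF-8: the cast only yields correct text if the file is already stored as UTF-16LE, the encoding of wchar_t on Windows.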

    Something like:

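    #include <fstream>
    #include <string>
    
    // Reads the raw bytes of a file into a std::wstring without any decoding.
    // Only correct if the file is stored as UTF-16LE (what wchar_t holds on Windows).
    std::wstring ReadFileIntoWstring(const std::wstring& filepath)
    {
        // Opening with a wchar_t* path (via c_str()) is an MSVC extension.
        std::ifstream file(filepath.c_str(), std::ios::in | std::ios::binary | std::ios::ate);
        if (!file)
            return std::wstring();
        const std::streamoff size = file.tellg();
        file.seekg(0, std::ios::beg);
        // Read straight into the wstring's storage; its length is explicit, so no
        // terminating null character is needed (a trailing odd byte is ignored).
        std::wstring wstr(static_cast<size_t>(size) / sizeof(wchar_t), L'\0');
        file.read(reinterpret_cast<char*>(&wstr[0]),
                  static_cast<std::streamsize>(wstr.size() * sizeof(wchar_t)));
        wstr.resize(static_cast<size_t>(file.gcount()) / sizeof(wchar_t));
        return wstr;
    }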
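Because a cast only reinterprets bytes, a UTF-8 file has to be decoded instead. A portable sketch of reading raw bytes and then decoding them by hand, so it needs neither Windows APIs nor the deprecated <codecvt> header; decodeUtf8 and readUtf8File are hypothetical names, and replacing each malformed byte with U+FFFD is an assumed error policy:

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical decoder: UTF-8 bytes -> std::wstring. Where wchar_t is 16 bits,
// code points above U+FFFF become surrogate pairs; where it is 32 bits they are
// stored directly. Malformed bytes are replaced with U+FFFD; a leading BOM is skipped.
std::wstring decodeUtf8(const std::string& bytes)
{
    std::wstring out;
    const std::size_t n = bytes.size();
    std::size_t i = (n >= 3 && bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) ? 3 : 0;
    static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};

    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else { out.push_back(L'\xFFFD'); ++i; continue; }   // invalid lead byte

        // Accumulate the continuation bytes (each must look like 10xxxxxx).
        bool ok = i + len <= n;
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            ok = (c & 0xC0) == 0x80;
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject truncated sequences, overlong forms, surrogates and values above U+10FFFF.
        if (!ok || cp < minForLen[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(L'\xFFFD');
            ++i;
            continue;
        }
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
        i += len;
    }
    return out;
}

// Hypothetical reader: binary mode keeps the bytes untouched, then decode them.
std::wstring readUtf8File(const char* path)
{
    std::ifstream file(path, std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());
    return decodeUtf8(bytes);
}
```

On Windows, MultiByteToWideChar with CP_UTF8 performs the same conversion from the byte buffer.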
    
  • 2020-11-30 01:16

    Here's a platform-specific function for Windows only:

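    #include <cstdio>
    #include <string>
    #include <sys/types.h>
    #include <sys/stat.h>
    
    // Returns the size of the file in bytes, or 0 if _wstat fails.
    size_t GetSizeOfFile(const std::wstring& path)
    {
        struct _stat fileinfo;
        if (_wstat(path.c_str(), &fileinfo) != 0)
            return 0;
        return static_cast<size_t>(fileinfo.st_size);
    }
    
    std::wstring LoadUtf8FileToString(const std::wstring& filename)
    {
        std::wstring buffer;            // stores file contents
        // "rt" = read in text mode, "S" = optimize caching for sequential access,
        // "ccs=UTF-8" = decode the UTF-8 bytes into UTF-16 wchar_t while reading.
        FILE* f = _wfopen(filename.c_str(), L"rtS, ccs=UTF-8");
    
        // Failed to open file
        if (f == NULL)
        {
            // ...handle some error...
            return buffer;
        }
    
        // The byte count is an upper bound on the number of UTF-16 units produced.
        size_t filesize = GetSizeOfFile(filename);
    
        // Read entire file contents into memory
        if (filesize > 0)
        {
            buffer.resize(filesize);
            size_t wchars_read = fread(&(buffer.front()), sizeof(wchar_t), filesize, f);
            buffer.resize(wchars_read);
            buffer.shrink_to_fit();
        }
    
        fclose(f);
    
        return buffer;
    }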
    

    Use like so:

    std::wstring mytext = LoadUtf8FileToString(L"C:\\MyUtf8File.txt");
    

    Note that the entire file is loaded into memory, so you might not want to use this for very large files.
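_wstat and _wfopen exist only in the Microsoft C runtime. With C++17, std::filesystem::file_size provides the size portably; a sketch with a hypothetical getSizeOfFile that uses the non-throwing overload and returns 0 on any error, so a filesize > 0 check also skips files whose size cannot be read:

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical portable replacement for GetSizeOfFile (requires C++17).
// std::filesystem::path converts from std::wstring, so the same argument works.
// Uses the non-throwing overload and returns 0 if the size cannot be read
// (missing file, directory, permission error, ...).
std::uintmax_t getSizeOfFile(const std::filesystem::path& path)
{
    std::error_code ec;
    const std::uintmax_t size = std::filesystem::file_size(path, ec);
    return ec ? 0 : size;
}
```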

  • 2020-11-30 01:17
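    #include <iostream>
    #include <fstream>
    #include <string>
    #include <locale>
    
    int main()
    {
        // "zh_CN.UTF-8" is a POSIX-style locale name; it must be installed on the system,
        // otherwise the std::locale constructor throws std::runtime_error.
        std::wifstream wif("filename.txt");
        wif.imbue(std::locale("zh_CN.UTF-8"));
    
        std::wcout.imbue(std::locale("zh_CN.UTF-8"));
        std::wcout << wif.rdbuf();
    }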
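The name "zh_CN.UTF-8" is a POSIX-style locale name and may not be installed (Windows in particular uses different names); when it is unknown, the std::locale constructor throws std::runtime_error. A sketch of a fallback, where makeUtf8Locale and Utf8WideFacet are hypothetical names: it tries the named locale and otherwise adds a UTF-8 codecvt facet (from the deprecated <codecvt> header) to the current global locale:

```cpp
#include <codecvt>     // std::codecvt_utf8, std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>
#include <stdexcept>
#include <type_traits>

// Hypothetical alias: UTF-8 facet matching the width of wchar_t
// (UTF-16 where wchar_t is 16 bits, UTF-32 where it is 32 bits).
using Utf8WideFacet = std::conditional_t<sizeof(wchar_t) == 2,
                                         std::codecvt_utf8_utf16<wchar_t>,
                                         std::codecvt_utf8<wchar_t>>;

// Hypothetical helper: use the named system locale if it exists, otherwise
// fall back to the current global locale with a UTF-8 codecvt facet.
std::locale makeUtf8Locale(const char* name)
{
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {   // thrown for unknown locale names
        return std::locale(std::locale(), new Utf8WideFacet);
    }
}
```

Usage: wif.imbue(makeUtf8Locale("zh_CN.UTF-8"));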
    