Read/Write file with unicode file name with plain C++/Boost

Happy的楠姐 2021-02-20 12:56

I want to read and write a file with a Unicode file name using Boost.Filesystem and Boost.Locale on Windows (MinGW). The code should be platform independent in the end.

This is my co

4 answers
  •  清酒与你
    2021-02-20 13:33

    EDIT: added references to Boost and wchar_t at the end of the post, plus another possible solution on Windows.

    I could reproduce nearly the same thing on Ubuntu and on Windows without even using Boost (I don't have it on my Windows box). To fix it, I just had to save the source file in the same encoding as the system, i.e. UTF-8 on Ubuntu and Latin-1 (ISO-8859-1) on Windows.

    As I suspected, the problem comes from the line fs::path file("äöü.txt");. Because the encoding of the source file is not the one that is expected, the literal is read more or less as fs::path file("Ã¤Ã¶Ã¼.txt");. If you check, you will find that its size is 10. That fully explains why the output file has the wrong name.
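
    To make the size of 10 concrete, here is a minimal sketch in plain C++ (no Boost). It spells the name as explicit UTF-8 bytes so the result does not depend on the encoding of the source file, then shows what a reader that treats every byte as one Latin-1 character ends up with. The names utf8_name and latin1_to_utf8 are hypothetical helpers introduced for this illustration only.

    ```cpp
    #include <string>

    // "äöü.txt" written as explicit UTF-8 bytes: each umlaut takes two bytes,
    // so the string holds 6 + 4 = 10 bytes.
    const std::string utf8_name = "\xC3\xA4\xC3\xB6\xC3\xBC.txt";

    // Hypothetical helper: treat every input byte as one Latin-1 character
    // and re-encode it as UTF-8. This mimics what happens when UTF-8 bytes
    // are mistaken for Latin-1 text.
    std::string latin1_to_utf8(const std::string& bytes) {
        std::string out;
        for (unsigned char c : bytes) {
            if (c < 0x80) {
                out.push_back(static_cast<char>(c));  // ASCII stays as is
            } else {
                // Latin-1 code points 0x80..0xFF need two UTF-8 bytes.
                out.push_back(static_cast<char>(0xC0 | (c >> 6)));
                out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
            }
        }
        return out;
    }
    ```

    Here utf8_name.size() is 10, and latin1_to_utf8(utf8_name) is the UTF-8 form of the ten-character name "Ã¤Ã¶Ã¼.txt", which is the kind of wrong file name described above.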

    I suspect that the test if (!fs::exists(file)) works correctly because either Boost or Windows automatically fixes the encoding on input.

    So on Windows, simply use an editor set to code page 1252 (or Latin-1 / ISO-8859-1) and you should not have problems, provided you do not need characters outside of this charset. If you do need characters outside of Latin-1, I am afraid that you will have to use the Unicode API of Windows.

    EDIT:

    In fact, Windows (> NT) works natively with wchar_t and not char. Not surprisingly, Boost on Windows does the same; see the Boost.Filesystem library reference. Extract:

    For Windows-like implementations, including MinGW, path::value_type is wchar_t. The default imbued locale provides a codecvt facet that invokes Windows MultiByteToWideChar or WideCharToMultiByte API with a codepage of CP_THREAD_ACP if Windows AreFileApisANSI() is true ...

    So, another solution on Windows that would allow the full Unicode character set (or at least the subset natively offered by Windows) would be to give the file path as a wstring and not as a string. Alternatively, if you really want to use UTF-8 encoded file names, you will have to force the thread locale to use UTF-8 and not CP1252. I cannot give a tested code example of that: I don't have Boost on my Windows box, it runs old XP and does not support UTF-8, and I don't want to post untested code. But I think that in that case, you should replace

    std::locale::global(boost::locale::generator().generate(""));
    

    with something like :

    std::locale::global(boost::locale::generator().generate("UTF8"));
    

    BEWARE : untested so I'm not sure if the string for generate is UTF8 or something else ...
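
    For the wstring route mentioned above, here is a minimal sketch in plain C++ that decodes UTF-8 bytes into a std::wstring, assuming you already hold the name as UTF-8. The function utf8_to_wide is a hypothetical helper written for this illustration, not a Boost or Windows API, and its validation is deliberately minimal. On Windows, MultiByteToWideChar with CP_UTF8 would be the native way to do the same conversion.

    ```cpp
    #include <cstddef>
    #include <cstdint>
    #include <stdexcept>
    #include <string>

    // Hypothetical helper: decode UTF-8 bytes into a std::wstring.
    // Yields UTF-16 where wchar_t is 16 bits (Windows), UTF-32 elsewhere.
    // Minimal validation: overlong forms and encoded surrogates are not rejected.
    std::wstring utf8_to_wide(const std::string& in) {
        std::wstring out;
        std::size_t i = 0;
        while (i < in.size()) {
            const unsigned char lead = static_cast<unsigned char>(in[i]);
            std::uint32_t cp = 0;
            std::size_t extra = 0;  // number of continuation bytes expected
            if (lead < 0x80)              { cp = lead;        extra = 0; }
            else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
            else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
            else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
            else throw std::invalid_argument("invalid UTF-8 lead byte");

            if (in.size() - i <= extra)
                throw std::invalid_argument("truncated UTF-8 sequence");

            for (std::size_t k = 1; k <= extra; ++k) {
                const unsigned char c = static_cast<unsigned char>(in[i + k]);
                if ((c & 0xC0) != 0x80)
                    throw std::invalid_argument("invalid UTF-8 continuation byte");
                cp = (cp << 6) | (c & 0x3F);
            }
            i += extra + 1;

            if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
                // Outside the BMP: emit a UTF-16 surrogate pair.
                cp -= 0x10000;
                out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
                out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
            } else {
                out.push_back(static_cast<wchar_t>(cp));
            }
        }
        return out;
    }
    ```

    With this, utf8_to_wide("\xC3\xA4\xC3\xB6\xC3\xBC.txt") gives the 7-character wide string L"\u00E4\u00F6\u00FC.txt", which can then be handed to boost::filesystem::path (its constructors accept wide strings). As with the locale variant, I have not tested the Boost part on Windows.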
