C++ Unicode file I/O

广开言路 2021-02-06 16:10

I need a file I/O library that can give my program a UTF-16 (little-endian) interface but can handle files in other encodings, mainly ASCII (input only), UTF-8, UTF-16, UTF-32/uc

5 answers
  •  一向 (original poster)
    2021-02-06 16:46

    The problem you see comes from the linefeed conversion. Sadly, it is performed at the byte level (after the code conversion) and is not aware of the encoding. In other words, you have to disable the automatic conversion (by opening the file in binary mode, with the "b" flag) and, if you want 0A00 to be expanded to 0D000A00, you'll have to do it yourself.
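To make the "do it yourself" part concrete, here is a minimal sketch of expanding linefeeds on UTF-16 code units before they are serialized, then writing the bytes in binary mode. The helper names (expand_newlines, to_utf16le_bytes, write_utf16le) are made up for this illustration and are not part of any library.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: insert CR before every LF that is not already part of
// a CR LF pair. Working on code units keeps the expansion encoding-aware.
std::u16string expand_newlines(const std::u16string& text) {
    std::u16string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == u'\n' && (i == 0 || text[i - 1] != u'\r')) {
            out.push_back(u'\r');
        }
        out.push_back(text[i]);
    }
    return out;
}

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes.
std::string to_utf16le_bytes(const std::u16string& text) {
    std::string bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    }
    return bytes;
}

// Hypothetical helper: write the text in binary mode so the runtime leaves
// every 0x0A byte alone; the only newline handling is expand_newlines().
bool write_utf16le(const std::string& path, const std::u16string& text) {
    std::ofstream file(path, std::ios::binary);
    if (!file) return false;
    const std::string bytes = to_utf16le_bytes(expand_newlines(text));
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(file);
}
```

With this approach a single LF code unit (0A00 on disk) ends up as 0D000A00, which is what a byte-level text-mode conversion cannot produce.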

    You mention that you'd prefer a C++ wide-stream interface, so I'll outline what I did to achieve that in our software:

    • Write a std::codecvt facet that uses an ICU UConverter to perform the conversions.
    • Use a std::wfstream to open the file.
    • imbue() your custom codecvt in the wfstream.
    • Open the wfstream with the binary flag to turn off the automatic (and erroneous) linefeed conversion.
    • Write a "WNewlineFilter" to perform linefeed conversion on wchars, taking inspiration from boost::iostreams::newline_filter.
    • Use a boost::iostreams::filtering_wstream to tie the wfstream and the WNewlineFilter together as a stream.
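The facet, imbue() and binary-flag steps above can be sketched roughly as follows. This is not the ICU-based facet described in the answer: as a stand-in it uses the standard std::codecvt_utf16 facet (deprecated since C++17), and it leaves out the Boost newline filter. The function names and the Utf16LeFacet alias are made up for this illustration. It also assumes a platform where wchar_t can hold the code points you need; with a 16-bit wchar_t this facet behaves as UCS-2.

```cpp
#include <codecvt>  // deprecated since C++17; used here only as a stand-in
#include <fstream>
#include <ios>
#include <locale>
#include <string>

// Stand-in for the custom ICU-backed facet: wchar_t <-> UTF-16LE bytes, no BOM.
using Utf16LeFacet = std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>;

// Hypothetical helper: write wide text as UTF-16LE.
bool write_utf16le_file(const std::string& path, const std::wstring& text) {
    std::wofstream file;
    // Imbue before opening so the facet is in place before any I/O happens.
    file.imbue(std::locale(file.getloc(), new Utf16LeFacet));
    // Binary mode: no byte-level linefeed conversion after the code conversion.
    file.open(path, std::ios::binary);
    if (!file) return false;
    file << text;
    return static_cast<bool>(file.flush());
}

// Hypothetical helper: read a UTF-16LE file back into wide text.
std::wstring read_utf16le_file(const std::string& path) {
    std::wifstream file;
    file.imbue(std::locale(file.getloc(), new Utf16LeFacet));
    file.open(path, std::ios::binary);
    std::wstring text;
    for (wchar_t ch; file.get(ch);) {
        text.push_back(ch);
    }
    return text;
}
```

A newline filter would sit between these streams and the program. In this sketch, linefeeds pass through unchanged in both directions.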
