Convert UTF-16 to UTF-8 under Windows and Linux, in C

不思量自难忘° 2020-12-01 02:50

Is there a recommended cross-platform (Windows and Linux) method for converting strings from UTF-16LE to UTF-8, or should one use a different method on each platform?

8 answers
  • 2020-12-01 03:16

    I ran into this problem too and solved it with the Boost.Locale library:

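    #include <boost/locale.hpp>
    #include <iostream>

    try
    {
        // UTF-16LE -> UTF-8. conv::stop makes invalid input throw; the default (skip) drops it silently.
        std::string utf8 = boost::locale::conv::utf_to_utf<char, short>(
                            (const short*)wcontent.c_str(),
                            (const short*)(wcontent.c_str() + wcontent.length()),
                            boost::locale::conv::stop);
        // UTF-8 -> ISO-8859-1 (Latin-1).
        content = boost::locale::conv::from_utf(utf8, "ISO-8859-1", boost::locale::conv::stop);
    }
    catch (const boost::locale::conv::conversion_error& e)
    {
        std::cout << "Failed to convert from UTF-16LE to ISO-8859-1: " << e.what() << std::endl;
    }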
    

    boost::locale::conv::utf_to_utf converts the UTF-16LE buffer to UTF-8. boost::locale::conv::from_utf then converts the UTF-8 string to a legacy 8-bit ("ANSI") encoding, so make sure you pass the right encoding name (here I use ISO-8859-1, i.e. Latin-1). If UTF-8 is all you need, the first call is enough.

    Another reminder: wchar_t, and therefore each element of std::wstring, is 4 bytes on Linux but 2 bytes on Windows, so you had better not use std::wstring to hold a UTF-16LE buffer.
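
    One way around the size difference is to keep the code units in a fixed-width 16-bit type instead of wchar_t. The following is a minimal sketch in C, not taken from the answer above: load_utf16le is a hypothetical helper name, and it assumes the input is a raw byte buffer in UTF-16LE order without a BOM. It assembles each code unit byte by byte, so the result does not depend on sizeof(wchar_t) or on the host byte order.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper (not part of any library): copies UTF-16LE bytes
     * into 16-bit code units. Stores at most cap units; a trailing odd byte
     * is ignored. Returns the number of units stored. */
    size_t load_utf16le(const unsigned char *bytes, size_t n_bytes,
                        uint16_t *units, size_t cap)
    {
        size_t n = n_bytes / 2;
        if (n > cap)
            n = cap;
        for (size_t i = 0; i < n; i++) {
            /* Low byte first, then high byte: that is what "LE" means. */
            units[i] = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        }
        return n;
    }
    ```

    In C++ the same idea applies with char16_t and std::u16string in place of uint16_t.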

  • 2020-12-01 03:16

    There's also utfcpp, which is a header-only library.

  • 2020-12-01 03:21

    Thanks guys, this is how I managed to meet the cross-platform (Windows and Linux) requirement:

    1. Downloaded and installed MinGW and MSYS.
    2. Downloaded the libiconv source package
    3. Compiled libiconv via MSYS.

    That's about it.

  • 2020-12-01 03:24

    If you don't want to use ICU,

    1. Windows: WideCharToMultiByte
    2. Linux: iconv (Glibc)
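
    The two options above can be hidden behind one function selected at compile time. This is only a sketch under stated assumptions: utf16le_to_utf8 is a hypothetical name, the input is UTF-16LE bytes without a BOM, the caller supplies the output buffer, and on Windows the lengths are assumed to fit in an int.

    ```c
    #include <stddef.h>

    #ifdef _WIN32
    #include <windows.h>
    #else
    #include <iconv.h>
    #endif

    /* Hypothetical helper (not from any library): converts src_bytes bytes of
     * UTF-16LE text to UTF-8. Writes at most dst_cap bytes to dst, without a
     * terminating NUL. Returns the number of bytes written, or (size_t)-1
     * on error. */
    size_t utf16le_to_utf8(const void *src, size_t src_bytes,
                           char *dst, size_t dst_cap)
    {
        if (src_bytes % 2 != 0)
            return (size_t)-1;      /* input ends in half a code unit */
        if (src_bytes == 0)
            return 0;
        if (dst_cap == 0)
            return (size_t)-1;      /* no room for any output */
    #ifdef _WIN32
        /* wchar_t is a 16-bit UTF-16LE unit on Windows, so the buffer is
         * passed as-is. Assumes both lengths fit in an int. */
        int n = WideCharToMultiByte(CP_UTF8, 0, (const wchar_t *)src,
                                    (int)(src_bytes / 2), dst, (int)dst_cap,
                                    NULL, NULL);
        return n > 0 ? (size_t)n : (size_t)-1;
    #else
        iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
        if (cd == (iconv_t)-1)
            return (size_t)-1;
        char *in = (char *)src;     /* iconv takes char **, it does not modify the data */
        size_t in_left = src_bytes;
        char *out = dst;
        size_t out_left = dst_cap;
        size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        iconv_close(cd);
        if (rc == (size_t)-1)
            return (size_t)-1;      /* invalid input or output buffer too small */
        return dst_cap - out_left;
    #endif
    }
    ```

    A UTF-8 result never needs more than 3 bytes per 16-bit input unit, so a buffer of 3 * (src_bytes / 2) bytes is always large enough.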
  • 2020-12-01 03:32
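    #include <iconv.h>
    #include <stddef.h>
    
    wchar_t *src = ...;   // or char16_t* on non-Windows platforms
    size_t srclen = ...;  // input length in bytes, not characters
    char *dst = ...;      // output buffer
    size_t dstlen = ...;  // output capacity in bytes
    // "UTF-16LE" fixes the byte order; plain "UTF-16" relies on a BOM or an implementation-specific default
    iconv_t conv = iconv_open("UTF-8", "UTF-16LE");
    if (conv != (iconv_t)-1) {
        // iconv advances src/dst and decrements both lengths; it returns (size_t)-1 on error
        size_t rc = iconv(conv, (char**)&src, &srclen, &dst, &dstlen);
        iconv_close(conv);
    }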
    
  • 2020-12-01 03:33

    The open source ICU library is very commonly used.
