UTF8 to/from wide char conversion in STL

面向向阳花 2020-11-22 06:48

Is it possible to convert a UTF-8 string in a std::string to a std::wstring and vice versa in a platform-independent manner? In a Windows application I would use MultiByteToWideC

10 answers
  •  -上瘾入骨i
    2020-11-22 07:05

    I started writing my own library for UTF-8 to UTF-16/UTF-32 conversion, but decided to fork an existing project for that purpose instead.

    https://github.com/tapika/cutf

    (Originated from https://github.com/noct/cutf )

    The API works with plain C as well as with C++.

    The function prototypes look like this (for the full list see https://github.com/tapika/cutf/blob/master/cutf.h ):

    //
    //  Converts utf-8 string to wide version.
    //
    //  returns target string length.
    //
    size_t utf8towchar(const char* s, size_t inSize, wchar_t* out, size_t bufSize);
    
    //
    //  Converts wide string to utf-8 string.
    //
    //  returns filled buffer length (not string length)
    //
    size_t wchartoutf8(const wchar_t* s, size_t inSize, char* out, size_t outsize);
    
    #ifdef __cplusplus
    
    std::wstring utf8towide(const char* s);
    std::wstring utf8towide(const std::string& s);
    std::string  widetoutf8(const wchar_t* ws);
    std::string  widetoutf8(const std::wstring& ws);
    
    #endif
    

    Sample usage, written as a simple test application for UTF conversion:

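    #include <stdint.h>  // uint8_t
    #include <stdio.h>   // printf
    #include "cutf.h"
    
    // Reports a failed check and sets the local variable `r` to mark the test as failed.
    #define ok(statement)                                       \
        do {                                                    \
            if( !(statement) )                                  \
            {                                                   \
                printf("Failed statement: %s\n", #statement);   \
                r = 1;                                          \
            }                                                   \
        } while( 0 )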
    
    int simpleStringTest()
    {
        const wchar_t* chineseText = L"主体";
        auto s = widetoutf8(chineseText);
        size_t r = 0;
    
        printf("simple string test:  ");
    
        ok( s.length() == 6 );
        uint8_t utf8_array[] = { 0xE4, 0xB8, 0xBB, 0xE4, 0xBD, 0x93 };
    
        for(int i = 0; i < 6; i++)
            ok(((uint8_t)s[i]) == utf8_array[i]);
    
        auto ws = utf8towide(s);
        ok(ws.length() == 2);
        ok(ws == chineseText);
    
        if( r == 0 )
            printf("ok.\n");
    
        return (int)r;
    }
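
    To see why the test above expects six bytes (E4 B8 BB E4 BD 93) for a two-character string, here is a minimal standalone sketch of the UTF-8 bit layout. It is not how cutf is implemented; the names encodeUtf8 and decodeUtf8 are hypothetical helpers made up for this illustration, and it works on std::u32string rather than std::wstring to sidestep the fact that wchar_t is 16 bits wide on Windows and usually 32 bits elsewhere. It assumes reasonably well-formed input and does not reject overlong encodings or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of cutf): encode UTF-32 code points as UTF-8.
// Assumes every code point is at most U+10FFFF; surrogates are not rejected.
std::string encodeUtf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper (not part of cutf): decode UTF-8 into UTF-32 code points.
// Throws on a bad lead byte, a bad continuation byte, or truncated input.
// Overlong encodings are NOT detected in this sketch.
std::u32string decodeUtf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t extra = 0;   // number of continuation bytes that follow
        char32_t cp = 0;         // payload bits collected so far
        if (b0 < 0x80)                { extra = 0; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);   // append the 6 payload bits
        }
        out += cp;
        i += extra + 1;
    }
    return out;
}
```

    Both characters in the test string (U+4E3B and U+4F53) fall in the three-byte range, which gives 2 x 3 = 6 bytes. Both also fit in a single UTF-16 unit, so the wide string length is 2 whether wchar_t is 16 or 32 bits.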
    

    If this library does not satisfy your needs, open the following link:

    http://utf8everywhere.org/

    Scroll down to the end of the page and pick any of the heavier libraries listed there.
