Is it possible to convert UTF8 string in a std::string to std::wstring and vice versa in a platform independent manner? In a Windows application I would use MultiByteToWideC
Created my own library for utf-8 to utf-16/utf-32 conversion - but decided to make a fork of existing project for that purpose.
https://github.com/tapika/cutf
(Originated from https://github.com/noct/cutf )
API works with plain C as well as with C++.
Function prototypes looks like this: (For full list see https://github.com/tapika/cutf/blob/master/cutf.h )
//
// Converts utf-8 string to wide version.
//
// returns target string length.
//
size_t utf8towchar(const char* s, size_t inSize, wchar_t* out, size_t bufSize);
//
// Converts wide string to utf-8 string.
//
// returns filled buffer length (not string length)
//
size_t wchartoutf8(const wchar_t* s, size_t inSize, char* out, size_t outsize);
#ifdef __cplusplus
std::wstring utf8towide(const char* s);
std::wstring utf8towide(const std::string& s);
std::string widetoutf8(const wchar_t* ws);
std::string widetoutf8(const std::wstring& ws);
#endif
Sample usage / simple test application for utf conversion testing:
#include "cutf.h"
#define ok(statement) \
if( !(statement) ) \
{ \
printf("Failed statement: %s\n", #statement); \
r = 1; \
}
int simpleStringTest()
{
const wchar_t* chineseText = L"主体";
auto s = widetoutf8(chineseText);
size_t r = 0;
printf("simple string test: ");
ok( s.length() == 6 );
uint8_t utf8_array[] = { 0xE4, 0xB8, 0xBB, 0xE4, 0xBD, 0x93 };
for(int i = 0; i < 6; i++)
ok(((uint8_t)s[i]) == utf8_array[i]);
auto ws = utf8towide(s);
ok(ws.length() == 2);
ok(ws == chineseText);
if( r == 0 )
printf("ok.\n");
return (int)r;
}
And if this library does not satisfy your needs - feel free to open following link:
http://utf8everywhere.org/
and scroll down at the end of page and pick up any heavier library which you like.