I need to convert between wstring and string. I figured out that using a codecvt facet should do the trick, but it doesn't seem to work for a UTF-8 locale.
My idea is,
What a locale does is give the program information about the external encoding, on the assumption that the internal encoding does not change. If you want to output UTF-8, you need to do it from wchar_t, not from char*.
What you could do is output it as raw data (not as a string); it should then be interpreted correctly if the system's locale is UTF-8.
Also, when using (w)cout/(w)cerr/(w)cin, you need to imbue the locale on the stream.
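As a rough illustration of imbuing a stream, here is a minimal sketch. It uses a std::wofstream instead of wcout so the resulting bytes can be inspected, and it imbues a locale built from std::codecvt_utf8 (deprecated since C++17) rather than a named system locale, since named UTF-8 locales are not available everywhere. The helper names write_utf8_file and read_bytes are made up for this example.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` as UTF-8 by imbuing the
// stream with a locale whose codecvt facet encodes wchar_t as UTF-8.
bool write_utf8_file(const std::string& path, const std::wstring& text)
{
    std::wofstream out;
    // Imbue before opening, so the facet is in place for all output.
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));
    out.open(path, std::ios::binary);
    if (!out)
        return false;
    out << text;
    return static_cast<bool>(out.flush());
}

// Hypothetical helper: reads the file back as raw bytes, without conversion.
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For wcout the call has the same shape (std::wcout.imbue(some_locale)), but which locale names exist is platform-dependent.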
The Lexertl library has an iterator that lets you do this:
std::string str;
str.assign(
    lexertl::basic_utf8_out_iterator<std::wstring::const_iterator>(wstr.begin()),
    lexertl::basic_utf8_out_iterator<std::wstring::const_iterator>(wstr.end()));
You can use Boost.Locale's utf_to_utf converter (declared in <boost/locale/encoding_utf.hpp>) to get a char-based UTF-8 string that you can store in a std::string.
std::string myresult = boost::locale::conv::utf_to_utf<char>(my_wstring);
The code below might help you :)
#include <codecvt>
#include <locale>   // std::wstring_convert is declared here
#include <string>

// Note: std::wstring_convert and <codecvt> are deprecated since C++17.

// convert UTF-8 string to wstring
std::wstring utf8_to_wstring (const std::string& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.from_bytes(str);
}

// convert wstring to UTF-8 string
std::string wstring_to_utf8 (const std::wstring& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.to_bytes(str);
}
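One thing to keep in mind with the functions above: from_bytes throws std::range_error when the input is not valid UTF-8 (unless an error string was passed to the converter's constructor). A minimal sketch of a non-throwing wrapper follows; the name try_utf8_to_wstring is made up for this example.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true and fills `out` on success,
// returns false if `bytes` is not valid UTF-8.
bool try_utf8_to_wstring(const std::string& bytes, std::wstring& out)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    try {
        out = conv.from_bytes(bytes);
        return true;
    } catch (const std::range_error&) {
        // Conversion failed, e.g. because of an invalid lead byte.
        return false;
    }
}
```

Whether you prefer this or letting the exception propagate depends on how the caller wants to treat malformed input.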
What's your platform? Note that Windows has historically not supported UTF-8 locales, so this may explain why it fails for you.
To get this done in a platform-dependent way, you can use MultiByteToWideChar/WideCharToMultiByte on Windows and iconv on Linux. You may be able to use some Boost magic to get this done in a platform-independent way, but I haven't tried it myself, so I can't say more about that option.
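For the iconv route on Linux, a minimal sketch might look like the following. It assumes an iconv implementation that accepts the encoding name "WCHAR_T" (glibc does) and a 4-byte wchar_t; the function name wstring_to_utf8_iconv is made up for this example.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a wide string to UTF-8 with iconv.
// Assumes the "WCHAR_T" encoding name is supported (as in glibc).
std::string wstring_to_utf8_iconv(const std::wstring& in)
{
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // UTF-8 needs at most 4 bytes per code point.
    std::string out(in.size() * 4, '\0');

    // iconv takes non-const char pointers and byte counts.
    char* src = const_cast<char*>(reinterpret_cast<const char*>(in.data()));
    std::size_t src_left = in.size() * sizeof(wchar_t);
    char* dst = &out[0];
    std::size_t dst_left = out.size();

    std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1)
        throw std::runtime_error("iconv conversion failed");

    // Trim the unused tail of the output buffer.
    out.resize(out.size() - dst_left);
    return out;
}
```

The opposite direction would swap the two encoding names in iconv_open and size the buffers accordingly.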
C++ itself has no real notion of Unicode. Use an external library such as ICU (the UnicodeString class) or Qt (the QString class); both support Unicode, including UTF-8.