How to convert wstring into string?

送分小仙女□ 提交于 2019-12-16 20:03:41

问题


The question is how to convert wstring to string?

I have next example :

#include <string>
#include <iostream>

int main()
{
    std::wstring ws = L"Hello";
    std::string s( ws.begin(), ws.end() );

  //std::cout <<"std::string =     "<<s<<std::endl;
    std::wcout<<"std::wstring =    "<<ws<<std::endl;
    std::cout <<"std::string =     "<<s<<std::endl;
}

the output with commented out line is :

std::string =     Hello
std::wstring =    Hello
std::string =     Hello

but without is only :

std::wstring =    Hello

Is anything wrong in the example? Can I do the conversion like above?

EDIT

New example (taking into account some answers) is

#include <string>
#include <iostream>
#include <sstream>
#include <locale>

int main()
{
    setlocale(LC_CTYPE, "");

    const std::wstring ws = L"Hello";
    const std::string s( ws.begin(), ws.end() );

    std::cout<<"std::string =     "<<s<<std::endl;
    std::wcout<<"std::wstring =    "<<ws<<std::endl;

    std::stringstream ss;
    ss << ws.c_str();
    std::cout<<"std::stringstream =     "<<ss.str()<<std::endl;
}

The output is :

std::string =     Hello
std::wstring =    Hello
std::stringstream =     0x860283c

therefore the stringstream can not be used to convert wstring into string.


回答1:


Here is a worked-out solution based on the other suggestions:

#include <string>
#include <iostream>
#include <clocale>
#include <locale>
#include <vector>

int main() {
  std::setlocale(LC_ALL, "");
  const std::wstring ws = L"ħëłlö";
  const std::locale locale("");
  typedef std::codecvt<wchar_t, char, std::mbstate_t> converter_type;
  const converter_type& converter = std::use_facet<converter_type>(locale);
  std::vector<char> to(ws.length() * converter.max_length());
  std::mbstate_t state;
  const wchar_t* from_next;
  char* to_next;
  const converter_type::result result = converter.out(state, ws.data(), ws.data() + ws.length(), from_next, &to[0], &to[0] + to.size(), to_next);
  if (result == converter_type::ok or result == converter_type::noconv) {
    const std::string s(&to[0], to_next);
    std::cout <<"std::string =     "<<s<<std::endl;
  }
}

This will usually work for Linux, but will create problems on Windows.




回答2:


As Cubbi pointed out in one of the comments, std::wstring_convert (C++11) provides a neat simple solution (you need to #include <locale> and <codecvt>):

std::wstring string_to_convert;

//setup converter
using convert_type = std::codecvt_utf8<wchar_t>;
std::wstring_convert<convert_type, wchar_t> converter;

//use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
std::string converted_str = converter.to_bytes( string_to_convert );

I was using a combination of wcstombs and tedious allocation/deallocation of memory before I came across this.

http://en.cppreference.com/w/cpp/locale/wstring_convert

update(2013.11.28)

One liners can be stated as so (Thank you Guss for your comment):

std::wstring str = std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes("some string");

Wrapper functions can be stated as so: (Thank you ArmanSchwarz for your comment)

std::wstring s2ws(const std::string& str)
{
    using convert_typeX = std::codecvt_utf8<wchar_t>;
    std::wstring_convert<convert_typeX, wchar_t> converterX;

    return converterX.from_bytes(str);
}

std::string ws2s(const std::wstring& wstr)
{
    using convert_typeX = std::codecvt_utf8<wchar_t>;
    std::wstring_convert<convert_typeX, wchar_t> converterX;

    return converterX.to_bytes(wstr);
}

Note: there's some controversy on whether string/wstring should be passed in to functions as references or as literals (due to C++11 and compiler updates). I'll leave the decision to the person implementing, but it's worth knowing.

Note: I'm using std::codecvt_utf8 in the above code, but if you're not using UTF-8 you'll need to change that to the appropriate encoding you're using:

http://en.cppreference.com/w/cpp/header/codecvt




回答3:


Solution from: http://forums.devshed.com/c-programming-42/wstring-to-string-444006.html

std::wstring wide( L"Wide" ); 
std::string str( wide.begin(), wide.end() );

// Will print no problemo!
std::cout << str << std::endl;

Beware that there is no character set conversion going on here at all. What this does is simply to assign each iterated wchar_t to a char - a truncating conversion. It uses the std::string c'tor:

template< class InputIt >
basic_string( InputIt first, InputIt last,
              const Allocator& alloc = Allocator() );

As stated in comments:

values 0-127 are identical in virtually every encoding, so truncating values that are all less than 127 results in the same text. Put in a chinese character and you'll see the failure.

-

the values 128-255 of windows codepage 1252 (the Windows English default) and the values 128-255 of unicode are mostly the same, so if that's teh codepage you're using most of those characters should be truncated to the correct values. (I totally expected á and õ to work, I know our code at work relies on this for é, which I will soon fix)

And note that code points in the range 0x80 - 0x9F in Win1252 will not work. This includes , œ, ž, Ÿ, ...




回答4:


Instead of including locale and all that fancy stuff, if you know for FACT your string is convertible just do this:

#include <iostream>
#include <string>

using namespace std;

int main()
{
  wstring w(L"bla");
  string result;
  for(char x : w)
    result += x;

  cout << result << '\n';
}

Live example here




回答5:


I believe the official way is still to go thorugh codecvt facets (you need some sort of locale-aware translation), as in

resultCode = use_facet<codecvt<char, wchar_t, ConversionState> >(locale).
  in(stateVar, scratchbuffer, scratchbufferEnd, from, to, toLimit, curPtr);

or something like that, I don't have working code lying around. But I'm not sure how many people these days use that machinery and how many simply ask for pointers to memory and let ICU or some other library handle the gory details.




回答6:


There are two issues with the code:

  1. The conversion in const std::string s( ws.begin(), ws.end() ); is not required to correctly map the wide characters to their narrow counterpart. Most likely, each wide character will just be typecast to char.
    The resolution to this problem is already given in the answer by kem and involves the narrow function of the locale's ctype facet.

  2. You are writing output to both std::cout and std::wcout in the same program. Both cout and wcout are associated with the same stream (stdout) and the results of using the same stream both as a byte-oriented stream (as cout does) and a wide-oriented stream (as wcout does) are not defined.
    The best option is to avoid mixing narrow and wide output to the same (underlying) stream. For stdout/cout/wcout, you can try switching the orientation of stdout when switching between wide and narrow output (or vice versa):

    #include <iostream>
    #include <stdio.h>
    #include <wchar.h>
    
    int main() {
        std::cout << "narrow" << std::endl;
        fwide(stdout, 1); // switch to wide
        std::wcout << L"wide" << std::endl;
        fwide(stdout, -1); // switch to narrow
        std::cout << "narrow" << std::endl;
        fwide(stdout, 1); // switch to wide
        std::wcout << L"wide" << std::endl;
    }
    



回答7:


You might as well just use the ctype facet's narrow method directly:

#include <clocale>
#include <locale>
#include <string>
#include <vector>

inline std::string narrow(std::wstring const& text)
{
    std::locale const loc("");
    wchar_t const* from = text.c_str();
    std::size_t const len = text.size();
    std::vector<char> buffer(len + 1);
    std::use_facet<std::ctype<wchar_t> >(loc).narrow(from, from + len, '_', &buffer[0]);
    return std::string(&buffer[0], &buffer[len]);
}



回答8:


At the time of writing this answer, the number one google search for "convert string wstring" would land you on this page. My answer shows how to convert string to wstring, although this is NOT the actual question, and I should probably delete this answer but that is considered bad form. You may want to jump to this StackOverflow answer, which is now higher ranked than this page.


Here's a way to combining string, wstring and mixed string constants to wstring. Use the wstringstream class.

#include <sstream>

std::string narrow = "narrow";
std::wstring wide = "wide";

std::wstringstream cls;
cls << " abc " << narrow.c_str() << L" def " << wide.c_str();
std::wstring total= cls.str();



回答9:


In my case, I have to use multibyte character (MBCS), and I want to use std::string and std::wstring. And can't use c++11. So I use mbstowcs and wcstombs.

I make same function with using new, delete [], but it is slower then this.

This can help How to: Convert Between Various String Types

EDIT

However, in case of converting to wstring and source string is no alphabet and multi byte string, it's not working. So I change wcstombs to WideCharToMultiByte.

#include <string>

std::wstring get_wstr_from_sz(const char* psz)
{
    //I think it's enough to my case
    wchar_t buf[0x400];
    wchar_t *pbuf = buf;
    size_t len = strlen(psz) + 1;

    if (len >= sizeof(buf) / sizeof(wchar_t))
    {
        pbuf = L"error";
    }
    else
    {
        size_t converted;
        mbstowcs_s(&converted, buf, psz, _TRUNCATE);
    }

    return std::wstring(pbuf);
}

std::string get_string_from_wsz(const wchar_t* pwsz)
{
    char buf[0x400];
    char *pbuf = buf;
    size_t len = wcslen(pwsz)*2 + 1;

    if (len >= sizeof(buf))
    {
        pbuf = "error";
    }
    else
    {
        size_t converted;
        wcstombs_s(&converted, buf, pwsz, _TRUNCATE);
    }

    return std::string(pbuf);
}

EDIT to use 'MultiByteToWideChar' instead of 'wcstombs'

#include <Windows.h>
#include <boost/shared_ptr.hpp>
#include "string_util.h"

std::wstring get_wstring_from_sz(const char* psz)
{
    int res;
    wchar_t buf[0x400];
    wchar_t *pbuf = buf;
    boost::shared_ptr<wchar_t[]> shared_pbuf;

    res = MultiByteToWideChar(CP_ACP, 0, psz, -1, buf, sizeof(buf)/sizeof(wchar_t));

    if (0 == res && GetLastError() == ERROR_INSUFFICIENT_BUFFER)
    {
        res = MultiByteToWideChar(CP_ACP, 0, psz, -1, NULL, 0);

        shared_pbuf = boost::shared_ptr<wchar_t[]>(new wchar_t[res]);

        pbuf = shared_pbuf.get();

        res = MultiByteToWideChar(CP_ACP, 0, psz, -1, pbuf, res);
    }
    else if (0 == res)
    {
        pbuf = L"error";
    }

    return std::wstring(pbuf);
}

std::string get_string_from_wcs(const wchar_t* pcs)
{
    int res;
    char buf[0x400];
    char* pbuf = buf;
    boost::shared_ptr<char[]> shared_pbuf;

    res = WideCharToMultiByte(CP_ACP, 0, pcs, -1, buf, sizeof(buf), NULL, NULL);

    if (0 == res && GetLastError() == ERROR_INSUFFICIENT_BUFFER)
    {
        res = WideCharToMultiByte(CP_ACP, 0, pcs, -1, NULL, 0, NULL, NULL);

        shared_pbuf = boost::shared_ptr<char[]>(new char[res]);

        pbuf = shared_pbuf.get();

        res = WideCharToMultiByte(CP_ACP, 0, pcs, -1, pbuf, res, NULL, NULL);
    }
    else if (0 == res)
    {
        pbuf = "error";
    }

    return std::string(pbuf);
}



回答10:


Default encoding on:

  • Windows UTF-16.
  • Linux UTF-8.
  • MacOS UTF-8.

This code have two forms to convert std::string to std::wstring and std::wstring to std::string. If you negate #if defined WIN32, you get the same result.

1. std::string to std::wstring

• MultiByteToWideChar WinAPI

• _mbstowcs_s_l

#if defined WIN32
#include <windows.h>
#endif

std::wstring StringToWideString(std::string str)
{
    if (str.empty())
    {
        return std::wstring();
    }
    size_t len = str.length() + 1;
    std::wstring ret = std::wstring(len, 0);
#if defined WIN32
    int size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, &str[0], str.size(), &ret[0], len);
    ret.resize(size);
#else
    size_t size = 0;
    _locale_t lc = _create_locale(LC_ALL, "en_US.UTF-8");
    errno_t retval = _mbstowcs_s_l(&size, &ret[0], len, &str[0], _TRUNCATE, lc);
    _free_locale(lc);
    ret.resize(size - 1);
#endif
    return ret;
}

2. std::wstring to std::string

• WideCharToMultiByte WinAPI

• _wcstombs_s_l

std::string WidestringToString(std::wstring wstr)
{
    if (wstr.empty())
    {
        return std::string();
    }
#if defined WIN32
    int size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &wstr[0], wstr.size(), NULL, 0, NULL, NULL);
    std::string ret = std::string(size, 0);
    WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &wstr[0], wstr.size(), &ret[0], size, NULL, NULL);
#else
    size_t size = 0;
    _locale_t lc = _create_locale(LC_ALL, "en_US.UTF-8");
    errno_t err = _wcstombs_s_l(&size, NULL, 0, &wstr[0], _TRUNCATE, lc);
    std::string ret = std::string(size, 0);
    err = _wcstombs_s_l(&size, &ret[0], size, &wstr[0], _TRUNCATE, lc);
    _free_locale(lc);
    ret.resize(size - 1);
#endif
    return ret;
}

3. On windows you need to print unicode, using WinAPI.

• WriteConsole

#if defined _WIN32
    void WriteLineUnicode(std::string s)
    {
        std::wstring unicode = StringToWideString(s);
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), unicode.length(), NULL, NULL);
        std::cout << std::endl;
    }

    void WriteUnicode(std::string s)
    {
        std::wstring unicode = StringToWideString(s);
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), unicode.length(), NULL, NULL);
    }

    void WriteLineUnicode(std::wstring ws)
    {
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), ws.length(), NULL, NULL);
        std::cout << std::endl;
    }

    void WriteUnicode(std::wstring ws)
    {
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), ws.length(), NULL, NULL);
    }

4. On main program.

#if defined _WIN32
int wmain(int argc, WCHAR ** args)
#else
int main(int argc, CHAR ** args)
#endif
{
    std::string source = u8"ÜüΩωЙ你月曜日\na🐕èéøÞǽлљΣæča🐕🐕";
    std::wstring wsource = L"ÜüΩωЙ你月曜日\na🐕èéøÞǽлљΣæča🐕🐕";

    WriteLineUnicode(L"@" + StringToWideString(source) + L"@");
    WriteLineUnicode("@" + WidestringToString(wsource) + "@");
    return EXIT_SUCCESS;
}

5. Finally You need a powerfull and complete support for unicode chars in console. I recommend ConEmu and set as default terminal on Windows. You need to hook Visual Studio to ConEmu. Remember that Visual Studio's exe file is devenv.exe

Tested on Visual Studio 2017 with VC++; std=c++17.

Result




回答11:


This solution is inspired in dk123's solution, but uses a locale dependent codecvt facet. The result is in locale encoded string instead of UTF-8 (if it is not set as locale):

std::string w2s(const std::wstring &var)
{
   static std::locale loc("");
   auto &facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
   return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).to_bytes(var);
}

std::wstring s2w(const std::string &var)
{
   static std::locale loc("");
   auto &facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
   return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).from_bytes(var);
}

I was searching for it, but I can't find it. Finally I found that I can get the right facet from std::locale using the std::use_facet() function with the right typename. Hope this helps.




回答12:


In case anyone else is interested: I needed a class that could be used interchangeably wherever either a string or wstring was expected. The following class convertible_string, based on dk123's solution, can be initialized with either a string, char const*, wstring or wchar_t const* and can be assigned to by or implicitly converted to either a string or wstring (so can be passed into a functions that take either).

class convertible_string
{
public:
    // default ctor
    convertible_string()
    {}

    /* conversion ctors */
    convertible_string(std::string const& value) : value_(value)
    {}
    convertible_string(char const* val_array) : value_(val_array)
    {}
    convertible_string(std::wstring const& wvalue) : value_(ws2s(wvalue))
    {}
    convertible_string(wchar_t const* wval_array) : value_(ws2s(std::wstring(wval_array)))
    {}

    /* assignment operators */
    convertible_string& operator=(std::string const& value)
    {
        value_ = value;
        return *this;
    }
    convertible_string& operator=(std::wstring const& wvalue)
    {
        value_ = ws2s(wvalue);
        return *this;
    }

    /* implicit conversion operators */
    operator std::string() const { return value_; }
    operator std::wstring() const { return s2ws(value_); }
private:
    std::string value_;
};



回答13:


#include <boost/locale.hpp>
namespace lcv = boost::locale::conv;

inline std::wstring fromUTF8(const std::string& s)
{ return lcv::utf_to_utf<wchar_t>(s); }

inline std::string toUTF8(const std::wstring& ws)
{ return lcv::utf_to_utf<char>(ws); }



回答14:


I am using below to convert wstring to string.

std::string strTo;
char *szTo = new char[someParam.length() + 1];
szTo[someParam.size()] = '\0';
WideCharToMultiByte(CP_ACP, 0, someParam.c_str(), -1, szTo, (int)someParam.length(), NULL, NULL);
strTo = szTo;
delete szTo;



回答15:


// Embarcadero C++ Builder 

// convertion string to wstring
string str1 = "hello";
String str2 = str1;         // typedef UnicodeString String;   -> str2 contains now u"hello";

// convertion wstring to string
String str2 = u"hello";
string str1 = UTF8string(str2).c_str();   // -> str1 contains now "hello"


来源:https://stackoverflow.com/questions/4804298/how-to-convert-wstring-into-string

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!