Question
My config:
- Compiler: GNU GCC 4.8.2
- Standard: C++11
- Platform/OS: Linux 64-bit, Ubuntu 14.04.1 LTS
I want to feed a method with a wchar_t* and use it in many Xerces library methods that take an XMLCh*, but I don't know how to translate one to the other. It's easy with char* instead of wchar_t*, but I need wide characters. Under Windows I could simply cast one to the other, but that doesn't work on my Linux machine, so somehow I have to translate wchar_t* to XMLCh* manually.
I link against libxerces-c-3.1.so, which uses XMLCh* exclusively. XMLCh can hold wide characters, but I don't know how to feed a wchar_t* into it, nor how to get a wchar_t* back from an XMLCh*.
I developed this, but it doesn't work (here I return a std::wstring, which is easier to manage when cleaning up memory than a raw pointer):
static inline std::wstring XMLCh2W(const XMLCh* tagname)
{
    XMLSize_t len1 = XMLString::stringLen(tagname);
    XMLSize_t outLen = len1 * 4;
    XMLByte utf8[outLen + 1]; // variable-length array: a GCC extension
    XMLSize_t charsEaten = 0;
    XMLTransService::Codes failReason; // Ok | UnsupportedEncoding | InternalFailure | SupportFilesNotFound
    XMLTranscoder* transcoder = XMLPlatformUtils::fgTransService->makeNewTranscoderFor("UTF-8", failReason, 16*1024);
    unsigned int utf8Len = transcoder->transcodeTo(tagname, len1, utf8, outLen, charsEaten, XMLTranscoder::UnRep_Throw); // or XMLTranscoder::UnRep_RepChar
    utf8[utf8Len] = 0;
    delete transcoder;
    std::wstring wstr((wchar_t*)utf8); // I'm not sure this is actually ok to do
    return wstr;
}
Answer 1:
No, you can't do that under GCC, because GCC defines wchar_t as a 32-bit, UTF-32/UCS-4-encoded type (the difference is not important for practical purposes), while Xerces-C defines XMLCh as a 16-bit, UTF-16-encoded type.
The best I've found is to use the C++11 support for UTF-16 strings:
- char16_t and XMLCh are equivalent, though not implicitly convertible; you still need to cast between them. But at least this is cheap, compared to transcoding.
- std::basic_string<char16_t> is the equivalent string type.
- Use literals of the form u"str" and u's'.
Unfortunately, VC++ doesn't support the C++11 UTF-16 literals, though its wchar_t literals are UTF-16 encoded. So I end up with something like this in a header:
#if defined _MSC_VER
#define U16S(x) L##x
typedef wchar_t my_u16_char_t;
typedef std::wstring my_u16_string_t;
typedef std::wstringstream my_u16_sstream_t;
inline const XMLCh* XmlString(const my_u16_char_t* s) { return s; }
inline const XMLCh* XmlString(const my_u16_string_t& s) { return s.c_str(); }
#elif defined __linux
#define U16S(x) u##x
typedef char16_t my_u16_char_t;
typedef std::basic_string<my_u16_char_t> my_u16_string_t;
typedef std::basic_stringstream<my_u16_char_t> my_u16_sstream_t;
inline const XMLCh* XmlString(const my_u16_char_t* s) { return reinterpret_cast<const XMLCh*>(s); }
inline const XMLCh* XmlString(const my_u16_string_t& s) { return XmlString(s.c_str()); }
#endif
It is, IMO, rather a mess, but not one I can see getting sorted out until VC++ supports C++11 Unicode literals, allowing Xerces to be rewritten in terms of char16_t directly.
Answer 2:
XMLCh is defined as wchar_t (on Windows) or uint16_t (on Linux), and it is encoded as UTF-16.
Unfortunately, GCC 4.8.2 does not support std::wstring_convert for converting between Unicode string encodings. But you can use Boost's locale::conv::utf_to_utf() to convert to and from XMLCh.
#include <string>
#include <boost/locale.hpp>
#include <xercesc/util/XercesDefs.hpp> // for XMLCh
static inline std::wstring XMLCh2W(const XMLCh* xmlchstr)
{
std::wstring wstr = boost::locale::conv::utf_to_utf<wchar_t>(xmlchstr);
return wstr;
}
static inline std::basic_string<XMLCh> W2XMLCh(const std::wstring& wstr)
{
std::basic_string<XMLCh> xmlstr = boost::locale::conv::utf_to_utf<XMLCh>(wstr);
return xmlstr;
}
If you need a wchar_t* or XMLCh* rather than a string object, use the c_str() method as below:
const wchar_t* wcharPointer = wstr.c_str();
const XMLCh* xmlchPointer = xmlstr.c_str();
Answer 3:
I recently dealt with this issue, and now that Visual Studio 2015 supports Unicode character and string literals, this is pretty easy to deal with in a cross-platform way. I use the following macro and static_assert to guarantee correctness:
#define CONST_XMLCH(s) reinterpret_cast<const ::XMLCh*>(u ## s)
static_assert(sizeof(::XMLCh) == sizeof(char16_t),
"XMLCh is not sized correctly for UTF-16.");
Example of usage:
const XMLCh* features = CONST_XMLCH("Core");
auto impl = DOMImplementationRegistry::getDOMImplementation(features);
This works because Xerces defines an XMLCh to be 16 bits wide and to hold UTF-16 string data, which perfectly matches the definition given by the standard for a string literal prefixed with u. The compiler doesn't know this, and won't implicitly convert between char16_t* and XMLCh*, but you can get around that with a reinterpret_cast. And if for whatever reason you try to compile Xerces on a platform where the sizes don't match up, the static_assert will fail and draw attention to the problem.
Source: https://stackoverflow.com/questions/25839725/xmlch-to-wchar-t-and-vice-versa