Xerces-c and cross-platform string literals

Posted by 好久不见 on 2019-12-11 07:48:23

Question


I'm porting a code-base that uses Xerces-c for XML processing from Windows/VC++ to Linux/G++.

On Windows, Xerces-c uses wchar_t as its character type, XMLCh. This has allowed people to use std::wstring and string literals written with the L"" syntax.

On Linux/G++, wchar_t is 32 bits wide, and Xerces-c uses unsigned short int (16 bits) as XMLCh.

I've started out along this track:

#ifdef _MSC_VER
using u16char_t = wchar_t;
using u16string_t = std::wstring;
#elif defined __linux
using u16char_t = char16_t;
using u16string_t = std::u16string;
#endif

Unfortunately, char16_t and unsigned short int are distinct types, and pointers to them are not implicitly convertible. So passing u"Hello, world." to Xerces functions still results in invalid conversion errors.
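The mismatch can be checked at compile time. The following is only a minimal sketch, assuming a C++11 compiler: `xmlch_t` and `xmlch_length` are hypothetical stand-ins for XMLCh and for a Xerces function taking a `const XMLCh*`, not names from the Xerces headers.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for XMLCh as described in the question (assumption,
// not taken from the Xerces headers).
using xmlch_t = unsigned short;

// char16_t is its own type, so neither the types nor their pointers match.
static_assert(!std::is_same<char16_t, xmlch_t>::value,
              "char16_t and unsigned short are distinct types");
static_assert(!std::is_convertible<const char16_t*, const xmlch_t*>::value,
              "no implicit pointer conversion between them");

// Hypothetical stand-in for a Xerces function that takes a const XMLCh*.
inline std::size_t xmlch_length(const xmlch_t* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) ++n;  // count code units up to the terminator
    return n;
}

inline std::size_t literal_length() {
    // xmlch_length(u"Hello, world.");  // would fail: invalid conversion
    // An explicit cast is what makes the call compile.
    return xmlch_length(reinterpret_cast<const xmlch_t*>(u"Hello, world."));
}
```

The commented-out call is the one that produces the invalid conversion error; the explicit reinterpret_cast is the per-call workaround the question is hoping to avoid.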

It's starting to look like I'm going to have to explicitly cast every string I pass to Xerces functions. But before I do, I wanted to ask if anyone knows a saner way to write cross-platform Xerces-c code.


Answer 1:


The answer is that no, no-one has a good idea on how to do this. For anyone else who finds this question, this is what I came up with:

#include <fstream>
#include <sstream>
#include <string>

#ifdef _MSC_VER
#define U16S(x) L##x
#define U16XS(x) L##x

#define XS(x) x
#define US(x) x

#elif defined __linux

#define U16S(x) u##x
#define U16XS(x) reinterpret_cast<const unsigned short *>(u##x)

inline unsigned short *XS(char16_t* x) {
    return reinterpret_cast<unsigned short *>(x);
}
inline const unsigned short *XS(const char16_t* x) {
    return reinterpret_cast<const unsigned short *>(x);
}
inline char16_t* US(unsigned short *x) {
    return reinterpret_cast<char16_t *>(x);
}
inline const char16_t* US(const unsigned short *x) {
    return reinterpret_cast<const char16_t*>(x);
}

#include "char16_t_facets.hpp"
#endif

namespace SafeStrings {
#if defined _MSC_VER

    using u16char_t = wchar_t;
    using u16string_t = std::wstring;
    using u16sstream_t = std::wstringstream;
    using u16ostream_t = std::wostream;
    using u16istream_t = std::wistream;
    using u16ofstream_t = std::wofstream;
    using u16ifstream_t = std::wifstream;
    using filename_t = std::wstring;

#elif defined __linux

    using u16char_t = char16_t;
    using u16string_t = std::basic_string<char16_t>;
    using u16sstream_t = std::basic_stringstream<char16_t>;
    using u16ostream_t = std::basic_ostream<char16_t>;
    using u16istream_t = std::basic_istream<char16_t>;
    using u16ofstream_t = std::basic_ofstream<char16_t>;
    using u16ifstream_t = std::basic_ifstream<char16_t>;
    using filename_t = std::string;

#endif

} // namespace SafeStrings

char16_t_facets.hpp contains definitions of the template specialisations std::ctype<char16_t>, std::numpunct<char16_t> and std::codecvt<char16_t, char, std::mbstate_t>. These need to be added to the global locale, along with std::num_get<char16_t> and std::num_put<char16_t> (which need no specialisations of their own). The codecvt code is the only difficult part; a reasonable template can be found in the GCC 5 libraries, and if you build with GCC 5 you don't need to provide the codecvt specialisation at all, because it is already in the library.

Once you've done all of that, the char16_t streams will work correctly.

Then, every time you define a wide string, instead of L"string", write U16S("string"). Every time you pass a string to Xerces, write XS(string.c_str()) or U16XS("string") for literals. Every time you get a string back from Xerces, convert it back as u16string_t(US(call_xerces_function())).
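As a rough illustration of that round trip, here is a sketch that uses only the const overloads and macros from the Linux branch above. `fake_xerces_echo` is a hypothetical stand-in for a Xerces call (it simply returns its argument), so this shows the casting pattern rather than any real Xerces API.

```cpp
#include <cassert>
#include <string>

// Linux-branch helpers as defined in the answer (const overloads only).
inline const unsigned short* XS(const char16_t* x) {
    return reinterpret_cast<const unsigned short*>(x);
}
inline const char16_t* US(const unsigned short* x) {
    return reinterpret_cast<const char16_t*>(x);
}
#define U16S(x) u##x
#define U16XS(x) reinterpret_cast<const unsigned short*>(u##x)

// Hypothetical stand-in for a Xerces function: takes and returns XMLCh-style
// pointers. A real call would go here.
inline const unsigned short* fake_xerces_echo(const unsigned short* s) {
    return s;
}

// Pass a string in with XS(), wrap the result back up with US().
inline std::u16string round_trip(const std::u16string& in) {
    const unsigned short* raw = fake_xerces_echo(XS(in.c_str()));
    assert(raw != nullptr);
    return std::u16string(US(raw));
}

// Literals go through U16XS() instead of XS().
inline std::u16string literal_round_trip() {
    return std::u16string(US(fake_xerces_echo(U16XS("node"))));
}
```

On the Windows branch XS and US expand to nothing and U16S/U16XS produce L"" literals, so the same call sites should compile there without any casts.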

Note that it is also possible to recompile Xerces-C with the character type set to char16_t, which removes much of the effort described above. However, you then won't be able to use any other library on the system that itself depends on Xerces-C. Linking to any such library will cause link errors, because changing the character type changes many of the Xerces function signatures.



Source: https://stackoverflow.com/questions/25782247/xerces-c-and-cross-platform-string-literals
