When it comes to internationalization & Unicode, I'm an idiot American programmer. Here's the deal.
#include <string>
using namespace std;
typedef basic_string<unsigned char> ustring; // one code unit per UTF-8 byte
Using different character types for different encodings has the advantage that the compiler barks at you when you mix them up. The downside is that you have to convert manually.
A few helper functions to the rescue:
// Copy the bytes of a system-encoded string (no actual transcoding!).
inline ustring convert(const std::string& sys_enc) {
    return ustring(sys_enc.begin(), sys_enc.end());
}

// For string literals: N includes the trailing '\0', so drop it.
template <std::size_t N>
inline ustring convert(const char (&array)[N]) {
    return ustring(array, array + N - 1);
}

inline ustring convert(const char* pstr) {
    return ustring(reinterpret_cast<const ustring::value_type*>(pstr));
}
Of course, all these fail silently and fatally when the string to convert contains anything other than ASCII.
Narrow string literals are defined to be arrays of const char, and there aren't unsigned string literals[1], so you'll have to cast:
ustring s = reinterpret_cast<const unsigned char*>("Hello, UTF-8");
Of course you can put that long thing into an inline function:
inline const unsigned char* uc_str(const char* s) {
    return reinterpret_cast<const unsigned char*>(s);
}
ustring s = uc_str("Hello, UTF-8");
Or you can just use basic_string<char> and get away with it 99.9% of the time you're dealing with UTF-8.
[1] Unless char is unsigned, but whether it is or not is implementation-defined, blah, blah.