String literal to basic_string

故里飘歌 2021-01-22 04:04

When it comes to internationalization & Unicode, I'm an idiot American programmer. Here's the deal.

#include 
using namespace std;

typedef          


        
2 Answers
  • 2021-01-22 04:34

    Using different character types for different encodings has the advantage that the compiler barks at you when you mix them up. The downside is that you have to convert manually.

    A few helper functions to the rescue:

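    // Copies each char of a system-encoded string into a ustring, byte for byte.
    inline ustring convert(const std::string& sys_enc) {
      return ustring( sys_enc.begin(), sys_enc.end() );
    }
    
    // String literals: N counts the terminating '\0', so stop one short of it.
    template< std::size_t N >
    inline ustring convert(const char (&array)[N]) {
      return ustring( array, array+N-1 );
    }
    
    // NUL-terminated C strings: reinterpret the bytes as ustring's character type.
    inline ustring convert(const char* pstr) {
      return ustring( reinterpret_cast<const ustring::value_type*>(pstr) );
    }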
    

    Of course, all of these fail silently and fatally when the string to convert contains anything other than ASCII: they copy bytes as-is and never transcode from the system encoding.
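
One way to turn that silent failure into a loud one is to check the input before converting. The following is only a sketch: `is_ascii` and `require_ascii` are hypothetical names, not part of the helpers above, and the check assumes that "safe" means every byte is in the range 0x00..0x7F.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: true if every byte of s is in the ASCII range 0x00..0x7F.
inline bool is_ascii(const std::string& s) {
  for (std::string::size_type i = 0; i < s.size(); ++i) {
    // Compare as unsigned char so the result does not depend on
    // whether plain char is signed on this platform.
    if (static_cast<unsigned char>(s[i]) > 0x7F) return false;
  }
  return true;
}

// Hypothetical guard: returns s unchanged, or throws if it holds a non-ASCII byte.
inline const std::string& require_ascii(const std::string& s) {
  if (!is_ascii(s)) throw std::invalid_argument("non-ASCII byte in input");
  return s;
}
```

With this in place, a call such as `convert(require_ascii(text))` would throw instead of quietly producing a wrongly encoded string.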

  • 2021-01-22 04:53

    Narrow string literals are defined to be arrays of const char, and there are no unsigned string literals [1], so you'll have to cast:

    ustring s = reinterpret_cast<const unsigned char*>("Hello, UTF-8");
    

    Of course you can put that long thing into an inline function:

    inline const unsigned char *uc_str(const char *s){
      return reinterpret_cast<const unsigned char*>(s);
    }
    
    ustring s = uc_str("Hello, UTF-8");
    

    Or you can just use basic_string<char> and get away with it 99.9% of the time you're dealing with UTF-8.

    [1] Unless char is unsigned, but whether it is or not is implementation-defined, blah, blah.
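
As a rough illustration of the basic_string<char> route: a plain std::string stores UTF-8 bytes without trouble, and the main thing to keep in mind is that size() counts bytes rather than characters. The helper below is a sketch only; `utf8_length` is a hypothetical name, and it assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to hold
// well-formed UTF-8, by counting every byte that is not a continuation
// byte (continuation bytes have the bit pattern 10xxxxxx).
inline std::size_t utf8_length(const std::string& s) {
  std::size_t n = 0;
  for (std::string::size_type i = 0; i < s.size(); ++i) {
    // Mask via unsigned char so the test is independent of char's signedness.
    if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) ++n;
  }
  return n;
}
```

For example, a std::string holding the three bytes "\xE2\x82\xAC" (the euro sign) has size() == 3, while utf8_length reports a single code point.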
