WChars, Encodings, Standards and Portability

遇见更好的自我 2020-11-22 09:11

The following may not qualify as an SO question; if it is out of bounds, please feel free to tell me to go away. The question here is basically, "Do I understand the C stand

4 Answers
  •  孤街浪徒
    2020-11-22 10:05

    Given that iconv is not "pure standard C/C++", I don't think you are satisfying your own specifications.

    New codecvt facets are coming along with char32_t and char16_t, so as long as you are consistent and pick one character type plus one encoding, I don't see how you can go wrong, provided the facets are available.

    The facets are described in 22.5 [locale.stdcvt] (section numbering from the N3242 draft).
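    To make those facets concrete, here is a minimal sketch that wraps three of them in std::wstring_convert. It assumes a standard library that still ships <codecvt> and std::wstring_convert (both deprecated since C++17); the alias names and the facet_examples function are hypothetical, chosen only for illustration.

```cpp
// Assumes a standard library that still provides <codecvt> (deprecated since C++17).
#include <cassert>
#include <codecvt>  // std::codecvt_utf8, std::codecvt_utf16, std::codecvt_utf8_utf16
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical alias names for this sketch.
// UTF-8 bytes <-> UTF-32 code points stored in char32_t.
using utf8_utf32 = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>;

// UTF-16 bytes (big-endian by default) <-> UTF-32 code points.
using utf16_utf32 = std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t>;

// UTF-8 bytes <-> UTF-16 code units stored in char16_t (surrogate pairs as needed).
using utf8_utf16 = std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t>;

// Illustrative conversions; each assert states the expected result.
inline void facet_examples()
{
    // U+00E9 (e with acute accent) is encoded as C3 A9 in UTF-8.
    assert(utf8_utf32().from_bytes("\xC3\xA9") == U"\u00E9");

    // The same code point as big-endian UTF-16 bytes is 00 E9.
    assert(utf16_utf32().to_bytes(U"\u00E9") == std::string("\x00\xE9", 2));

    // U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
    std::u16string smiley = utf8_utf16().from_bytes("\xF0\x9F\x98\x80");
    assert(smiley.size() == 2);
    assert(smiley == u"\U0001F600");
}
```

    Note the difference between the last two: codecvt_utf16 converts to and from UTF-16 as a byte sequence, whereas codecvt_utf8_utf16 yields UTF-16 code units in a std::u16string. Which one fits depends on whether the interface wants a byte stream or char16_t text.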


    I don't understand how this doesn't satisfy at least some of your requirements:

    #include <codecvt>  // std::codecvt_utf8, std::codecvt_utf16
    #include <locale>   // std::wstring_convert
    #include <string>   // std::string, std::u32string
    
    namespace ns {
    
    typedef char32_t char_t;
    typedef std::u32string string;
    
    // LIT('A') -> U'A', LIT("Hi") -> U"Hi"; or use a user-defined literal.
    // (Macros ignore namespaces, so LIT is global even though it is defined here.)
    #define LIT(x) U ## x
    
    // Communicate with interface0, which wants UTF-8
    
    // This type doesn't need to be public at all; I just refactored it.
    typedef std::wstring_convert<std::codecvt_utf8<char_t>, char_t> converter0;
    
    inline std::string
    to_interface0(string const& s)
    {
        return converter0().to_bytes(s);
    }
    
    inline string
    from_interface0(std::string const& s)
    {
        return converter0().from_bytes(s);
    }
    
    // Communicate with interface1, which wants UTF-16.
    // wstring_convert always produces a byte string, so the UTF-16 comes out
    // as raw bytes (big-endian unless std::little_endian is requested).
    
    // Doesn't have to be public either
    typedef std::wstring_convert<std::codecvt_utf16<char_t>, char_t> converter1;
    
    inline std::string
    to_interface1(string const& s)
    {
        return converter1().to_bytes(s);
    }
    
    inline string
    from_interface1(std::string const& s)
    {
        return converter1().from_bytes(s);
    }
    
    } // ns
    

    Then your code can use ns::string, ns::char_t, LIT('A') and LIT("Hello, World!") with reckless abandon, without knowing what the underlying representation is, and convert with to_interfaceX/from_interfaceX(some_string) only where it's needed. It doesn't affect the global locale or streams either. The helpers can be as clever as needed; e.g. codecvt_utf8 can consume or generate a 'header' (the std::consume_header and std::generate_header modes), which is Standardese for the byte order mark (BOM); ditto codecvt_utf16.
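    The header handling is selected through the third template argument of codecvt_utf8, a std::codecvt_mode bitmask. A minimal sketch, again assuming a standard library that still ships <codecvt>; the alias names and header_examples are hypothetical:

```cpp
// Assumes a standard library that still provides <codecvt> (deprecated since C++17).
#include <cassert>
#include <codecvt>  // std::codecvt_utf8, std::codecvt_mode
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical alias names for this sketch.
// std::consume_header: skip a leading UTF-8 BOM (EF BB BF) while decoding.
using utf8_skip_bom = std::wstring_convert<
    std::codecvt_utf8<char32_t, 0x10FFFF, std::consume_header>, char32_t>;

// std::generate_header: write a UTF-8 BOM in front of the encoded text.
using utf8_write_bom = std::wstring_convert<
    std::codecvt_utf8<char32_t, 0x10FFFF, std::generate_header>, char32_t>;

inline void header_examples()
{
    // The literal is split so that \xBF and 'A' are not parsed as one hex escape.
    assert(utf8_skip_bom().from_bytes("\xEF\xBB\xBF" "A") == U"A");
    assert(utf8_write_bom().to_bytes(U"A") == "\xEF\xBB\xBF" "A");
}
```

    Keeping the mode in the converter type means the rest of the code never has to look for or strip a BOM itself.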

    In fact I wrote the above to be as short as possible but you'd really want helpers like this:

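    #include <utility>  // std::forward
    
    namespace ns {
    // Perfect-forwards to whichever from_bytes overload matches the arguments.
    template <typename... T>
    inline string
    from_interface0(T&&... t)
    {
        return converter0().from_bytes(std::forward<T>(t)...);
    }
    } // ns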
    

    which give you access to all four overloads of each of the from_bytes/to_bytes members, accepting e.g. a single char, a const char*, a string, or a [first, last) pointer range.
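    Here is a self-contained sketch of that forwarding helper with each reachable from_bytes overload exercised. The namespace demo and the function forwarding_examples are hypothetical stand-ins, and the same <codecvt> availability assumption applies.

```cpp
// Assumes a standard library that still provides <codecvt> (deprecated since C++17).
#include <cassert>
#include <codecvt>  // std::codecvt_utf8
#include <locale>   // std::wstring_convert
#include <string>
#include <utility>  // std::forward

namespace demo {  // hypothetical stand-in for ns

using string = std::u32string;
using converter0 = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>;

// Forwards its arguments unchanged, so overload resolution on
// wstring_convert::from_bytes picks whichever overload matches.
template <typename... T>
string from_interface0(T&&... t)
{
    return converter0().from_bytes(std::forward<T>(t)...);
}

}  // namespace demo

inline void forwarding_examples()
{
    const char raw[] = "Hello, World!";
    assert(demo::from_interface0('H') == U"H");                 // from_bytes(char)
    assert(demo::from_interface0(raw) == U"Hello, World!");     // from_bytes(const char*)
    assert(demo::from_interface0(std::string("Hi")) == U"Hi");  // from_bytes(const std::string&)
    assert(demo::from_interface0(raw, raw + 5) == U"Hello");    // from_bytes(first, last)
}
```

    Because the arguments are perfect-forwarded, the helper does not have to repeat each from_bytes signature; the analogous to_interface0 helper would forward to to_bytes the same way.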
