How to define string literal with character type that depends on template parameter?

不知归路 2021-02-06 07:47
template<typename CharType>
class StringTraits {
public:
    static const CharType NULL_CHAR = '\0';
    static constexpr CharType* WHITESPACE_STR = " ";
};


  • 2021-02-06 08:00

    I've just come up with a compact answer, which is similar to the other C++17 versions. Like them, it relies on implementation-defined behavior, specifically on the character types of the environment. It supports converting ASCII and ISO-8859-1 to UTF-16 wchar_t, UTF-32 wchar_t, UTF-16 char16_t and UTF-32 char32_t. UTF-8 input is not supported, but more elaborate conversion code is feasible.

    template <typename Ch, size_t S>
    constexpr auto any_string(const char (&literal)[S]) -> const array<Ch, S> {
            array<Ch, S> r = {};
            for (size_t i = 0; i < S; i++)
                    r[i] = literal[i];
            return r;
    }

    Full example follows:

    $ cat any_string.cpp
    #include <array>
    #include <cstddef>
    #include <fstream>
    using namespace std;
    template <typename Ch, size_t S>
    constexpr auto any_string(const char (&literal)[S]) -> const array<Ch, S> {
            array<Ch, S> r = {};
            for (size_t i = 0; i < S; i++)
                    r[i] = literal[i];
            return r;
    }
    int main(void)
    {
        auto s = any_string<char>("Hello");
        auto ws = any_string<wchar_t>(", ");
        auto s16 = any_string<char16_t>("World");
        auto s32 = any_string<char32_t>("!\n");
        // data() yields a pointer to the null-terminated characters
        ofstream f("s.txt");
        f << s.data();
        wofstream wf("ws.txt");
        wf << ws.data();
        basic_ofstream<char16_t> f16("s16.txt");
        f16 << s16.data();
        basic_ofstream<char32_t> f32("s32.txt");
        f32 << s32.data();
        return 0;
    }
    $ c++ -o any_string any_string.cpp -std=c++17
    $ ./any_string
    $ cat s.txt ws.txt s16.txt s32.txt
    Hello, World!
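
The conversion can also be checked at compile time, which avoids depending on char16_t and char32_t file streams (their behavior may vary between standard libraries). The following is only a sketch: it repeats any_string with std:: qualification and an explicit static_cast, and the variable names ws and s16 are chosen for illustration. It assumes C++17 and an ASCII-compatible execution character set.

```cpp
#include <array>
#include <cstddef>

// Same idea as any_string above: copy each char into the target character type.
template <typename Ch, std::size_t S>
constexpr std::array<Ch, S> any_string(const char (&literal)[S]) {
    std::array<Ch, S> r = {};
    for (std::size_t i = 0; i < S; i++)
        r[i] = static_cast<Ch>(literal[i]);  // plain value conversion, no transcoding
    return r;
}

// Illustrative constants; the array size includes the terminating null.
constexpr auto ws  = any_string<wchar_t>("Hi");
constexpr auto s16 = any_string<char16_t>("Hi");

static_assert(ws.size() == 3, "two characters plus the terminator");
static_assert(ws[0] == L'H' && ws[2] == L'\0', "converted at compile time");
static_assert(s16[0] == u'H', "converted at compile time");
```

Because the result is a constexpr std::array, any mismatch shows up as a compiler error rather than as garbled file contents.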
  • 2021-02-06 08:11

    There are several ways to do this, depending on the available version of the C++ standard. If you have C++17 available, you can scroll down to Method 3, which is the most elegant solution in my opinion.

    Note: Methods 1 and 3 assume that the characters of the string literal will be restricted to 7-bit ASCII. This requires that characters are in the range [0..127] and the execution character set is compatible with 7-bit ASCII (e. g. Windows-1252 or UTF-8). Otherwise the simple casting of char values to wchar_t used by these methods won't give the correct result.

    Method 1 - aggregate initialization (C++03)

    The simplest way is to define an array using aggregate initialization:

    template<typename CharType>
    class StringTraits {
    public:
        static const CharType NULL_CHAR = '\0';
        // constexpr needs C++11; the array bound is deduced from the initializer.
        static constexpr CharType WHITESPACE_STR[] = {'a','b','c',0};
    };

    Method 2 - template specialization and macro (C++03)

    (Another variant is shown in this answer.)

    The aggregate initialization method can be cumbersome for long strings. For more comfort, we can use a combination of template specialization and macros:

    template< typename CharT > constexpr CharT const* NarrowOrWide( char const*, wchar_t const* );
    template<> constexpr char const* NarrowOrWide< char >( char const* c, wchar_t const* )       
        { return c; }
    template<> constexpr wchar_t const* NarrowOrWide< wchar_t >( char const*, wchar_t const* w ) 
        { return w; }
    #define TOWSTRING1(x) L##x
    #define TOWSTRING(x) TOWSTRING1(x)  
    #define NARROW_OR_WIDE( C, STR ) NarrowOrWide< C >( ( STR ), TOWSTRING( STR ) )


    template<typename CharType>
    class StringTraits {
    public:
        static constexpr CharType const* WHITESPACE_STR = NARROW_OR_WIDE( CharType, " " );
    };



    The template function NarrowOrWide() returns either the first (char const*) or the second (wchar_t const*) argument, depending on template parameter CharT.

    The macro NARROW_OR_WIDE is used to avoid having to write both the narrow and the wide string literal. The macro TOWSTRING simply prepends the L prefix to the given string literal.

    Of course the macro will only work if the range of characters is limited to basic ASCII, but this is usually sufficient. Otherwise one can use the NarrowOrWide() template function to define narrow and wide string literals separately.


    I would add a "unique" prefix to the macro names, something like the name of your library, to avoid conflicts with similar macros defined elsewhere.
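
As a rough sketch of that advice, here is Method 2 with a made-up library prefix. The names mylib_narrow_or_wide and MYLIB_* are hypothetical and exist only for this illustration; the whitespace literal is likewise just an example.

```cpp
#include <cstring>
#include <cwchar>

// Hypothetical prefixed version of NarrowOrWide(): picks one of two literals.
template <typename CharT>
constexpr const CharT* mylib_narrow_or_wide(const char*, const wchar_t*);

template <>
constexpr const char* mylib_narrow_or_wide<char>(const char* c, const wchar_t*) {
    return c;
}

template <>
constexpr const wchar_t* mylib_narrow_or_wide<wchar_t>(const char*, const wchar_t* w) {
    return w;
}

// Two-step macro so that STR is fully expanded before L is pasted onto it.
#define MYLIB_TOWSTRING1(x) L##x
#define MYLIB_TOWSTRING(x) MYLIB_TOWSTRING1(x)
#define MYLIB_NARROW_OR_WIDE(C, STR) \
    mylib_narrow_or_wide<C>((STR), MYLIB_TOWSTRING(STR))

template <typename CharT>
struct StringTraits {
    static constexpr const CharT* WHITESPACE_STR = MYLIB_NARROW_OR_WIDE(CharT, " \t\n");
};
```

StringTraits<char>::WHITESPACE_STR then points at the narrow literal and StringTraits<wchar_t>::WHITESPACE_STR at the wide one, with no unprefixed macro names leaking into user code.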

    Method 3 - array initialized via template parameter pack (C++17)

    C++17 finally allows us to get rid of the macro and use a pure C++ solution. The solution uses template parameter pack expansion to initialize an array from a string literal while static_casting the individual characters to the desired type.

    First we declare a str_array class, which is similar to std::array but tailored for constant null-terminated string (e. g. str_array::size() returns number of characters without '\0', instead of buffer size). This wrapper class is necessary, because a plain array cannot be returned from a function. It must be wrapped in a struct or class.

    #include <cstddef>
    template< typename CharT, std::size_t Length >
    struct str_array
    {
        constexpr CharT const* c_str()              const { return data_; }
        constexpr CharT const* data()               const { return data_; }
        constexpr CharT operator[]( std::size_t i ) const { return data_[ i ]; }
        constexpr CharT const* begin()              const { return data_; }
        constexpr CharT const* end()                const { return data_ + Length; }
        constexpr std::size_t size()                const { return Length; }
        // TODO: add more members of std::basic_string
        CharT data_[ Length + 1 ];  // +1 for null-terminator
    };

    So far, nothing special. The real trickery is done by the following str_array_cast() function, which initializes the str_array from a string literal while static_casting the individual characters to the desired type:

    #include <cstddef>
    #include <stdexcept>
    #include <utility>
    namespace detail {
        template< typename ResT, typename SrcT >
        constexpr ResT static_cast_ascii( SrcT x )
        {
            // Reaching the throw during constant evaluation is a compile-time error.
            if( !( x >= 0 && x <= 127 ) )
                throw std::out_of_range( "Character value must be in basic ASCII range (0..127)" );
            return static_cast<ResT>( x );
        }
        template< typename ResElemT, typename SrcElemT, std::size_t N, std::size_t... I >
        constexpr str_array< ResElemT, N - 1 > do_str_array_cast( const SrcElemT(&a)[N], std::index_sequence<I...> )
        {
            // Expands to { cast(a[0]), cast(a[1]), ..., 0 }
            return { static_cast_ascii<ResElemT>( a[I] )..., 0 };
        }
    } //namespace detail
    template< typename ResElemT, typename SrcElemT, std::size_t N, typename Indices = std::make_index_sequence< N - 1 > >
    constexpr str_array< ResElemT, N - 1 > str_array_cast( const SrcElemT(&a)[N] )
    {
        return detail::do_str_array_cast< ResElemT >( a, Indices{} );
    }

    The template parameter pack expansion trickery is required because constant arrays can only be initialized via aggregate initialization (e.g. const str_array<char,3> s = {'a','b','c',0};), so we have to "convert" the string literal to such an initializer list.
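
To see the pack expansion step in isolation, here is a reduced sketch that uses std::array instead of str_array and leaves out the ASCII range check. The names widen and widen_impl are made up for this illustration; the code assumes C++14 or later.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical helper: I... is 0, 1, ..., N-1, so the braces expand to
// { cast(a[0]), cast(a[1]), ..., cast(a[N-1]) }.
template <typename To, typename From, std::size_t N, std::size_t... I>
constexpr std::array<To, N> widen_impl(const From (&a)[N], std::index_sequence<I...>) {
    return {{ static_cast<To>(a[I])... }};
}

// Entry point: builds the index sequence that drives the expansion.
template <typename To, typename From, std::size_t N>
constexpr std::array<To, N> widen(const From (&a)[N]) {
    return widen_impl<To>(a, std::make_index_sequence<N>{});
}

// The literal's terminating null is copied along with the characters.
constexpr auto wide_abc = widen<wchar_t>("abc");
static_assert(wide_abc.size() == 4, "three characters plus the terminator");
static_assert(wide_abc[0] == L'a' && wide_abc[3] == L'\0', "built at compile time");
```

The str_array version above differs mainly in that it expands only N - 1 indices and appends the terminator explicitly.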

    The code triggers a compile time error if any character is outside of basic ASCII range (0..127), for the reasons given at the beginning of this answer. There are code pages where 0..127 doesn't map to ASCII, so this check does not give 100% safety though.


    template< typename CharT >
    struct StringTraits
    {
        static constexpr auto WHITESPACE_STR = str_array_cast<CharT>( "abc" );
        // Fails to compile (as intended), because characters are not basic ASCII.
        //static constexpr auto WHITESPACE_STR1 = str_array_cast<CharT>( "äöü" );
    };


  • 2021-02-06 08:20

    Here's an alternative implementation based on @zett42's answer. Feedback is welcome.

    #include <iostream>
    #include <tuple>
    #define TOWSTRING_(x) L##x
    #define TOWSTRING(x) TOWSTRING_(x)
    // Builds a tuple of both literals and picks the element whose type is const C*.
    #define MAKE_LPCTSTR(C, STR) (std::get<const C*>(std::tuple<const char*, const wchar_t*>(STR, TOWSTRING(STR))))
    template<typename CharType>
    class StringTraits {
    public:
        static constexpr const CharType* WHITESPACE_STR = MAKE_LPCTSTR(CharType, "abc");
    };
    typedef StringTraits<char> AStringTraits;
    typedef StringTraits<wchar_t> WStringTraits;
    int main(int argc, char** argv) {
        std::cout << "Narrow string literal: " << AStringTraits::WHITESPACE_STR << std::endl;
        std::wcout << "Wide string literal  : " << WStringTraits::WHITESPACE_STR << std::endl;
        return 0;
    }
  • 2021-02-06 08:24

    Here is a refinement of the now-common template-based solution which

    • preserves the array[len] C++ type of the C strings rather than decaying them to pointers, which means you can call sizeof() on the result and get the size of the string plus NUL, not the size of a pointer, just as if you had the original string there;

    • works even if the strings in different encodings have different lengths in code units (which is virtually guaranteed if the strings contain non-ASCII text);

    • does not incur any runtime overhead and does not attempt or need to do encoding conversion at runtime.

    Credit: This refinement starts with the original template idea from Mark Ransom and #2 from zett42 and borrows some ideas from, but fixes the size limitations of, Chris Kushnir's answer.

    This code handles char and wchar_t, but it is trivial to extend it to char8_t, char16_t and char32_t.

    #include <cstddef>
    // generic utility for C++ pre-processor concatenation
    // - avoids a pre-processor issue if x and y have macros inside
    // (helper names avoid a leading underscore, which is reserved at global scope)
    #define CPP_CONCAT_IMPL(x, y) x ## y
    #define CPP_CONCAT(x, y) CPP_CONCAT_IMPL(x, y)
    // now onto stringlit()
    template<size_t SZ0, size_t SZ1>
    constexpr
    auto  stringlit_impl(char c,
                     const char     (&s0)  [SZ0],
                     const wchar_t  (&s1)  [SZ1]) -> const char(&)[SZ0]
    {
        return s0;
    }
    template<size_t SZ0, size_t SZ1>
    constexpr
    auto  stringlit_impl(wchar_t c,
                     const char     (&s0)  [SZ0],
                     const wchar_t  (&s1)  [SZ1]) -> const wchar_t(&)[SZ1]
    {
        return s1;
    }
    #define stringlit(code_unit, lit) \
        stringlit_impl(code_unit (), lit, CPP_CONCAT(L, lit))

    Here we define one function per character encoding, each with a different signature, so the choice is made by an ordinary function argument rather than by a template parameter. Each function returns the original array type with the original bounds. The selector that chooses the appropriate function is a single character of the desired type (the value of that character is not important). We cannot select on the type in a template parameter alone, because the functions would then differ only in their return types, which conflict. This code also works without the constexpr.

    Note that we are returning a reference to an array (which is possible in C++), not an array (which is not allowed in C++). The trailing return type syntax is optional here, but it is far more readable than the alternative, which looks something like const char (&stringlit(...params here...))[SZ0].
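
A small sketch of what preserving the array type buys in practice. The helper names stringlit_impl and CPP_CONCAT_IMPL and the function greeting_length are chosen for this illustration only; the selection technique is the one described above.

```cpp
#include <cstddef>

#define CPP_CONCAT_IMPL(x, y) x ## y
#define CPP_CONCAT(x, y) CPP_CONCAT_IMPL(x, y)

// One overload per character type; the first argument only selects the overload.
template <std::size_t SZ0, std::size_t SZ1>
constexpr auto stringlit_impl(char, const char (&s0)[SZ0], const wchar_t (&)[SZ1])
    -> const char (&)[SZ0] { return s0; }

template <std::size_t SZ0, std::size_t SZ1>
constexpr auto stringlit_impl(wchar_t, const char (&)[SZ0], const wchar_t (&s1)[SZ1])
    -> const wchar_t (&)[SZ1] { return s1; }

#define stringlit(code_unit, lit) \
    stringlit_impl(code_unit(), lit, CPP_CONCAT(L, lit))

// sizeof sees the full array (characters plus terminator), not a pointer.
static_assert(sizeof(stringlit(char, "abc")) == 4, "array bound preserved");
static_assert(sizeof(stringlit(wchar_t, "abc")) == 4 * sizeof(wchar_t), "array bound preserved");

// Hypothetical generic use: length in code units, computed from the array bound.
template <typename CharT>
constexpr std::size_t greeting_length() {
    return sizeof(stringlit(CharT, "hello")) / sizeof(CharT) - 1;
}
```

With a pointer-returning helper the same sizeof expressions would yield the size of a pointer, so the static_asserts would not hold in general.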

    I compiled this with clang 9.0.8 and MSVC++ from Visual Studio 2019 16.7 (aka _MSC_VER 1927 aka pdb ver 14.27). I had c++2a/c++latest enabled, but I think C++14 or 17 is sufficient for this code.


  • 2021-02-06 08:25

    A variation of zett42's Method 2 above. It has the advantage of supporting all character types (for literals that can be represented as char[]) and of preserving the proper string literal array type.

    First the template functions:

    #include <cstddef>
    #include <type_traits>
    // char8_t requires C++20
    template<typename CHAR_T>
    constexpr
    auto  LiteralChar(
        char     A,
        wchar_t  W,
        char8_t  U8,
        char16_t U16,
        char32_t U32
    )   -> CHAR_T
    {
             if constexpr( std::is_same_v<CHAR_T, char> )      return A;
        else if constexpr( std::is_same_v<CHAR_T, wchar_t> )   return W;
        else if constexpr( std::is_same_v<CHAR_T, char8_t> )   return U8;
        else if constexpr( std::is_same_v<CHAR_T, char16_t> )  return U16;
        else if constexpr( std::is_same_v<CHAR_T, char32_t> )  return U32;
    }
    template<typename CHAR_T, size_t SIZE>
    constexpr
    auto  LiteralStr(
        const char     (&A)  [SIZE],
        const wchar_t  (&W)  [SIZE],
        const char8_t  (&U8) [SIZE],
        const char16_t (&U16)[SIZE],
        const char32_t (&U32)[SIZE]
    )   -> const CHAR_T(&)[SIZE]
    {
             if constexpr( std::is_same_v<CHAR_T, char> )      return A;
        else if constexpr( std::is_same_v<CHAR_T, wchar_t> )   return W;
        else if constexpr( std::is_same_v<CHAR_T, char8_t> )   return U8;
        else if constexpr( std::is_same_v<CHAR_T, char16_t> )  return U16;
        else if constexpr( std::is_same_v<CHAR_T, char32_t> )  return U32;
    }

    Then the macros:

    #define  CMK_LC(CHAR_T, LITERAL) \
    LiteralChar<CHAR_T>( LITERAL, L ## LITERAL, u8 ## LITERAL, u ## LITERAL, U ## LITERAL )
    #define  CMK_LS(CHAR_T, LITERAL) \
    LiteralStr<CHAR_T>( LITERAL, L ## LITERAL, u8 ## LITERAL, u ## LITERAL, U ## LITERAL )

    Then use:

    template<typename CHAR_T>
    class StringTraits {
    public:
        struct  LC {  // literal character
            static  constexpr CHAR_T  Null  = CMK_LC(CHAR_T, '\0');
            static  constexpr CHAR_T  Space = CMK_LC(CHAR_T, ' ');
        };
        struct  LS {  // literal string
            // can't seem to avoid having to specify the size
            // NOTE: GCC and Clang reject initializing an array from an array
            // reference; the constexpr auto & form shown later avoids this.
            static  constexpr CHAR_T  Space    [2] = CMK_LS(CHAR_T, " ");
            static  constexpr CHAR_T  Ellipsis [4] = CMK_LS(CHAR_T, "...");
        };
    };
    auto   char_space { StringTraits<char>::LC::Space };
    auto  wchar_space { StringTraits<wchar_t>::LC::Space };
    auto   char_ellipsis { StringTraits<char>::LS::Ellipsis };     // note: const char*
    auto  wchar_ellipsis { StringTraits<wchar_t>::LS::Ellipsis };  // note: const wchar_t*
    auto  (& char_space_array) [4] { StringTraits<char>::LS::Ellipsis };
    auto  (&wchar_space_array) [4] { StringTraits<wchar_t>::LS::Ellipsis };
    // ? syntax to get a local copy ?

    Admittedly, the syntax to preserve the string literal array type is a bit of a burden, but not overly so. Again, this only works for literals that have the same number of code units in all character type representations. For LiteralStr to support all literals for all types, it would likely need to take pointers as parameters and return CHAR_T* instead of CHAR_T(&)[SIZE]. I don't think LiteralChar can be made to support multibyte characters.
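
The pointer-based variant mentioned above might look roughly like this. It is a sketch only: literal_ptr, LITERAL_PTR and Labels are hypothetical names, and char8_t is left out so that the code compiles as C++17.

```cpp
#include <string>
#include <type_traits>

// Takes every encoding as a pointer, so the literals may differ in length.
template <typename CharT>
constexpr const CharT* literal_ptr(const char* a, const wchar_t* w,
                                   const char16_t* u16, const char32_t* u32) {
    if constexpr (std::is_same_v<CharT, char>)          return a;
    else if constexpr (std::is_same_v<CharT, wchar_t>)  return w;
    else if constexpr (std::is_same_v<CharT, char16_t>) return u16;
    else {
        static_assert(std::is_same_v<CharT, char32_t>, "unsupported character type");
        return u32;
    }
}

// Pastes the encoding prefixes onto the literal, as CMK_LS does.
#define LITERAL_PTR(CHAR_T, LITERAL) \
    literal_ptr<CHAR_T>(LITERAL, L ## LITERAL, u ## LITERAL, U ## LITERAL)

template <typename CharT>
struct Labels {
    // "caf\u00e9" is one code unit longer in UTF-8 than in UTF-16 or UTF-32.
    static constexpr const CharT* Cafe = LITERAL_PTR(CharT, "caf\u00e9");
};
```

The price is that the array bound is lost, so sizeof no longer reports the string size and the length has to be computed, for example with std::char_traits<CharT>::length.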


    Applying Louis Semprini's per-encoding size handling to LiteralStr gives:

    template<typename CHAR_T,
        size_t SIZE_A, size_t SIZE_W, size_t SIZE_U8, size_t SIZE_U16, size_t SIZE_U32,
        // SIZE_R is the bound of whichever array CHAR_T selects
        size_t SIZE_R =
            std::is_same_v<CHAR_T, char>     ? SIZE_A   :
            std::is_same_v<CHAR_T, wchar_t>  ? SIZE_W   :
            std::is_same_v<CHAR_T, char8_t>  ? SIZE_U8  :
            std::is_same_v<CHAR_T, char16_t> ? SIZE_U16 :
            std::is_same_v<CHAR_T, char32_t> ? SIZE_U32 : 0
    >
    constexpr
    auto  LiteralStr(
        const char     (&A)   [SIZE_A],
        const wchar_t  (&W)   [SIZE_W],
        const char8_t  (&U8)  [SIZE_U8],
        const char16_t (&U16) [SIZE_U16],
        const char32_t (&U32) [SIZE_U32]
    )   -> const CHAR_T(&)[SIZE_R]
    {
             if constexpr( std::is_same_v<CHAR_T, char> )      return A;
        else if constexpr( std::is_same_v<CHAR_T, wchar_t> )   return W;
        else if constexpr( std::is_same_v<CHAR_T, char8_t> )   return U8;
        else if constexpr( std::is_same_v<CHAR_T, char16_t> )  return U16;
        else if constexpr( std::is_same_v<CHAR_T, char32_t> )  return U32;
    }

    It is also possible to use a simpler syntax to create variables. For example, the declarations in StringTraits::LS can be changed to constexpr auto &, so that

    static  constexpr CHAR_T  Ellipsis [4] = CMK_LS(CHAR_T, "...");

    becomes

    static  constexpr auto & Ellipsis { CMK_LS(CHAR_T, "...") };

    When using CMK_LS(char, "literal"), any characters in the literal that cannot be represented are converted to '?' by VS 2019. I am not sure what other compilers do.
