Inconsistency between std::string and string literals

耶瑟儿~ 2020-12-28 12:30

I have discovered a disturbing inconsistency between std::string and string literals in C++0x:

#include <iostream>
#include <string>
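The rest of the listing did not survive in this copy. Judging from the answers below (6 elements reported for the literal, 5 expected), the comparison was presumably along the following lines. This is only a sketch: the function names `count_in_literal` and `count_in_string` are placeholders, not the asker's code.

```cpp
#include <cstddef>
#include <string>

// Count the elements a range-based for loop visits in a string literal.
// The literal "hello" has type const char[6]: five letters plus '\0'.
std::size_t count_in_literal() {
    std::size_t n = 0;
    for (auto c : "hello") {
        (void)c;  // only counting; the value itself is unused
        ++n;
    }
    return n;
}

// The same loop over a std::string, whose range covers only the five letters.
std::size_t count_in_string() {
    std::size_t n = 0;
    for (auto c : std::string("hello")) {
        (void)c;
        ++n;
    }
    return n;
}
```

Printing the two results shows the mismatch the question is about: 6 for the literal and 5 for the std::string.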


        
6 Answers
  • 2020-12-28 13:00

    According to N3290 6.5.4, if the range is an array, boundary values are initialized automatically without begin/end function dispatch.
    So, how about preparing some wrapper like the following?

    struct literal_t {
        char const *b, *e;
        literal_t( char const* b, char const* e ) : b( b ), e( e ) {}
        char const* begin() const { return b; }
        char const* end  () const { return e; }
    };
    
    template< std::size_t N >
    literal_t literal( char const (&a)[N] ) {
        return literal_t( a, a + N - 1 ); // N - 1 excludes the terminating '\0'
    }
    

    Then the following code will be valid:

    for (auto e : literal("hello")) ...
    

    If your compiler supports user-defined literals, they can shorten this further:

    literal_t operator"" _l( char const* p, std::size_t l ) {
        return literal_t( p, p + l ); // l excludes '\0'
    }
    
    for (auto e : "hello"_l) ...
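    Put together, the wrapper and both ways of calling it can be checked with a small self-contained sketch. The helper `count_elements` is a hypothetical name added only for this illustration, and the literal operator is spelled without a space before the suffix, which current compilers accept.

```cpp
#include <cstddef>

// Half-open character range [b, e) usable in a range-based for loop.
struct literal_t {
    char const *b, *e;
    literal_t(char const* b, char const* e) : b(b), e(e) {}
    char const* begin() const { return b; }
    char const* end() const { return e; }
};

// Wrap a string literal, dropping the terminating '\0' from the range.
template <std::size_t N>
literal_t literal(char const (&a)[N]) {
    return literal_t(a, a + N - 1);
}

// Literal-operator form; len already excludes the terminator.
literal_t operator""_l(char const* p, std::size_t len) {
    return literal_t(p, p + len);
}

// Hypothetical helper: number of elements a range-based for loop visits.
template <typename Range>
std::size_t count_elements(Range const& r) {
    std::size_t n = 0;
    for (auto e : r) {
        (void)e;  // only counting
        ++n;
    }
    return n;
}
```

    With these definitions, `count_elements(literal("hello"))` and `count_elements("hello"_l)` both give 5, while `count_elements("hello")` on the bare literal still gives 6.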
    

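    EDIT: The following has less overhead, because it returns a reference to an array one element shorter instead of a wrapper object. It cannot be written as a user-defined literal, though, and it does not work for the empty literal "", since an array of N - 1 = 0 elements is not allowed.

    template< std::size_t N >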
    char const (&literal( char const (&x)[ N ] ))[ N - 1 ] {
        return (char const(&)[ N - 1 ]) x;
    }
    
    for (auto e : literal("hello")) ...
    
  • 2020-12-28 13:07

    If we overloaded std::begin() and std::end() for const char arrays to return one less than the size of the array, then the following code would output 4 instead of the expected 5:

    #include <iostream>
    
    int main()
    {
        const char s[5] = {'h', 'e', 'l', 'l', 'o'};
        int i = 0;
        for (auto e : s)
            ++i;
        std::cout << "Number of elements: " << i << '\n';
    }
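    The underlying problem is that std::begin() and std::end() see only the array type, so they cannot know whether the last element is a terminator. A minimal sketch of this, where `range_length` and `no_terminator` are names invented for the illustration:

```cpp
#include <cstddef>
#include <iterator>

// Number of elements between std::begin and std::end of a built-in array.
template <typename T, std::size_t N>
std::size_t range_length(const T (&a)[N]) {
    return static_cast<std::size_t>(std::end(a) - std::begin(a));
}

// Exactly five chars and no terminating '\0'.
const char no_terminator[5] = {'h', 'e', 'l', 'l', 'o'};
```

    Here `range_length(no_terminator)` is 5 and `range_length("hello")` is 6. An overload that always dropped the last element would fix the second case only by breaking the first.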
    
  • 2020-12-28 13:14

    The inconsistency can be resolved using another tool in C++0x's toolbox: user-defined literals. With a suitably defined literal operator (the suffix starts with an underscore because suffixes without one are reserved for the standard library):

    std::string operator"" _s(const char* p, std::size_t n)
    {
        return std::string(p, n);
    }
    

    We'll be able to write:

    int i = 0;
    for (auto e : "hello"_s)
        ++i;
    std::cout << "Number of elements: " << i << '\n';
    

    Which now outputs the expected number:

    Number of elements: 5
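    Since C++14 the standard library itself provides a literal operator with the plain suffix s, in the namespace std::string_literals, so no user-written operator is needed. A small sketch, assuming a C++14 or later compiler; `count_std_literal` is a placeholder name:

```cpp
#include <cstddef>
#include <string>

// Count the elements of the std::string produced by the standard "..."s literal.
std::size_t count_std_literal() {
    using namespace std::string_literals;  // brings in operator""s (C++14)
    std::size_t n = 0;
    for (auto e : "hello"s) {
        (void)e;  // only counting
        ++n;
    }
    return n;
}
```

    The function returns 5, matching the output shown above.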
    

    With these new std::string literals, there is arguably no more reason to use C-style string literals, ever.

  • 2020-12-28 13:18

    From the question: "However, I think this is very undesirable: surely std::string and string literals should behave the same when it comes to properties as basic as their length?"

    String literals by definition have a (hidden) null character at the end of the string. A std::string does not count one as part of its contents: because it stores its length, that null character is superfluous. The standard's section on the string library explicitly allows strings that are not null-terminated.

    Edit
    I don't think I've ever given a more controversial answer, in the sense of attracting a huge number of upvotes and a huge number of downvotes.

    A range-based for loop (the "auto iterator" here) applied to a C-style array iterates over every element of the array. The range is determined at compile time from the array type, not at run time. Iterating over a pointer, for instance, is ill-formed, because a pointer carries no bound:

    char * str;
    for (auto c : str) {
       do_something_with (c);
    }
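    With a pointer, the end of the range has to be found at run time instead. One way to sketch that, assuming the pointer refers to a null-terminated string (`count_via_pointer` is a hypothetical name):

```cpp
#include <cstddef>
#include <cstring>

// Walk a null-terminated string through a pointer; the end is computed
// at run time with std::strlen because the pointer type carries no length.
std::size_t count_via_pointer(const char* str) {
    std::size_t n = 0;
    for (const char *p = str, *end = str + std::strlen(str); p != end; ++p) {
        ++n;
    }
    return n;
}
```

    For "hello" this visits 5 characters, since std::strlen stops before the terminator.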
    

    Some people use arrays of type char to hold arbitrary data. Yes, it is an old-style C way of thinking, and perhaps they should have used a C++-style std::array, but the construct is quite valid and quite useful. Those people would be rather upset if their loop over a char buffer[1024]; stopped at element 15 just because that element happens to have the same value as the null character. A loop over a Type buffer[1024]; will run all the way to the end. What makes a char array so worthy of a completely different implementation?

    Note that if you want the loop over a character array to stop early, there is an easy mechanism to do that: add an if (c == '\0') break; statement to the body of your loop.
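    A short sketch of both behaviours on the same kind of buffer; the function names are made up for the example:

```cpp
#include <cstddef>

// Stop at the first null character: counts only the text in the buffer.
std::size_t count_until_null() {
    char buffer[1024] = "hello";  // the remaining 1018 elements are zero-initialized
    std::size_t n = 0;
    for (auto c : buffer) {
        if (c == '\0') break;
        ++n;
    }
    return n;
}

// No early exit: the loop visits every element of the array.
std::size_t count_all() {
    char buffer[1024] = "hello";
    std::size_t n = 0;
    for (auto c : buffer) {
        (void)c;  // only counting
        ++n;
    }
    return n;
}
```

    The first function returns 5 and the second 1024, so the choice stays with whoever writes the loop.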

    Bottom line: There is no inconsistency here. A range-based for loop over a char[] array behaves exactly as it does over any other C-style array.

  • 2020-12-28 13:19

    That you get 6 in the first case is an abstraction leak that couldn't be avoided in C. std::string "fixes" that. For compatibility, the behaviour of C-style string literals does not change in C++.

    From the question: "For example, can std::begin() and std::end() be overloaded for character arrays so that the range they delimit does not include the terminating null character? If so, why was this not done?"

    Assuming access through a pointer (as opposed to char[N]), only by embedding a variable inside the string that holds the number of characters, so that searching for the terminating null character isn't required any more. Oops! That's std::string.

    The way to "resolve the inconsistency" is not to use legacy features at all.

  • 2020-12-28 13:20

    If you want the length, use strlen() for the C string and .length() for the C++ string. You can't treat C strings and C++ strings identically; they behave differently.
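    For comparison, here are the three quantities side by side, in a minimal sketch with placeholder function names:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length of the C string: characters before the terminating '\0'.
std::size_t c_length() { return std::strlen("hello"); }

// Length of the C++ string: the stored character count.
std::size_t cpp_length() { return std::string("hello").length(); }

// Size of the literal's array type, which includes the terminator.
std::size_t array_extent() { return sizeof("hello"); }
```

    The first two return 5. The third returns 6, which is the number a range-based for loop over the literal sees.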
