I have discovered a disturbing inconsistency between std::string
and string literals in C++0x:
#include
#include
According to N3290 6.5.4, if the range is an array, the boundary values are initialized automatically, without dispatching to the begin/end functions.
So, how about preparing some wrapper like the following?
struct literal_t {
char const *b, *e;
literal_t( char const* b, char const* e ) : b( b ), e( e ) {}
char const* begin() const { return b; }
char const* end () const { return e; }
};
template< std::size_t N >
literal_t literal( char const (&a)[N] ) {
return literal_t( a, a + N - 1 ); // N - 1 excludes the terminating '\0'
}
Then the following code will be valid:
for (auto e : literal("hello")) ...
If your compiler provides user-defined literals, they might help to abbreviate this:
literal_t operator"" _l( char const* p, std::size_t l ) {
return literal_t( p, p + l ); // l excludes '\0'
}
for (auto e : "hello"_l) ...
EDIT: The following will have smaller overhead (user-defined literal won't be available though).
template< size_t N >
char const (&literal( char const (&x)[ N ] ))[ N - 1 ] {
return (char const(&)[ N - 1 ]) x;
}
for (auto e : literal("hello")) ...
If we overloaded std::begin() and std::end() for const char arrays to return one less than the size of the array, then the following code would output 4 instead of the expected 5:
#include <iostream>
int main()
{
const char s[5] = {'h', 'e', 'l', 'l', 'o'};
int i = 0;
for (auto e : s)
++i;
std::cout << "Number of elements: " << i << '\n';
}
The inconsistency can be resolved using another tool in C++0x's toolbox: user-defined literals. Using an appropriately-defined user-defined literal:
std::string operator""s(const char* p, size_t n)
{
return std::string(p, n);
}
We'll be able to write:
int i = 0;
for (auto e : "hello"s)
++i;
std::cout << "Number of elements: " << i << '\n';
Which now outputs the expected number:
Number of elements: 5
With these new std::string literals, there is arguably no more reason to use C-style string literals, ever.
However, I think this is very undesirable: surely std::string and string literals should behave the same when it comes to properties as basic as their length?
String literals by definition have a (hidden) null character at the end of the string. std::strings do not: because a std::string stores its length, that null character is superfluous. The standard's section on the string library explicitly allows strings that are not null-terminated.
Edit
I don't think I've ever given a more controversial answer in the sense of a huge amount of upvotes and a huge amount of downvotes.
The auto iterator, when applied to a C-style array, iterates over each element of the array. The determination of the range is made at compile time, not run time. This is ill-formed, for instance:
char * str;
for (auto c : str) {
do_something_with (c);
}
Some people use arrays of type char to hold arbitrary data. Yes, it is an old-style C way of thinking, and perhaps they should have used a C++-style std::array, but the construct is quite valid and quite useful. Those people would be rather upset if their auto iterator over a char buffer[1024]; stopped at element 15 just because that element happens to have the same value as the null character. An auto iterator over a Type buffer[1024]; will run all the way to the end. What makes a char array so worthy of a completely different implementation?
Note that if you want the auto iterator over a character array to stop early, there is an easy mechanism to do that: add an if (c == '\0') break; statement to the body of your loop.
Bottom line: There is no inconsistency here. The auto iterator over a char[] array is consistent with how the auto iterator works with any other C-style array.
That you get 6 in the first case is an abstraction leak that couldn't be avoided in C. std::string "fixes" that. For compatibility, the behaviour of C-style string literals does not change in C++.
For example, can std::begin() and std::end() be overloaded for character arrays so that the range they delimit does not include the terminating null character? If so, why was this not done?
Assuming access through a pointer (as opposed to char[N]), only by embedding a variable inside the string containing the number of characters, so that searching for the terminating null character isn't required any more. Oops! That's std::string.
The way to "resolve the inconsistency" is not to use legacy features at all.
If you wanted the length, you should use strlen() for the C string and .length() for the C++ string. You can't treat C strings and C++ strings identically--they have different behavior.