c_str() vs. data() when it comes to return type

半腔热情 提交于 2019-12-03 04:13:48
Kerrek SB

The new overload was added by P0272R1 for C++17. Neither the paper itself nor the links therein discuss why only data was given new overloads but c_str was not. We can only speculate at this point (unless people involved in the discussion chime in), but I'd like to offer the following points for consideration:

  • Even just adding the overload to data broke some code; keeping this change conservative was a way to minimize negative impact.

  • The c_str function had so far been entirely identical to data and is effectively a "legacy" facility for interfacing code that takes "C string", i.e. an immutable, null-terminated char array. Since you can always replace c_str by data, there's no particular reason to add to this legacy interface.

I realize that the very motivation for P0292R1 was that there do exist legacy APIs that erroneously or for C reasons take only mutable pointers even though they don't mutate. All the same, I suppose we don't want to add more to string's already massive API that absolutely necessary.

One more point: as of C++17 you are now allowed to write to the null terminator, as long as you write the value zero. (Previously, it used to be UB to write anything to the null terminator.) A mutable c_str would create yet another entry point into this particular subtlety, and the fewer subtleties we have, the better.

P.W

The reason why the data() member got an overload is explained in this paper at open-std.org.

TL;DR of the paper: The non-const .data() member function for std::string was added to improve uniformity in the standard library and to help C++ developers write correct code. It is also convenient when calling a C-library function that doesn't have const qualification on its C-string parameters.

Some relevant passages from the paper:

Abstract
Is std::string's lack of a non-const .data() member function an oversight or an intentional design based on pre-C++11 std::string semantics? In either case, this lack of functionality tempts developers to use unsafe alternatives in several legitimate scenarios. This paper argues for the addition of a non-const .data() member function for std::string to improve uniformity in the standard library and to help C++ developers write correct code.

Use Cases
C libraries occasionally include routines that have char * parameters. One example is the lpCommandLine parameter of the CreateProcess function in the Windows API. Because the data() member of std::string is const, it cannot be used to make std::string objects work with the lpCommandLine parameter. Developers are tempted to use .front() instead, as in the following example.

std::string programName;
// ...
if( CreateProcess( NULL, &programName.front(), /* etc. */ ) ) {
  // etc.
} else {
  // handle error
}

Note that when programName is empty, the programName.front() expression causes undefined behavior. A temporary empty C-string fixes the bug.

std::string programName;
// ...

if( !programName.empty() ) { 
  char emptyString[] = {'\0'};    
  if( CreateProcess( NULL, programName.empty() ? emptyString : &programName.front(), /* etc. */ ) ) {
    // etc.
  } else {
    // handle error
  }
}

If there were a non-const .data() member, as there is with std::vector, the correct code would be straightforward.

std::string programName;
// ...
if( !programName.empty() ) {
  char emptyString[] = {'\0'};
  if( CreateProcess( NULL, programName.data(), /* etc. */ ) ) {
    // etc.
  } else {
    // handle error
  }
}

A non-const .data() std::string member function is also convenient when calling a C-library function that doesn't have const qualification on its C-string parameters. This is common in older codes and those that need to be portable with older C compilers.

The Quantum Physicist

It just depends on the semantics of "what you want to do with it". Generally speaking, std::string is sometimes used as a buffer vector, i.e., as a replacement to std::vector<char>. This can be seen in boost::asio often. In other words, it's an array of characters.

c_str(): strictly means that you're looking for a null-terminated string. In that sense, you should never modify the data and you should never need the string as a non-const.

data(): you may need the information inside the string as buffer data, and even as non-const. You may or may not need to modify the data, which you can do, as long as it doesn't involve changing the length of the string.

CAF

The two member functions c_str and data of std::string exist due to the history of the std::string class.

Until C++11, a std::string could have been implemented as copy-on-write. The internal representation did not need any null termination of the stored string. The member function c_str made sure the returned string was null terminated. The member function data simlpy returned a pointer to the stored string, that was not necessarily null terminated. - To be sure that changes to the string were noticed to enable copy-on-write, both functions needed to return a pointer to const data.

This all changed with C++11 when copy-on-write was no longer allowed for std::string. Since c_str was still required to deliver a null terminated string, the null is always appended to the actual stored string. Otherwise a call to c_str may need to change the stored data to make the string null terminated which would make c_str a non-const function. Since data delivers a pointer to the stored string, it usually has the same implementation as c_str. Both functions still exists due to backward compatibility.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!