I was recently in a C++ technical interview, where I was given a bit of simple string manipulation code, which is intended to take a string and return a string that is comprised
This implementation should be fast:
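#include <algorithm>  // std::min
#include <string>

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());             // clamp so both ranges stay inside s
    std::string ret;
    ret.reserve(2*n);                      // one allocation up front
    ret.append(s.begin(), s.begin() + n);  // first n characters
    ret.append(s.end() - n, s.end());      // last n characters
    return ret;
}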
It passes all three unit tests.
When using GNU libstdc++ (as shipped with g++ 4.5), the line that declares and initializes ret is extremely fast because libstdc++ uses a global "empty string" representation, so the initialization is simply a pointer copy. Calls to begin and end on s are also fast because s is const: they resolve to the const overloads, begin() const and end() const, so the internal representation of s is not "leaked" (the reference-counted string is not forced to unshare its buffer). With libstdc++, std::string::const_iterator is a thin wrapper around const char*, so it is a random access iterator that behaves like a pointer. Thus, when std::string::append calls std::distance to obtain the length of the input range, it is a pointer difference operation. Also, std::string::append results in something like a memmove. Finally, the reserve operation ensures that enough memory is available for the return value, so neither append has to reallocate.
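To make the last two points concrete, here is a minimal sketch (not part of the code I was given). The helper names appends_fit_reserved_capacity and prefix_length are hypothetical, invented for illustration. The first repeats the steps of first_last_n and reports whether both appends stayed within the capacity obtained from reserve; the second returns the length that append would compute for the first range.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical helper: performs the same steps as first_last_n, but returns
// true when both appends fit in the capacity that reserve() obtained.
inline bool appends_fit_reserved_capacity(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    std::string ret;
    ret.reserve(2 * n);
    const std::string::size_type cap = ret.capacity();
    assert(cap >= 2 * n);  // reserve() guarantees at least the requested capacity
    ret.append(s.begin(), s.begin() + n);
    ret.append(s.end() - n, s.end());
    // An unchanged capacity means neither append had to reallocate.
    return ret.capacity() == cap && ret.size() == 2 * n;
}

// Hypothetical helper: the range length that append() needs for the prefix.
// For a random access iterator, std::distance is a plain subtraction.
inline std::string::difference_type prefix_length(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    return std::distance(s.begin(), s.begin() + n);
}
```

For example, appends_fit_reserved_capacity(3, "abcdefgh") should hold on any conforming implementation, and prefix_length(10, "abc") is 3 because n is clamped to the size of s.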
EDIT:
For the curious, here is the initialization of ret in the assembly output of MinGW g++ 4.5.0:
movl $__ZNSs4_Rep20_S_empty_rep_storageE+12, (%ebx)
It's simply copying the pointer to the global "empty representation".
EDIT2: Okay. I have now tested four variants with g++ 4.5.0 and Visual C++ 16.00.30319.01:
Variant 1 (the "c_str variant"):
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    std::string::size_type s_size = s.size();
    n = std::min(n, s_size);
    std::string ret;
    ret.reserve(2*n);
    const char *s_cStr = s.c_str(), *s_cStr_end = s_cStr + s_size;
    ret.append(s_cStr, s_cStr + n);
    ret.append(s_cStr_end - n, s_cStr_end);
    return ret;
}
Variant 2 (the "data string" variant):
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    std::string::size_type s_size = s.size();
    n = std::min(n, s_size);
    std::string ret;
    ret.reserve(2*n);
    const char *s_data = s.data(), *s_data_end = s_data + s_size;
    ret.append(s_data, s_data + n);
    ret.append(s_data_end - n, s_data_end);
    return ret;
}
Variant 3:
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    std::string::size_type s_size = s.size();
    n = std::min(n, s_size);
    std::string ret(s);
    std::string::size_type d = s_size - n;
    return ret.replace(n, d, s, d, n);
}
Variant 4 (my original code):
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    std::string ret;
    ret.reserve(2*n);
    ret.append(s.begin(), s.begin() + n);
    ret.append(s.end() - n, s.end());
    return ret;
}
The results for g++ 4.5.0 are:
The results for VC++ 16.00.30319.01 are:
Unsurprisingly, the fastest variant depends on the compiler. However, without knowing which compiler will be used, I think that my variant is best: it is written in a familiar C++ style, it is the fastest when using g++, and it is not much slower than variants 1 or 2 when using VC++.
One interesting thing about the VC++ results is that using c_str rather than data is faster. Perhaps that is why your interviewer said that there is a faster way than your implementation.
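If you want to repeat this kind of comparison with your own compiler, a timing harness could look roughly like the sketch below. It is not the harness I used for the numbers above. The name time_variant_ms and its parameters are hypothetical, and it assumes a C++11 compiler for std::chrono. Each variant would need its own function name to be timed side by side; variant 4 is repeated here only so that the sketch compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Variant 4, repeated so that this sketch is self-contained.
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    std::string ret;
    ret.reserve(2 * n);
    ret.append(s.begin(), s.begin() + n);
    ret.append(s.end() - n, s.end());
    return ret;
}

// Hypothetical harness: calls f(n, s) `iterations` times and returns the
// elapsed wall-clock time in milliseconds. The result sizes are added to
// `sink` so that the compiler cannot discard the calls as unused.
template <typename F>
double time_variant_ms(F f, std::string::size_type n, const std::string& s,
                       std::size_t iterations, std::size_t& sink)
{
    assert(iterations > 0);
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        sink += f(n, s).size();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A call such as time_variant_ms(first_last_n, 3, input, 1000000, sink) would time one variant. Treat the numbers as rough: they depend on optimization flags, input length, and the standard library in use.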
EDIT3:
Actually, I just thought about another variant:
Variant 5:
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    std::string ret;
    ret.reserve(2*n);
    std::string::const_iterator s_begin = s.begin(), s_end = s.end();
    ret.append(s_begin, s_begin + n);
    ret.append(s_end - n, s_end);
    return ret;
}
It's just like variant 4 except that the begin and end iterators for s are saved.
When variant 5 is tested, it actually beats out variant 2 (the data string variant) when using VC++: