C++ String Interview Question

后端 未结 7 1204
悲哀的现实
悲哀的现实 2021-02-13 18:22

I was recently in a C++ technical interview, where I was given a bit of simple string manipulation code, which is intended to take a string and return a string that is comprised

相关标签:
7条回答
  • 2021-02-13 19:18

    This implementation should be fast:

    inline std::string first_last_n(std::string::size_type n, const std::string& s)
    {
        n = std::min(n, s.size());
        std::string ret;
        ret.reserve(2*n);
        ret.append(s.begin(), s.begin() + n);
        ret.append(s.end() - n, s.end());
        return ret;
    }
    

    It passes all three unit tests.

    When using GNU libstdc++, the line that declares & initializes ret is extremely fast because libstdc++ uses a global "empty string" variable. Thus, it's simply a pointer copy. Calls to begin and end on s are also fast because they will resolve to the const versions of begin and end, begin() const and end() const, so the internal representation of s is not "leaked". With libstdc++, std::string::const_iterator is const char*, which is a pointer type and random access iterator. Thus, when std::string::append<const char*>(const char*, const char*) calls std::distance to obtain the length of the input range, it is a pointer difference operation. Also, std::string::append<const char*>(const char*, const char*) results in something like a memmove. Finally, the reserve operation ensures that enough memory is available for the return value.

    EDIT: For the curious, here is the initialization of ret in the assembly output of MinGW g++ 4.5.0:

        movl    $__ZNSs4_Rep20_S_empty_rep_storageE+12, (%ebx)
    

    It's simply copying the pointer to the global "empty representation".

    EDIT2: Okay. I have now tested four variants with g++ 4.5.0 and Visual C++ 16.00.30319.01:

    Variant 1 (the "c_str variant"):

    inline std::string first_last_n(std::string::size_type n, const std::string& s)
    {
       std::string::size_type s_size = s.size();
       n = std::min(n, s_size);
       std::string ret;
       ret.reserve(2*n);
       const char *s_cStr = s.c_str(), *s_cStr_end = s_cStr + s_size;
       ret.append(s_cStr, s_cStr + n);
       ret.append(s_cStr_end - n, s_cStr_end);
       return ret;
    }
    

    Variant 2 (the "data string" variant):

    inline std::string first_last_n(std::string::size_type n, const std::string& s)
    {
       std::string::size_type s_size = s.size();
       n = std::min(n, s_size);
       std::string ret;
       ret.reserve(2*n);
       const char *s_data = s.data(), *s_data_end = s_data + s_size;
       ret.append(s_data, s_data + n);
       ret.append(s_data_end - n, s_data_end);
       return ret;
    }
    

    Variant 3:

    inline std::string first_last_n(std::string::size_type n, const std::string& s)
    {
       std::string::size_type s_size = s.size();
       n = std::min(n, s_size);
       std::string ret(s);
       std::string::size_type d = s_size - n;
       return ret.replace(n, d, s, d, n);
    }
    

    Variant 4 (my original code):

    inline std::string first_last_n(std::string::size_type n, const std::string& s)
    {
       n = std::min(n, s.size());
       std::string ret;
       ret.reserve(2*n);
       ret.append(s.begin(), s.begin() + n);
       ret.append(s.end() - n, s.end());
       return ret;
    }
    

    The results for g++ 4.5.0 are:

    • Variant 4 is the fastest
    • Variant 3 is second (5% slower than variant 4)
    • Variant 1 is third (2% slower than variant 3)
    • Variant 2 is fourth (0.2% slower than variant 1)

    The results for VC++ 16.00.30319.01 are:

    • Variant 1 is the fastest
    • Variant 2 is second (3% slower than variant 1)
    • Variant 4 is third (4% slower than variant 2)
    • Variant 3 is fourth (17% slower than variant 4)

    Unsurprisingly, the variant that is fastest depends on the compiler. However, not knowing which compiler will be used I think that my variant is best because it is a familiar style of C++, it is the fastest when using g++, and it is not that much slower than variants 1 or 2 when using VC++.

    One thing interesting from the VC++ results is that using c_str rather than data is faster. Perhaps that is why your interviewer said that there is a faster way than your implementation.

    EDIT3:

    Actually, I just thought about another variant:

    Variant 5:

    inline std::string first_last_n(std::string::size_type n, const std::string& s)
    {
       n = std::min(n, s.size());
       std::string ret;
       ret.reserve(2*n);
       std::string::const_iterator s_begin = s.begin(), s_end = s.end();
       ret.append(s_begin, s_begin + n);
       ret.append(s_end - n, s_end);
       return ret;
    }
    

    It's just like variant 4 except that the begin and end iterators for s are saved.

    When variant 5 is tested, it actually beats out variant 2 (the data string variant) when using VC++:

    • Variant 1 is the fastest
    • Variant 5 is second (1.6% slower than variant 1)
    • Variant 2 is third (1.4% slower than variant 5)
    • Variant 4 is third (4% slower than variant 2)
    • Variant 3 is fourth (17% slower than variant 4)
    0 讨论(0)
提交回复
热议问题