How can I implement an efficient whole-word string replacement in C++ without regular expressions?

前端 未结 2 1847
耶瑟儿~
耶瑟儿~ 2020-12-19 11:44

Perhaps I\'m overlooking something obvious but I was wondering what the fastest way to implement whole-word string replacement in C++ might be. At first I considered simply

相关标签:
2条回答
  • 2020-12-19 12:12

    This is my quick response, but don't know about how quick the solution is... There are few solutions to this problem:
    1. By using iterators, compare each word (delimited by space), recreating string for each occurrence:

    string& remove_all_occurences(string& s, const string& str_to_remove, const string& str_to_put){
                    typedef string::size_type string_size;
                    string_size i = 0;
                    string cur_string;
                    cur_string.reserve(s.size());
    
                    // invariant: we have processed characters [original value of i, i) 
                    while (i != s.size()) {
                    // ignore leading blanks
                    // invariant: characters in range [original i, current i) are all spaces
                        while (i != s.size() && isspace(s[i]))
                        ++i;
    
                        // find end of next word
                        string_size j = i;
                        // invariant: none of the characters in range [original j, current j)is a space
                         while (j != s.size() && !isspace(s[j]))
                            j++;
                            // if we found some nonwhitespace characters 
    
    
                        if (i != j) {
                            // copy from s starting at the beginning to i, placing str to replace, and finishing with j to the end of s
                            cur_string = s.substr(i,j-i);
                            if(cur_string == str_to_remove){
                                s = s.substr(0,i) + str_to_put + s.substr(j,s.size() - j);
                            }
                            i = j;
                        }
                    }
                    return s;
                }
    

    Testing the program:

    void call_remove_all_occurences(){
                    string my_str = "The quick brown fox jumps over sleepy dog fox fox fox";
                    cout << remove_all_occurences(my_str,"fox","godzilla") << endl;
                }
    

    Output:

    The quick brown godzilla jumps over sleepy dog godzilla godzilla godzilla
    
    1. By splitting string into vector and than going through vector and replacing each occurence - simple... don't have the code, but you get the idea...
    0 讨论(0)
  • 2020-12-19 12:14

    I think you can do this, both doing whole-word matching and doing it efficiently. The key is to:

    • detect "whole-word" boundaries using 'std::isalpha', which should work with Unicode & any locale.
    • do the replace "out of place" by creating a separate 'output' string that you swap with 'input' at the end of processing, instead of doing the work "in place" on the 'input' string itself.

    Here's my take on your function:

    #include <cctype> // isalpha
    #include <ciso646> // or, not
    #include <string> // wstring
    
    using std::size_t;
    using std::wstring;
    
    /// @brief Do a "find and replace" on a string.
    /// @note This function does "whole-word" matching.
    /// @param[in,out] input_string The string to operate on.
    /// @param[in] find_string The string to find in the input.
    /// @param[in] replace_string The string to replace 'find_string'
    ///            with in the input.
    void find_and_replace( wstring& input_string,
                           const wstring& find_string,
                           const wstring& replace_string )
    {
      if( find_string.empty()
          or find_string == replace_string
          or input_string.length() < find_string.length() )
      {
        return;
      }
    
      wstring output_string;
      output_string.reserve( input_string.length() );
      size_t last_pos = 0u;
      for( size_t new_pos = input_string.find( find_string );
           new_pos != wstring::npos;
           new_pos = input_string.find( find_string, new_pos ) )
      {
        bool did_replace = false;
        if( ( new_pos == 0u
              or not std::isalpha( input_string.at( new_pos - 1u ) ) )
            and ( new_pos + find_string.length() == input_string.length()
                  or not std::isalpha( input_string.at( new_pos + find_string.length() ) ) ) )
        {
          output_string.append( input_string, last_pos, new_pos - last_pos );
          output_string.append( replace_string );
          did_replace = true;
        }
        new_pos += find_string.length();
        if( did_replace )
        {
          last_pos = new_pos;
        }
      }
      output_string.append( input_string, last_pos,
                            input_string.length() - last_pos );
    
      input_string.swap( output_string );
    }
    

    P.S. I was unsure what 'replace_all' was trying to accomplish in your initial example, so I removed it from my solution for clarity.

    P.P.S. This code would be much cleaner with Regex-es. Can you rely on C++ TR1 or C++ 2011 functionality? They provide a standard 'regex' library.

    0 讨论(0)
提交回复
热议问题