Remove occurrences of substring recursively

遇见更好的自我 2021-01-14 17:23

Here's a problem:

Given a string A and a substring B, remove the first occurrence of B from A, and keep doing so for as long as B occurs in A. Note that rem

1 Answer
  • 2021-01-14 17:54

    Your approach has pretty bad complexity. In a bad case, the string a is aaaaaaaaabbbbbbbbb and the string b is ab: every search finds only the single ab in the middle, so you need O(|a|) searches, each taking O(|a| + |b|) even with a linear-time search algorithm such as Knuth-Morris-Pratt. That is O(|a|^2 + |a| * |b|) in total, which with their constraints would take years.
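    The question's code is cut off above, so as an assumption, the straightforward method being criticized probably looks something like this repeated find-and-erase loop (removeAllNaive is a hypothetical name, not from the question). On aaa...abbb...b with b = ab every round removes only one copy of b, so the number of rounds grows linearly with |a|.

```cpp
#include <cassert>
#include <string>

// Naive approach (hypothetical reconstruction): repeatedly find the first
// occurrence of b in a and erase it. Each round rescans a from the beginning
// and erase() shifts the tail left, so one round costs time proportional to
// |a|, and there can be up to |a| / |b| rounds.
std::string removeAllNaive(std::string a, const std::string& b) {
    assert(!b.empty());  // an empty b would "match" forever
    for (std::size_t pos = a.find(b); pos != std::string::npos; pos = a.find(b)) {
        a.erase(pos, b.size());
    }
    return a;
}
```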

    With their constraints, O(|a| * |b|) would already be a good target: that is around 100 million operations, which finishes in under a second. Here's one way to do even better. For each position i in the string a, compute the largest length n_i such that a[i - n_i : i] = b[0 : n_i]; in other words, n_i is the length of the longest suffix of a[0 : i] that is also a prefix of b. All the n_i can be computed in O(|a| + |b|) with the Knuth-Morris-Pratt algorithm.
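    A minimal sketch of computing every n_i with Knuth-Morris-Pratt, assuming 0-based indexing and the half-open convention a[i - n_i : i] used above, so n has |a| + 1 entries with n[0] = 0. prefixFunction and matchLengths are hypothetical names; this is the textbook prefix-function matcher, not code from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Prefix function of b: pi[k] = length of the longest proper prefix of
// b[0..k] that is also a suffix of b[0..k].
std::vector<std::size_t> prefixFunction(const std::string& b) {
    std::vector<std::size_t> pi(b.size(), 0);
    for (std::size_t k = 1; k < b.size(); ++k) {
        std::size_t z = pi[k - 1];
        while (z > 0 && b[k] != b[z]) z = pi[z - 1];  // fall back on mismatch
        if (b[k] == b[z]) ++z;
        pi[k] = z;
    }
    return pi;
}

// n[i] (i = 0..|a|) = largest n such that a[i-n, i) == b[0, n).
// Runs in O(|a| + |b|) by the usual KMP amortization.
std::vector<std::size_t> matchLengths(const std::string& a, const std::string& b) {
    assert(!b.empty());
    std::vector<std::size_t> pi = prefixFunction(b);
    std::vector<std::size_t> n(a.size() + 1, 0);  // n[0] = 0: empty prefix of a
    std::size_t z = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (z == b.size()) z = pi[z - 1];  // after a full match, fall back first
        while (z > 0 && a[i] != b[z]) z = pi[z - 1];
        if (a[i] == b[z]) ++z;
        n[i + 1] = z;
    }
    return n;
}
```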

    Once the n_i are computed, finding the first occurrence of b in a is just a matter of finding the first i with n_i = |b|: that i is the right end of the first occurrence of b in a.
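    As a small illustration, assuming n is indexed as above (n[i] for i = 0..|a|), locating the first occurrence is a single linear scan. firstOccurrence and the sentinel kNotFound are hypothetical names chosen for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel meaning "b does not occur in a".
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// n[i] (i = 0..|a|) is the length of the longest suffix of a[0, i) that is a
// prefix of b. Returns the start index of the first occurrence of b in a.
std::size_t firstOccurrence(const std::vector<std::size_t>& n, std::size_t blen) {
    assert(!n.empty());  // n always holds at least n[0]
    auto it = std::find(n.begin(), n.end(), blen);  // first i with n_i == |b|
    if (it == n.end()) return kNotFound;
    std::size_t rightEnd = static_cast<std::size_t>(it - n.begin());
    return rightEnd - blen;  // the occurrence is a[rightEnd - |b|, rightEnd)
}
```

    For a = xab and b = ab the lengths are {0, 0, 1, 2}, so the first right end is 3 and the occurrence starts at index 1.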

    Finally, we need to modify Knuth-Morris-Pratt slightly: we logically remove an occurrence of b as soon as we compute an n_i equal to |b|. To account for the removed letters, we rely on the fact that each step of Knuth-Morris-Pratt only uses the last value of n_i (plus the values computed for b) and the current letter of a, so we just need a fast way to retrieve the last surviving value of n_i after an occurrence of b has been removed. A deque that stores all the still-valid values of n_i does this (only its back is ever used, so a stack works just as well). Each value is pushed into it at most once and popped at most once, so maintaining it costs O(|a|). The usual amortized bound for Knuth-Morris-Pratt also survives the removals: the value restored after popping is always below |b|, so each removal raises the current match length by less than |b|, and there are at most |a| / |b| removals, which adds only O(|a|) extra fallback work. The total is O(|a| + |b|).
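    Below is a compact sketch of the same idea as a standalone function. It is a variant, not the answer's program: instead of marking right ends and rebuilding the string afterwards, it keeps the surviving characters on a stack alongside their n values, so popping |b| entries both deletes the occurrence and exposes the last valid n. removeAllKmp is a hypothetical name, and b is assumed to be non-empty.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Repeatedly removes the first occurrence of b from a, in O(|a| + |b|).
// Surviving characters live on a stack (kept); n[k] is the length of the
// longest suffix of kept[0..k] that is a prefix of b.
std::string removeAllKmp(const std::string& a, const std::string& b) {
    assert(!b.empty());
    const std::size_t m = b.size();

    // Prefix function of b (standard KMP failure function).
    std::vector<std::size_t> pi(m, 0);
    for (std::size_t k = 1; k < m; ++k) {
        std::size_t z = pi[k - 1];
        while (z > 0 && b[k] != b[z]) z = pi[z - 1];
        if (b[k] == b[z]) ++z;
        pi[k] = z;
    }

    std::string kept;             // surviving characters, in order
    std::vector<std::size_t> n;   // match length after each kept character
    for (char c : a) {
        std::size_t z = n.empty() ? 0 : n.back();  // last surviving n value (< m)
        while (z > 0 && c != b[z]) z = pi[z - 1];
        if (c == b[z]) ++z;
        kept.push_back(c);
        n.push_back(z);
        if (z == m) {             // b ends here: logically delete it
            kept.resize(kept.size() - m);
            n.resize(n.size() - m);
        }
    }
    return kept;
}
```

    Every character is pushed once and popped at most once, so the stack work is O(|a|), in line with the bound argued above.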

    Here's a C++ implementation. It may have some off-by-one errors, but it works on your sample and flies on the worst case I described at the beginning. It assumes that neither input string contains '$', which it uses as a separator.

    #include <deque>
    #include <string>
    #include <iostream>
    #include <vector>
    #include <algorithm>
    
    using namespace std;
    
    int main() {
        string a, b;
        cin >> a >> b;
    
        size_t blen = b.size();
    
        // make a = b + '$' + a; '$' is assumed to occur in neither input string
        a = b + "$" + a;
    
        vector<size_t> n(a.size()); // array for knuth-morris-pratt
        vector<bool> removals(a.size()); // positions of right ends at which we remove `b`s
    
        deque<size_t> lastN;
        n[0] = 0;
    
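        // For the first blen + 1 iterations (b itself plus the '$') just do vanilla knuth-morris-pratt;
        //     afterwards n[0 .. blen - 1] is the prefix function of b, reused by the loop below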
        for (size_t i = 1; i < blen + 1; ++ i) {
            size_t z = n[i - 1];
            while (z && a[i] != a[z]) {
                z = n[z - 1];
            }
            if (a[i] != a[z]) n[i] = 0;
            else n[i] = z + 1;
    
            lastN.push_back(n[i]);
        }
    
        // For the remaining iterations some characters could have been logically
        //     removed from `a`, so use lastN to get the last surviving value of n
        //     instead of reading it from `n[i - 1]`
        for (size_t i = blen + 1; i < a.size(); ++ i) {
            size_t z = lastN.back();
            while (z && a[i] != a[z]) {
                z = n[z - 1];
            }
            if (a[i] != a[z]) n[i] = 0;
            else n[i] = z + 1;
    
            if (n[i] == blen) // found a match
            {
                removals[i] = true;
    
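                // this character completes b, so it and the previous |b| - 1 surviving
                //     characters are removed: drop their n values (n[i] is never pushed)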
                for (size_t j = 0; j < blen - 1; ++ j) {
                    lastN.pop_back();
                }
            }
            else {
                lastN.push_back(n[i]);
            }
        }
    
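        // Rebuild the answer right to left. Each removal mark adds blen characters
        //     to skip (the marked one included); removed spans nest like brackets,
        //     so a single counter is enough
        string ret;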
        size_t toRemove = 0;
        for (size_t pos = a.size() - 1; a[pos] != '$'; -- pos) {
            if (removals[pos]) toRemove += blen;
            if (toRemove) -- toRemove;
            else ret.push_back(a[pos]);
        }
        reverse(ret.begin(), ret.end());
    
        cout << ret << endl;
    
        return 0;
    }
    
    [in] hehelllloworld
    [in] hell
    [out] oworld
    
    [in] abababc
    [in] ababc
    [out] ab
    
    [in] caaaaa ... aaaaaabbbbbb ... bbbbc
    [in] ab
    [out] cc
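    The right-to-left rebuild at the end of the program is the least explained step. Here is a standalone sketch of it, assuming the removal marks are given per character of the text (rebuildFromMarks is a hypothetical name). A later removal always swallows earlier removed spans whole, because it deletes a contiguous run of surviving characters, so the spans nest like brackets and one counter of characters still to skip is enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// removedEnd[i] is true when an occurrence of b (length blen) was removed with
// its last character at s[i]. Scanning right to left, each mark adds blen
// characters to skip; everything not skipped survives.
std::string rebuildFromMarks(const std::string& s,
                             const std::vector<bool>& removedEnd,
                             std::size_t blen) {
    assert(removedEnd.size() == s.size());
    std::string kept;
    std::size_t toSkip = 0;
    for (std::size_t pos = s.size(); pos-- > 0;) {
        if (removedEnd[pos]) toSkip += blen;  // this char ends a removed copy of b
        if (toSkip > 0) {
            --toSkip;                         // char belongs to some removed copy
        } else {
            kept.push_back(s[pos]);           // char survives
        }
    }
    std::reverse(kept.begin(), kept.end());
    return kept;
}
```

    For hehelllloworld and b = hell, the two removals end at indices 5 and 7, and the function returns oworld.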
    