Here's a problem:
Given string A and a substring B, remove the first occurrence of substring B in string A for as long as it is possible to do so. Note that rem
Your approach has pretty bad complexity. In a very bad case the string `a` will be `aaaaaaaaabbbbbbbbb` and the string `b` will be `ab`, in which case you will need `O(|a|)` searches, each taking `O(|a| + |b|)` (assuming a sophisticated search algorithm), for a total complexity of `O(|a|^2 + |a| * |b|)`, which with their constraints would take years.
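For reference, the repeated-search approach discussed above presumably looks something like the following sketch. This is an assumption, since the original code is not shown here, and `removeAllNaive` is a name made up for illustration. Each call to `find` rescans from the start of the string, which is where the quadratic behavior comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical baseline: repeatedly erase the first occurrence of b in a.
// Every iteration searches from the beginning again, so an input like
// "aaaa...bbbb" with b = "ab" needs about |a| / 2 full searches.
std::string removeAllNaive(std::string a, const std::string& b) {
    assert(!b.empty());  // an empty pattern would match forever
    std::size_t pos;
    while ((pos = a.find(b)) != std::string::npos) {
        a.erase(pos, b.size());
    }
    return a;
}
```

It is still handy as a slow reference implementation to compare a faster solution against on small inputs.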
For their constraints a good complexity to aim for would be `O(|a| * |b|)`, which is around 100 million operations and will finish in under a second. Here's one way to approach it. For each position `i` in the string `a`, let's compute the largest length `n_i` such that `a[i - n_i : i] = b[0 : n_i]` (in other words, the longest suffix of `a` ending at that position which is a prefix of `b`). We can compute it in `O(|a| + |b|)` using the Knuth-Morris-Pratt algorithm.
After we have `n_i` computed, finding the first occurrence of `b` in `a` is just a matter of finding the first `n_i` that is equal to `|b|`. This will be the right end of one of the occurrences of `b` in `a`.
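As a minimal sketch of this step, with no removals yet: compute the Knuth-Morris-Pratt prefix function over `b + "$" + a` and report the first position whose value equals `|b|`. The names `prefixFunction` and `firstOccurrence` are made up for illustration, and the sketch assumes `$` does not occur in either string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// pi[i] = length of the longest proper prefix of s[0..i] that is also
// a suffix of s[0..i] (the classic KMP prefix function).
std::vector<std::size_t> prefixFunction(const std::string& s) {
    std::vector<std::size_t> pi(s.size(), 0);
    for (std::size_t i = 1; i < s.size(); ++i) {
        std::size_t z = pi[i - 1];
        while (z > 0 && s[i] != s[z]) z = pi[z - 1];
        if (s[i] == s[z]) ++z;
        pi[i] = z;
    }
    return pi;
}

// Start index of the first occurrence of b in a, or npos if there is none.
// Assumption: '$' appears in neither a nor b.
std::size_t firstOccurrence(const std::string& a, const std::string& b) {
    assert(!b.empty());
    const std::string s = b + "$" + a;
    const std::vector<std::size_t> pi = prefixFunction(s);
    for (std::size_t i = b.size() + 1; i < s.size(); ++i) {
        if (pi[i] == b.size()) {
            // i is the right end of the match inside s; subtracting
            // |b| + 1 maps it into a, and another |b| - 1 gives the start.
            return i - 2 * b.size();
        }
    }
    return std::string::npos;
}
```

For example, `firstOccurrence("abababc", "ababc")` returns 2, the start of the only occurrence.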
Finally, we will need to modify Knuth-Morris-Pratt slightly. We will be logically removing occurrences of `b` as soon as we compute an `n_i` that is equal to `|b|`. To account for the fact that some letters were removed from `a`, we rely on the fact that Knuth-Morris-Pratt only depends on the last value of `n_i` (and those computed for `b`) and the current letter of `a`, so we just need a fast way of retrieving the last value of `n_i` after we logically remove an occurrence of `b`. That can be done with a deque that stores all the valid values of `n_i`. Each value is pushed into the deque once and popped from it at most once, so the complexity of maintaining it is `O(|a|)`, while the complexity of Knuth-Morris-Pratt itself is `O(|a| + |b|)`, resulting in `O(|a| + |b|)` total complexity.
Here's a C++ implementation. It could have some off-by-one errors, but it works on your sample, and it flies for the worst case that I described at the beginning.
    #include <algorithm>
    #include <deque>
    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    int main() {
        string a, b;
        cin >> a >> b;
        size_t blen = b.size();
        // make a = b$a (assumes '$' does not occur in `b`)
        a = b + "$" + a;
        vector<size_t> n(a.size()); // array for knuth-morris-pratt
        vector<bool> removals(a.size()); // positions of right ends at which we remove `b`s
        deque<size_t> lastN; // values of n for the characters that are still present
        n[0] = 0;
        // For the first blen + 1 iterations just do vanilla knuth-morris-pratt
        for (size_t i = 1; i < blen + 1; ++i) {
            size_t z = n[i - 1];
            while (z && a[i] != a[z]) {
                z = n[z - 1];
            }
            if (a[i] != a[z]) n[i] = 0;
            else n[i] = z + 1;
            lastN.push_back(n[i]);
        }
        // For the remaining iterations some characters could have been logically
        // removed from `a`, so use lastN to get the last value of n instead
        // of actually getting it from `n[i - 1]`
        for (size_t i = blen + 1; i < a.size(); ++i) {
            size_t z = lastN.back();
            while (z && a[i] != a[z]) {
                z = n[z - 1];
            }
            if (a[i] != a[z]) n[i] = 0;
            else n[i] = z + 1;
            if (n[i] == blen) { // found a match
                removals[i] = true;
                // kill last |b| - 1 `n_i`s (the current one was never pushed)
                for (size_t j = 0; j < blen - 1; ++j) {
                    lastN.pop_back();
                }
            } else {
                lastN.push_back(n[i]);
            }
        }
        // Walk the original part of `a` backwards, skipping |b| characters
        // for every recorded right end. The '$' separator sits at index blen,
        // so stopping there also works if the input itself contains '$'.
        string ret;
        size_t toRemove = 0;
        for (size_t pos = a.size() - 1; pos > blen; --pos) {
            if (removals[pos]) toRemove += blen;
            if (toRemove) --toRemove;
            else ret.push_back(a[pos]);
        }
        reverse(ret.begin(), ret.end());
        cout << ret << endl;
        return 0;
    }
[in] hehelllloworld
[in] hell
[out] oworld
[in] abababc
[in] ababc
[out] ab
[in] caaaaa ... aaaaaabbbbbb ... bbbbc
[in] ab
[out] cc
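
To check samples like these without going through stdin, the same idea can be wrapped in a function. The sketch below is a variant rather than the code above: it keeps the surviving characters themselves on a stack next to their `n_i` values, so it needs neither the `$` separator nor the final backward pass. The name `removeAll` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Removes occurrences of b from a, left to right, in O(|a| + |b|).
std::string removeAll(const std::string& a, const std::string& b) {
    assert(!b.empty());
    const std::size_t blen = b.size();

    // Prefix function of b alone.
    std::vector<std::size_t> pb(blen, 0);
    for (std::size_t i = 1; i < blen; ++i) {
        std::size_t z = pb[i - 1];
        while (z > 0 && b[i] != b[z]) z = pb[z - 1];
        if (b[i] == b[z]) ++z;
        pb[i] = z;
    }

    std::string kept;            // surviving characters, used as a stack
    std::vector<std::size_t> n;  // n[k]: matched prefix length after kept[k]
    for (char c : a) {
        // Resume from the state of the last surviving character.
        // This value is always below blen, so b[z] stays in range.
        std::size_t z = n.empty() ? 0 : n.back();
        while (z > 0 && c != b[z]) z = pb[z - 1];
        if (c == b[z]) ++z;
        kept.push_back(c);
        n.push_back(z);
        if (z == blen) {
            // A full match ends here: drop its |b| characters and states.
            kept.resize(kept.size() - blen);
            n.resize(n.size() - blen);
        }
    }
    return kept;
}
```

Called on the first two samples, `removeAll("hehelllloworld", "hell")` gives `oworld` and `removeAll("abababc", "ababc")` gives `ab`.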