I have a large array of strings that looks something like this: String temp[] = new String[200000].
I have another String, let's call it bigtext. What I need to do is to check, for each entry of temp, whether it occurs somewhere in bigtext.
I think you're looking for an algorithm like Rabin-Karp or Aho–Corasick, which are designed to search a text for a large number of sub-strings in parallel.
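For illustration, here is a minimal Aho–Corasick sketch in Java (the class and method names are mine, and it only records which patterns matched, not where): it builds a trie of the patterns, wires up failure links with a breadth-first pass, and then scans the text once.

```java
import java.util.*;

class AhoCorasick {
    private static class Node {
        Map<Character, Node> next = new HashMap<>();
        Node fail;                              // failure link
        List<Integer> out = new ArrayList<>();  // indices of patterns ending here
    }

    private final Node root = new Node();

    AhoCorasick(String[] patterns) {
        // 1. Build the trie of all (assumed non-empty) patterns.
        for (int i = 0; i < patterns.length; i++) {
            Node cur = root;
            for (char c : patterns[i].toCharArray()) {
                cur = cur.next.computeIfAbsent(c, k -> new Node());
            }
            cur.out.add(i);
        }
        // 2. Compute failure links breadth-first.
        Deque<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = node.fail;
                while (f != null && !f.next.containsKey(c)) f = f.fail;
                child.fail = (f == null) ? root : f.next.get(c);
                child.out.addAll(child.fail.out); // inherit matches via failure link
                queue.add(child);
            }
        }
    }

    // Returns the indices of all patterns that occur somewhere in text,
    // in a single O(|text| + matches) scan.
    Set<Integer> search(String text) {
        Set<Integer> found = new HashSet<>();
        Node cur = root;
        for (char c : text.toCharArray()) {
            while (cur != root && !cur.next.containsKey(c)) cur = cur.fail;
            cur = cur.next.getOrDefault(c, root);
            found.addAll(cur.out);
        }
        return found;
    }
}
```

Usage would be something like new AhoCorasick(temp).search(bigtext), which returns the indices into temp of every string found in bigtext.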
I'm afraid it's not going to be efficient in any case!
To pick the right algorithm, you need to provide some answers: is bigText known in advance? I guess temp is not, from its name.
Sticking to a strict inclusion test, you might build a trie from your temp array. It would prevent searching the same sub-string several times.
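Read literally, that trie idea might look like the sketch below (my rough interpretation, with illustrative names): insert every temp[i] into a trie, then walk the trie from each position of bigtext, so patterns that share a prefix are checked against that prefix only once per starting position.

```java
import java.util.*;

class PatternTrie {
    private static class Node {
        Map<Character, Node> next = new HashMap<>();
        int patternIndex = -1; // index into temp, or -1 if no pattern ends here
    }

    private final Node root = new Node();

    void insert(String pattern, int index) {
        Node cur = root;
        for (char c : pattern.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.patternIndex = index;
    }

    // Walk the trie from every position of text, collecting the indices of
    // all patterns that match starting there.
    Set<Integer> matches(String text) {
        Set<Integer> found = new HashSet<>();
        for (int i = 0; i < text.length(); i++) {
            Node cur = root;
            for (int j = i; j < text.length(); j++) {
                cur = cur.next.get(text.charAt(j));
                if (cur == null) break;                        // no pattern continues here
                if (cur.patternIndex >= 0) found.add(cur.patternIndex);
            }
        }
        return found;
    }
}
```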
That is a very efficient approach. You can improve it slightly by evaluating temp.length only once:

```java
for (int x = 0, len = temp.length; x < len; x++)
```
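Presumably the full loop is something like the following (the contains() check in the body is my assumption from the question):

```java
// Hoisting temp.length into a local avoids re-reading it on every
// iteration; the body shown here is a guess at the original check.
for (int x = 0, len = temp.length; x < len; x++) {
    if (bigtext.contains(temp[x])) {
        // record the match, e.g. remember x
    }
}
```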
Although you don't provide much detail about your program, it's quite possible you could find a more efficient approach by redesigning it.
Note that your current complexity is O(|S1|*n), where |S1| is the length of bigtext and n is the number of elements in your array, since each search is actually O(|S1|).
By building a suffix tree from bigtext and iterating over the elements of the array, you could bring this complexity down to O(|S1| + |S2|*n), where |S2| is the length of the longest string in the array. Assuming |S2| << |S1|, it could be much faster!
Building the suffix tree is O(|S1|), and each search is O(|S2|): you don't have to go through all of bigtext to find a match, just through the relevant piece of the suffix tree. Since the search is done n times, you get a total of O(|S1| + n*|S2|), which is asymptotically better than the naive implementation.
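A real suffix tree (e.g. via Ukkonen's algorithm) is a fair amount of code, so here is the same idea approximated with a plain suffix array plus binary search (all names are illustrative). Note the naive sort below is not the O(|S1|) construction described above, but each per-pattern lookup is O(|S2| log |S1|), which preserves the shape of the argument.

```java
import java.util.*;

class SuffixSearch {
    private final String text;
    private final Integer[] sa; // suffix start positions, sorted lexicographically

    SuffixSearch(String text) {
        this.text = text;
        sa = new Integer[text.length()];
        for (int i = 0; i < sa.length; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> compareSuffixes(a, b)); // naive, O(|S1|^2 log |S1|) worst case
    }

    private int compareSuffixes(int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            int d = Character.compare(text.charAt(a), text.charAt(b));
            if (d != 0) return d;
            a++; b++;
        }
        return Integer.compare(n - a, n - b); // the shorter suffix sorts first
    }

    // True if pattern occurs in text: binary-search for a suffix
    // that starts with pattern.
    boolean contains(String pattern) {
        int lo = 0, hi = sa.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = comparePrefix(sa[mid], pattern);
            if (cmp == 0) return true;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return false;
    }

    // 0 if the suffix starting at pos begins with pattern, otherwise the
    // sign of the first differing character.
    private int comparePrefix(int pos, String pattern) {
        for (int j = 0; j < pattern.length(); j++) {
            if (pos + j >= text.length()) return -1; // suffix shorter than pattern
            int d = Character.compare(text.charAt(pos + j), pattern.charAt(j));
            if (d != 0) return d;
        }
        return 0;
    }
}
```

You would build one SuffixSearch over bigtext and then call contains(temp[x]) for each element of the array.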
If you have additional information about temp, you can maybe improve the iteration further.
You can also reduce the time spent if you parallelize the iteration.
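As a sketch of that parallelization, Java 8 parallel streams make it short (names are illustrative; each contains() call is still a full O(|S1|) scan, the work is just spread across cores):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelScan {
    // Returns the entries of temp that occur in bigtext, checked in parallel.
    static List<String> matches(String[] temp, String bigtext) {
        return Arrays.stream(temp)
                .parallel()
                .filter(bigtext::contains)
                .collect(Collectors.toList());
    }
}
```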
Use a search algorithm like Boyer-Moore. Search for "Boyer-Moore" and you'll find lots of links explaining how it works, including Java examples.
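As a reference point, here is a sketch of the simpler Boyer-Moore-Horspool variant, which keeps only the bad-character shift table but shows the core skipping idea. Note it finds one pattern at a time, so you would still loop over temp and pay for each search separately.

```java
import java.util.Arrays;

class Horspool {
    // Returns the index of the first occurrence of needle in haystack, or -1.
    static int indexOf(String haystack, String needle) {
        int n = haystack.length(), m = needle.length();
        if (m == 0) return 0;
        // Shift table: for each character, the distance from its last
        // occurrence in needle (excluding the final position) to the end.
        int[] shift = new int[Character.MAX_VALUE + 1];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[needle.charAt(i)] = m - 1 - i;
        }
        int pos = 0;
        while (pos + m <= n) {
            int j = m - 1;
            // Compare right to left at the current alignment.
            while (j >= 0 && haystack.charAt(pos + j) == needle.charAt(j)) j--;
            if (j < 0) return pos; // full match
            // Skip ahead based on the text character under the needle's end.
            pos += shift[haystack.charAt(pos + m - 1)];
        }
        return -1;
    }
}
```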