The problem: a large static list of strings is provided as A, a long string is provided as B, and the strings in A are all very short (keywords).
Assume all keywords have the same length (you can extend the algorithm to different lengths later).
I would suggest the following:
Precalculate a hash for each keyword (for example, an XOR hash):
from functools import reduce  # reduce is not a builtin in Python 3
hash256 = reduce(int.__xor__, map(ord, keyword))
Create a dictionary where the key is a hash and the value is the list of keywords with that hash.
Save the keyword length.
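The precomputation steps above can be sketched as follows, assuming Python 3. The helper names `xor_hash` and `build_index` and the sample keywords are hypothetical, chosen only for illustration; `dictionary_of_hashed` and `keyword_length` are the names used in the rest of this answer.

```python
from collections import defaultdict
from functools import reduce


def xor_hash(chars):
    """XOR the code points of all characters together."""
    return reduce(int.__xor__, map(ord, chars), 0)


def build_index(keywords):
    """Map each hash value to the list of keywords that produce it."""
    index = defaultdict(list)
    for keyword in keywords:
        index[xor_hash(keyword)].append(keyword)
    return index


# Hypothetical sample data: every keyword has the same length.
A = ["cat", "act", "dog"]
dictionary_of_hashed = build_index(A)
keyword_length = len(A[0])
```

Note that XOR ignores character order, so anagrams such as "cat" and "act" end up under the same key. That is why a hash hit still has to be confirmed against the keyword list.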
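Then slide a window of keyword_length characters over B and check the hash of each window:
curr_keyword = []
for x in B:
    curr_keyword.append(x)
    if len(curr_keyword) > keyword_length:
        # drop the oldest character so the window keeps its size
        curr_keyword = curr_keyword[1:]
    if len(curr_keyword) == keyword_length:
        hash256 = reduce(int.__xor__, map(ord, curr_keyword))
        if hash256 in dictionary_of_hashed:
            # hash matched: search the list to confirm a real match
            if ''.join(curr_keyword) in dictionary_of_hashed[hash256]:
                print(''.join(curr_keyword))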
Something like this.
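Putting the pieces together, here is a rough end-to-end sketch for Python 3. The function name `find_keywords` is hypothetical. The O(1) hash update (XOR out the character leaving the window, XOR in the one entering) is an optional refinement that is not part of the loop above, which recomputes the hash for every window; it works because XOR is its own inverse. The sketch also stores keywords in sets rather than lists so the confirmation lookup is cheap.

```python
from collections import defaultdict


def find_keywords(keywords, text):
    """Return the set of keywords (all of equal length) that occur in text."""
    if not keywords:
        return set()
    keyword_length = len(keywords[0])
    if any(len(k) != keyword_length for k in keywords):
        raise ValueError("this sketch assumes equal-length keywords")

    # Precompute: XOR hash -> set of keywords with that hash.
    index = defaultdict(set)
    for keyword in keywords:
        h = 0
        for ch in keyword:
            h ^= ord(ch)
        index[h].add(keyword)

    found = set()
    window_hash = 0
    for i, ch in enumerate(text):
        window_hash ^= ord(ch)  # character entering the window
        if i >= keyword_length:
            window_hash ^= ord(text[i - keyword_length])  # character leaving
        if i >= keyword_length - 1 and window_hash in index:
            # Hash hit: confirm with a real comparison, since XOR collides.
            candidate = text[i - keyword_length + 1 : i + 1]
            if candidate in index[window_hash]:
                found.add(candidate)
    return found


# "tac" hashes like "cat" but fails the confirmation step.
print(find_keywords(["cat", "dog", "emu"], "the tac and the dog"))  # prints {'dog'}
```

A stronger rolling hash (a polynomial one, for instance) would produce fewer false hits than XOR, at the cost of a slightly more involved update step.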