I have a large string and a list of search strings, and I want to build a boolean list indicating whether each of the search strings occurs in the large string. What is the most efficient way to do this?
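For reference, the straightforward approach is one substring test per search string. It is correct and simple, but it can scan the large string separately for every search string, which is what a single-pass method tries to avoid. The function name `contains_each_naive` is hypothetical:

```python
def contains_each_naive(search_list, text):
    # One `in` test per search string; each test may scan `text` again
    return [s in text for s in search_list]

print(contains_each_naive(["hello", "world", "goodbye"], "hello world"))
```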
An implementation using the Aho-Corasick algorithm via the pyahocorasick package (https://pypi.python.org/pypi/pyahocorasick/), which finds all of the search strings in a single pass through the large string:
```python
import ahocorasick
import numpy as np

def check_strings(search_list, text):
    # Build an Aho-Corasick automaton over all search strings
    automaton = ahocorasick.Automaton()
    for idx, s in enumerate(search_list):
        automaton.add_word(s, (idx, s))  # value records the string's position
    automaton.make_automaton()

    # iter() makes one pass over text, yielding (end_index, value) per match
    found = np.zeros(len(search_list), dtype=bool)
    for _end_index, (idx, _word) in automaton.iter(text):
        found[idx] = True
    return found.tolist()

search_strings = ["hello", "world", "goodbye"]
test_string = "hello world"
print(check_strings(search_strings, test_string))  # [True, True, False]
```
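To show how the single pass works without depending on pyahocorasick, here is a minimal pure-Python sketch of the Aho-Corasick technique: a trie of the search strings, failure links computed breadth-first, output lists merged along those links, then one scan over the text. The names `build_automaton` and `contains_each` are hypothetical; this is not the library's implementation and will be much slower than its C extension. It does show how one pass reports every match, including overlapping ones such as "he" inside "she".

```python
from collections import deque

def build_automaton(patterns):
    # Hypothetical helper. goto[s]: char -> next state; fail[s]: failure link;
    # out[s]: indices of patterns that end when the scan reaches state s
    goto, fail, out = [{}], [0], [[]]
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)

    # Breadth-first: depth-1 states keep fail = 0 (the root)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state (shorter suffixes)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def contains_each(patterns, text):
    # Hypothetical helper: one pass over text, marking every pattern seen
    goto, fail, out = build_automaton(patterns)
    found = [False] * len(patterns)
    for idx in out[0]:  # empty patterns match any text
        found[idx] = True
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            found[idx] = True
    return found

print(contains_each(["he", "she", "his", "hers"], "ushers"))
```

When the next character has no trie edge, the scan follows failure links instead of restarting, so each character of the text is consumed exactly once.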