Python - find occurrences of list of strings within string

不思量自难忘° 2021-01-22 13:29

I have a large string and a list of search strings and want to build a boolean list indicating whether or not each of the search strings exists in the large string. What is the

4 answers
  •  星月不相逢
    2021-01-22 13:44

    An implementation using the Aho-Corasick algorithm via the pyahocorasick package (https://pypi.python.org/pypi/pyahocorasick/), which finds all of the search strings in a single pass over the input string:

    import ahocorasick
    import numpy as np
    
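    def check_strings(search_list, text):
        # Build an automaton holding every search string, tagged with its list index.
        A = ahocorasick.Automaton()
        for idx, s in enumerate(search_list):
            A.add_word(s, (idx, s))
        A.make_automaton()
    
        # A.iter yields (end_index, value) per match; value is the (idx, s) tuple stored above.
        index_list = []
        for item in A.iter(text):
            index_list.append(item[1][0])
    
        # Mark matched indices with 1 and leave the rest at 0.
        output_list = np.array([0] * len(search_list))
        output_list[index_list] = 1
        return output_list.tolist()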
    
    search_strings = ["hello", "world", "goodbye"]
    test_string = "hello world"
    print(check_strings(search_strings, test_string))
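
For comparison, here is a minimal sketch of a baseline that needs no third-party packages. It is not part of the answer above: the function name `contains_each` is made up for illustration, and it assumes the search list is small enough that scanning the large string once per search string is acceptable. It returns actual booleans rather than 0/1.

```python
def contains_each(search_list, text):
    """Return one boolean per search string: True if it occurs in text.

    Hypothetical helper for illustration. Each `in` test scans `text`
    separately, so the work grows with the number of search strings,
    unlike the single-pass Aho-Corasick approach.
    """
    return [s in text for s in search_list]


search_strings = ["hello", "world", "goodbye"]
test_string = "hello world"
print(contains_each(search_strings, test_string))  # prints [True, True, False]
```

With many search strings or a very long input, the automaton-based version is likely the better fit, since it avoids rescanning the string for every pattern.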
    
