High performance mass short string search in Python

前端 未结 5 1588
[愿得一人]
[愿得一人] 2021-02-05 13:55

The Problem: A large static list of strings is provided as A, A long string is provided as B, strings in A are all very short (a keywords

5条回答
  •  春和景丽
    2021-02-05 14:33

    Assume you has all keywords of the same length (later you could extend this algo for different lengths)

    I could suggest next:

    1. precalculate some hash for each keyword (for example xor hash):

      hash256 = reduce(int.__xor__, map(ord, keyword))
      
    2. create a dictionary where key is a hash, and value list of corresponding keywords

    3. save keyword length

      curr_keyword = []
      for x in B:
        if len(curr_keyword) == keyword_length:
           hash256 = reduce(int.__xor__, map(ord, curr_keyword))
           if hash256 in dictionary_of_hashed:
              #search in list
      
        curr_keyword.append(x)
        curr_keyword = curr_keyword[1:]
      

    Something like this

提交回复
热议问题