High performance mass short string search in Python

[愿得一人] 2021-02-05 13:55

The problem: a large static list of strings is provided as A, and a long string is provided as B. The strings in A are all very short (a keywords

5 answers
  •  长情又很酷
    2021-02-05 14:32

    Depending on how long your long string is, it may be worth it to do something like this:

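    ls = 'my long string of stuff'
    # Placeholder keyword list; A stands for the static list from the question
    A = ['long', 'stuff', 'xyz']
    # Generate every non-empty substring of ls, keeping only unique ones
    x = {ls[p:y] for p in range(len(ls)) for y in range(p + 1, len(ls) + 1)}

    # Keep each keyword that is one of those substrings (constant-time set lookup)
    result = []
    for word in A:
        if word in x:
            result.append(word)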
    

    Obviously, if your long string is very long, this also becomes too slow, because the number of substrings grows quadratically with the string's length. It should still be faster for any string under a few hundred characters.
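
Since the question says the strings in A are all very short, one possible refinement is to generate only substrings no longer than the longest keyword, which keeps the set size roughly proportional to the length of the long string instead of its square. The sketch below is an illustration under that assumption, not part of the original answer; the function name `find_keywords` and the sample data are hypothetical.

```python
def find_keywords(keywords, text):
    """Return the keywords that occur somewhere in text, in their original order."""
    if not keywords:
        return []
    # No substring longer than the longest keyword can ever match.
    max_len = max(len(word) for word in keywords)
    # Collect each distinct substring of text whose length is 1..max_len.
    substrings = {
        text[start:start + length]
        for start in range(len(text))
        for length in range(1, min(max_len, len(text) - start) + 1)
    }
    return [word for word in keywords if word in substrings]


# Hypothetical sample data standing in for A and B from the question.
A = ["long", "stuff", "xyz", "of"]
B = "my long string of stuff"
print(find_keywords(A, B))  # ['long', 'stuff', 'of']
```

With a long string of n characters and keywords of at most k characters, this builds at most about n * k substrings, so it stays practical for much longer inputs as long as k is small.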
