Python: how to efficiently check whether an item is in a list?

盖世英雄少女心 2020-12-30 15:28

I have a list of strings (word-like), and while parsing a text I need to check whether a word belongs to the group of words in my current list.

However, my input

4 answers
  •  时光说笑
    2020-12-30 15:54

    There are two improvements you can make here.

    • Back your word list with a hash table. This gives you O(1) average-case performance when checking whether a word is present in your word list. There are a number of ways to do this; the most fitting in this scenario is to convert your list to a set.
    • Use a more appropriate structure for your matching-word collection.
      • If you need to store all of the matches in memory at the same time, use a deque (collections.deque), since appending to it is consistently O(1), whereas a list occasionally has to reallocate as it grows.
      • If you don't need all the matches in memory at once, consider using a generator. A generator is used to iterate over matched values according to the logic you specify, but it only stores part of the resulting list in memory at a time. It may offer improved performance if you are experiencing I/O bottlenecks.
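
To illustrate the first point, here is a minimal sketch that builds the set once and then tests membership against it. The names `word_list`, `word_set` and `is_known` are made up for this example and do not come from the question.

```python
# Hypothetical word list; in practice this would be loaded from your own data.
word_list = ["a", "b", "c"]

# Build the set once, up front. A membership test on a set is a hash lookup
# (O(1) on average) instead of a linear scan through the list.
word_set = set(word_list)


def is_known(word):
    """Return True if word is in the dictionary set."""
    return word in word_set


print(is_known("b"))    # True
print(is_known("dog"))  # False
```

The conversion itself costs one pass over the list, so it pays off when many lookups are made against the same collection.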

    Below is an example implementation based on my suggestions (opting for a generator, since I can't imagine you need all those words in memory at once).

    from itertools import chain

    d = {'a', 'b', 'c'}  # Load our dictionary
    # The generators read the file lazily, so keep it open until the loop is done
    with open('c:\\input.txt', 'r') as f:
        # Build a generator to get the words in the file
        all_words_generator = chain.from_iterable(line.split() for line in f)
        # Build a generator to filter out the non-dictionary words
        matching_words_generator = (word for word in all_words_generator if word in d)
        for matched_word in matching_words_generator:
            # Do something with matched_word
            print(matched_word)
    

    input.txt

    a b dog cat
    c dog poop
    maybe b cat
    dog
    

    Output

    a
    b
    c
    b
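
If you do need every match in memory at once (the other option in the second suggestion above), a minimal sketch might look like the following. It assumes the sample input from this answer, held as an in-memory list of lines so that it runs without a file. `collect_matches` and `sample_lines` are names invented for the example.

```python
from collections import deque


def collect_matches(lines, dictionary):
    """Return a deque of every word in lines that is also in dictionary."""
    matches = deque()
    for line in lines:
        for word in line.split():
            # Set membership test: a hash lookup, O(1) on average
            if word in dictionary:
                matches.append(word)
    return matches


dictionary = {"a", "b", "c"}
# Same content as the sample input.txt shown above (an assumption for the demo)
sample_lines = ["a b dog cat", "c dog poop", "maybe b cat", "dog"]

print(list(collect_matches(sample_lines, dictionary)))  # ['a', 'b', 'c', 'b']
```

Passing an open file object as `lines` should work the same way, since iterating over a file yields its lines.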
    
