Question
I have this simple piece of code that tells me if a word in a given list appears in an article:
if not any(word in article.text for word in keywords):
    print("Skipping article as there is no matching keyword\n")
What I need is to check whether at least 3 words from the "keywords" list appear in the article; if they don't, the article should be skipped.
Is there an easy way to do this? I can't seem to find anything.
Answer 1:
If the set of keywords is large enough and the string being searched is long enough that short-circuiting is often worthwhile, here is a variation on the other approaches that stops as soon as three hits are found (much like any stops when one hit is found):
from itertools import islice
if sum(islice((1 for word in keywords if word in article.text), 3)) == 3:
    ...  # at least three keywords matched
Once you get three hits, it immediately stops iterating the keywords and the test passes.
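A minimal self-contained sketch of the short-circuiting check, for illustration only. The helper name has_at_least and the sample keywords and text values are hypothetical; text stands in for article.text.

```python
from itertools import islice


def has_at_least(keywords, text, n=3):
    """Return True once n keywords have been found as substrings of text."""
    # The generator yields a 1 for each matching keyword; islice stops
    # pulling from it after n hits, so later keywords are never tested.
    hits = islice((1 for word in keywords if word in text), n)
    return sum(hits) == n


keywords = ["python", "code", "list", "article"]  # hypothetical sample data
text = "a python article about list handling"
print(has_at_least(keywords, text))
```

Because islice consumes at most n items, the cost is bounded by the position of the n-th matching keyword rather than by the length of the whole list.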
Answer 2:
You can count the number of items that satisfy a condition using this pattern:
sum(1 for x in xs if c(x))
Here you would do:
if sum(1 for word in keywords if word in article.text) >= 3:
    ...  # at least three keywords matched
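To see the counting pattern on concrete data, here is a small sketch. The function name count_matches and the sample values are made up, and text stands in for article.text. Note that in on strings is a substring test, so "art" would also match inside "article".

```python
def count_matches(keywords, text):
    """Count how many keywords occur as substrings of text."""
    return sum(1 for word in keywords if word in text)


keywords = ["python", "code", "list", "article"]  # hypothetical sample data
text = "a python article about list handling"

matches = count_matches(keywords, text)
if matches < 3:
    print("Skipping article: fewer than 3 keywords matched\n")
else:
    print(f"Keeping article: {matches} keywords matched")
```

Unlike the islice variant, this version always tests every keyword, which is fine for short lists and also gives you the exact count.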
Answer 3:
"My text and lists are pretty long"
If the text is large and there are many keywords, you could use the Aho-Corasick algorithm (like grep -Ff keywords.txt text.txt). For example, if you want to find non-overlapping occurrences, you could use the noaho package (not tested):
#!/usr/bin/env python
from itertools import islice

from noaho import NoAho  # $ pip install noaho

# Build the keyword trie once
trie = NoAho()
for word in keywords:
    trie.add(word)

# islice consumes at most three matches from the search results
found_words = trie.findall_long(article.text)
if len(list(islice(found_words, 3))) == 3:
    print('at least 3 words in the "keywords" list appear in the article')
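If installing a third-party package is not an option, a rough standard-library stand-in is a single compiled regex alternation. This is not Aho-Corasick, just a one-pass sketch under stated assumptions: the helper name distinct_hits and the sample data are hypothetical, matches are non-overlapping, and it counts distinct keywords rather than occurrences.

```python
import re


def distinct_hits(keywords, text, n=3):
    """Return True once n distinct keywords have been seen in one pass over text."""
    # Drop empty strings and duplicates; put longer keywords first so the
    # alternation prefers the longest keyword at each position.
    ordered = sorted({word for word in keywords if word}, key=len, reverse=True)
    if not ordered:
        return n <= 0
    # re.escape keeps every keyword literal inside the pattern.
    pattern = re.compile("|".join(re.escape(word) for word in ordered))
    found = set()
    for match in pattern.finditer(text):
        found.add(match.group(0))
        if len(found) >= n:
            return True  # stop scanning as soon as n distinct keywords are found
    return len(found) >= n


keywords = ["python", "list", "article"]  # hypothetical sample data
text = "a python article about list handling"
print(distinct_hits(keywords, text))
```

Because matches do not overlap, a keyword that only occurs inside or across another match can be missed, so results may differ from the word in article.text checks above.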
Source: https://stackoverflow.com/questions/34605144/python-coding-relating-to-function-any-and-more-than-once-keyword