Python coding relating to function any and “more than once” keyword

Submitted by 荒凉一梦 on 2019-12-08 13:39:56

Question


I have this simple piece of code that tells me if a word in a given list appears in an article:

if not any(word in article.text for word in keywords):
    print("Skipping article as there is no matching keyword\n")

What I need is to check whether at least 3 words from the "keywords" list appear in the article; if they don't, the article should be skipped.

Is there an easy way to do this? I can't seem to find anything.


Answer 1:


If the set of keywords is large enough and the string being searched is long enough that short-circuiting is worthwhile, here is a variation on the other approaches that stops as soon as three hits are found (much like any stops at the first hit):

from itertools import islice

if sum(islice((1 for word in keywords if word in article.text), 3)) == 3:
    ...  # at least three keywords matched; process the article here

Once you get three hits, it immediately stops iterating the keywords and the test passes.
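As a minimal sketch of the same short-circuiting idea, the check can be wrapped in a small helper. The function name has_at_least and the sample text are hypothetical, introduced only for illustration; the text argument stands in for article.text.

```python
from itertools import islice

def has_at_least(text, keywords, n=3):
    """Return True as soon as n keywords have been found in text (substring match)."""
    hits = (1 for word in keywords if word in text)
    # islice stops pulling from the generator after n hits,
    # so the remaining keywords are never tested.
    return sum(islice(hits, n)) == n

# Hypothetical sample data
text = "python makes iterating over generators easy"
print(has_at_least(text, ["python", "generators", "easy", "java"]))  # True
print(has_at_least(text, ["python", "java", "rust"]))                # False
```

The first call stops after "easy" and never tests "java"; the second runs through every keyword because only one of them matches.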




Answer 2:


You can count the number of items that satisfy a condition using this pattern:

sum(1 for x in xs if c(x))

Here you would do:

if sum(1 for word in keywords if word in article.text) >= 3:
    ...  # at least three keywords matched; process the article here
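To make the pattern concrete, here is a small sketch that applies it to the skip logic from the question. The helper name count_matching_keywords and the sample data are assumptions made for illustration, and article_text stands in for article.text.

```python
def count_matching_keywords(text, keywords):
    """Count how many entries of keywords occur in text as substrings."""
    return sum(1 for word in keywords if word in text)

# Hypothetical sample data
article_text = "The new release improves speed, memory use and startup time."
keywords = ["speed", "memory", "startup", "security"]

matches = count_matching_keywords(article_text, keywords)
if matches < 3:
    print("Skipping article: fewer than 3 keywords matched\n")
else:
    print(f"Keeping article: {matches} keywords matched")
```

Unlike the islice variant, this tests every keyword, which is usually fine for short lists. The in operator does substring matching, so a keyword such as "art" would also match inside "startup".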



Answer 3:


"My text and lists are pretty long"

If the text is large and there are many keywords, you could use the Aho-Corasick algorithm (like grep -Ff keywords.txt text.txt). For example, to find non-overlapping occurrences, you could use the noaho package (not tested):

#!/usr/bin/env python
from itertools import islice
from noaho import NoAho  # $ pip install noaho

trie = NoAho()
for word in keywords:
    trie.add(word)
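# findall_long iterates over the non-overlapping longest matches in the text
found_words = trie.findall_long(article.text)
# Note: every occurrence counts, so one keyword repeated three times also passes
if len(list(islice(found_words, 3))) == 3: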
    print('at least 3 words in the "keywords" list appear in the article')
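If installing a third-party package is not an option, a rough standard-library approximation of the single-pass idea is a compiled regex alternation. This is not Aho-Corasick and is not what the answer above uses; it is only a sketch. The function name has_n_distinct_keywords and the sample text are hypothetical. Unlike the noaho snippet, it counts distinct keywords rather than occurrences.

```python
import re

def has_n_distinct_keywords(text, keywords, n=3):
    """Scan text once and return True when n distinct keywords have matched."""
    if n <= 0:
        return True
    # Drop duplicates and empty strings; try longer keywords first so the
    # alternation prefers the longest keyword at each position.
    words = sorted({w for w in keywords if w}, key=len, reverse=True)
    if len(words) < n:
        return False
    pattern = re.compile("|".join(re.escape(w) for w in words))
    seen = set()
    for match in pattern.finditer(text):
        seen.add(match.group(0))
        if len(seen) >= n:
            return True  # stop scanning as soon as n distinct keywords are seen
    return False

# Hypothetical sample data
text = "the quick brown fox jumps over the lazy dog"
print(has_n_distinct_keywords(text, ["fox", "dog", "quick", "cat"]))  # True
print(has_n_distinct_keywords("fox fox fox", ["fox", "dog", "cat"]))  # False
```

Because finditer returns non-overlapping matches, a keyword that only occurs overlapping an earlier match can be missed (for example "bc" in "abc" when "ab" is also a keyword), so the result can differ from the plain word in text test.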


Source: https://stackoverflow.com/questions/34605144/python-coding-relating-to-function-any-and-more-than-once-keyword