Question
I have a Python application.
There is a list of 450 prohibited phrases, and there is a message received from a user. I want to check whether this message contains any of the prohibited phrases. What is the fastest way to do that?
Currently I have this code:
message = "sometext"
lista = ["a", "b", "c"]
isContaining = False
for phrase in lista:
    # str has no contains() method; the in operator does the substring test
    if phrase in message:
        isContaining = True
        break
Is there a faster way to do that? I need to handle each message (at most 500 characters) in less than 1 second.
Answer 1:
The built-in any function is meant for exactly that:
>>> message = "sometext"
>>> lista = ["a","b","c"]
>>> any(a in message for a in lista)
False
>>> lista = ["a","b","e"]
>>> any(a in message for a in lista)
True
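If it also helps to know which phrase triggered the match, the same generator can be passed to next instead of any. A minimal sketch, assuming a hypothetical helper name first_prohibited and case-insensitive matching via str.casefold (neither is in the original question):

```python
def first_prohibited(message, phrases):
    """Return the first phrase found in message, or None if there is none."""
    lowered = message.casefold()  # assumption: matching should ignore case
    # next() stops at the first hit, just like any() does.
    return next((p for p in phrases if p.casefold() in lowered), None)


print(first_prohibited("sometext", ["a", "b", "c"]))  # None
print(first_prohibited("SomeText", ["a", "b", "e"]))  # e
```

The None default keeps the call safe when nothing matches, so the result can still be used directly in an if statement.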
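Alternatively, you could check the intersection of two sets. Note that set(message) is the set of the message's characters, so this form only finds single-character items: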
>>> lista = ["a","b","c"]
>>> set(message) & set(lista)
set([])
>>> lista = ["a","b","e"]
>>> set(message) & set(lista)
set(['e'])
>>> set(['test','sentence'])&set(['this','is','my','sentence'])
set(['sentence'])
But this only finds whole items, so you won't be able to find a word inside a longer string:
>>> set(['test','sentence'])&set(['this is my sentence'])
set([])
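For whole-word matching on a real message, the message has to be split into words before building the set. A rough sketch, assuming words are runs of letters, digits and underscores and that every prohibited entry is a single word (multi-word phrases would still need the any check above); the name contains_prohibited_word is made up for illustration:

```python
import re


def contains_prohibited_word(message, prohibited_words):
    """Return True if any whole word of message is in prohibited_words."""
    # \w+ picks out runs of letters, digits and underscores as words.
    words = set(re.findall(r"\w+", message.lower()))
    # isdisjoint() is True when the sets share nothing, so negate it.
    return not words.isdisjoint(w.lower() for w in prohibited_words)


print(contains_prohibited_word("this is my sentence", ["test", "sentence"]))   # True
print(contains_prohibited_word("this is my sentences", ["test", "sentence"]))  # False
```

The second call shows the trade-off: "sentences" is a different word from "sentence", so the set approach does not report it, whereas a substring test would.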
Answer 2:
Build a compiled regex from the list.
Keep the memory use and build time of the expression in mind, and compile it once in advance.
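import re

lista = [...]
# Escape each phrase so regex metacharacters are matched literally.
lista_escaped = [re.escape(item) for item in lista]
# Flags go to re.compile(); the second argument of a compiled pattern's search() is a start position.
bad_match = re.compile('|'.join(lista_escaped), re.IGNORECASE)
is_bad = bad_match.search(message)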
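To make the regex approach concrete, here is a small self-contained sketch. The phrase list is a made-up placeholder, find_bad is a hypothetical helper name, and sorting longer phrases first is my own addition so that the alternation prefers the longest phrase when two start at the same position:

```python
import re

# Placeholder phrases for illustration only; substitute the real list.
lista = ["bad word", "bad", "c++"]

# Escape each phrase, longest first, and join them into one alternation.
escaped = [re.escape(item) for item in sorted(lista, key=len, reverse=True)]
bad_match = re.compile("|".join(escaped), re.IGNORECASE)


def find_bad(message):
    """Return the matched phrase as it appears in message, or None."""
    match = bad_match.search(message)
    return match.group(0) if match else None


print(find_bad("I like C++ a lot"))      # C++
print(find_bad("Such a BAD WORD here"))  # BAD WORD
print(find_bad("all fine"))              # None
```

Without re.escape, the "c++" entry would be an invalid pattern, which is why escaping each phrase matters even when most phrases are plain words.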
Answer 3:
I would combine the any builtin with the in operator:
isContaining = any(a in message for a in lista)
I don't know if this is the fastest way but it seems the simplest to me.
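A quick way to settle the speed question is to measure it with timeit. The sketch below uses 450 randomly generated stand-in phrases and one 500-character message (both synthetic, not the asker's data) and compares the any approach with a precompiled regex. I would expect both to come in far below the one-second budget, but the actual numbers depend on the machine and the real phrase list, so none are quoted here:

```python
import random
import re
import string
import timeit

random.seed(0)  # reproducible synthetic data


def random_text(length):
    """Build a random string of lowercase letters and spaces."""
    alphabet = string.ascii_lowercase + " "
    return "".join(random.choice(alphabet) for _ in range(length))


# Synthetic stand-ins: 450 phrases of 8 characters and one 500-character message.
lista = [random_text(8) for _ in range(450)]
message = random_text(500)

pattern = re.compile("|".join(re.escape(p) for p in lista))


def with_any():
    return any(p in message for p in lista)


def with_regex():
    return pattern.search(message) is not None


# Both approaches must agree before their timings are worth comparing.
assert with_any() == with_regex()

runs = 200
for func in (with_any, with_regex):
    per_call = timeit.timeit(func, number=runs) / runs
    print(f"{func.__name__}: {per_call * 1e6:.1f} microseconds per message")
```

Swapping in the real list and a few representative messages gives a far more trustworthy answer than random data, since real phrases share prefixes and real messages sometimes match early.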
Answer 4:
We can also use the set intersection method:
>>> message = "sometext"
>>> lista = ["a","b","c"]
>>> isContaining = False
>>> if set(list(message)).intersection(set(lista)):
...     isContaining = True
...
>>> isContaining
False
>>> message = "sometext a"
>>> list(message)
['s', 'o', 'm', 'e', 't', 'e', 'x', 't', ' ', 'a']
>>> if set(list(message)).intersection(set(lista)):
...     isContaining = True
...
>>> isContaining
True
Source: https://stackoverflow.com/questions/27781506/fastest-way-to-check-does-string-contain-any-word-from-list