Search for any word or combination of words from one string in a list (python)

Submitted by 大憨熊 on 2020-01-04 10:10:21

Question


I have a string (for example: "alpha beta charlie, delta&epsilon foxtrot") and a list (for example ["zero","omega virginia","apple beta charlie"]). Is there a convenient way to iterate through every word and combination of words in the string in order to search for it in the list?


Answer 1:


Purpose

You say combinations, but combinations are semantically unordered. What you mean is that you want to find the intersection of all ordered permutations of the words, joined by spaces, with a target list.
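To make the distinction concrete, here is a small sketch comparing itertools.combinations with itertools.permutations. The three-word list sample_words is only an illustrative stand-in, not the full example from the question:

```python
import itertools

sample_words = ['alpha', 'beta', 'charlie']  # illustrative stand-in for the real word list

# combinations ignores order, so ('alpha', 'beta') appears only once
pairs_unordered = list(itertools.combinations(sample_words, 2))

# permutations respects order, so ('alpha', 'beta') and ('beta', 'alpha') both appear
pairs_ordered = list(itertools.permutations(sample_words, 2))

print(len(pairs_unordered))  # 3
print(len(pairs_ordered))    # 6
```

Since "beta alpha" and "alpha beta" are different strings in the target list, the ordered version is the one that matters here.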

To begin with, we need to import the libraries we intend to use.

import re
import itertools

Splitting the string

Don't split on individual characters; you're doing a semantic search for words, exclusive of stray punctuation. Regular expressions, provided by the re module, are perfect for this. In a raw Python string, r'', we write the regular expression for a word boundary, \b, around one or more (+) word characters, \w (any alphanumeric character or _).

re.findall returns a list of every match.

re_pattern = r'\b\w+\b'
silly_string = 'alpha beta charlie, delta&epsilon foxtrot'
words = re.findall(re_pattern, silly_string)

Here, words is our wordlist:

>>> print(words)
['alpha', 'beta', 'charlie', 'delta', 'epsilon', 'foxtrot']
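As a quick check of what this pattern treats as a word, the sketch below runs it on two made-up strings (the inputs and the variable names punctuation_split and word_chars are assumptions for illustration only):

```python
import re

re_pattern = r'\b\w+\b'

# '&' and ',' are not word characters, so they act as separators
punctuation_split = re.findall(re_pattern, 'delta&epsilon, foxtrot')
print(punctuation_split)  # ['delta', 'epsilon', 'foxtrot']

# '_' and digits count as word characters; an apostrophe does not
word_chars = re.findall(re_pattern, "snake_case route66 don't")
print(word_chars)  # ['snake_case', 'route66', 'don', 't']
```

Note that contractions are split at the apostrophe, which may or may not be what you want for your data.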

Creating the Permutations

Continuing, we prefer to manipulate our data with generators, so that we avoid materializing data before we need it and holding large datasets in memory. The itertools module has functions that neatly suit our needs: one yields the permutations of the above words for a given length, and another chains the results for every length into a single iterable:

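# one permutations iterator per length, from 1 through len(words)
_gen = (itertools.permutations(words, i + 1) for i in range(len(words)))
# flatten them lazily into a single iterable
all_permutations_gen = itertools.chain.from_iterable(_gen)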

Materializing all_permutations_gen with list(all_permutations_gen) would give us:

[('alpha',), ('beta',), ('charlie',), ('delta',), ('epsilon',), ('foxtrot',), ('alpha', 'beta'), ('alpha', 'charlie'), ('alpha', 'delta'), ('alpha', 'epsilon'), ('alpha', 'foxtrot'), ('beta', 'alpha'), ('beta', 'charlie'), ('beta', 'delta'), ('beta', 'epsilon'), ('beta', 'foxtrot'), ('charlie', 'alpha'), ('charlie', 'beta'), ('charlie', 'delta'), ('charlie', 'epsilon'), ('charlie', 'foxtrot'), ('delta', 'alpha'), ('delta', 'beta'), ('delta', 'charlie'), ('delta', 'epsilon'), ('delta', 'foxtrot'), ('epsilon', 'alpha'), ('epsilon', 'beta'), ('epsilon', 'charlie'), ('epsilon', 'delta'), ('epsilon', 'foxtrot'), ('foxtrot', 'alpha'), ('foxtrot', 'beta'), ('foxtrot', 'charlie'), ('foxtrot', 'delta'), ('foxtrot', 'epsilon'), ('alpha', 'beta', 'charlie'), ('alpha', 'beta', 'delta'), ...

If we joined each tuple with a space and materialized the result in a list instead of a set, printing the first 20 items would show us:

>>> print(all_permutations[:20])  # only works if all_permutations is a list, not a set
['alpha', 'beta', 'charlie', 'delta', 'epsilon', 'foxtrot', 'alpha beta', 'alpha charlie', 'alpha delta', 'alpha epsilon', 'alpha foxtrot', 'beta alpha', 'beta charlie', 'beta delta', 'beta epsilon', 'beta foxtrot', 'charlie alpha', 'charlie beta', 'charlie delta', 'charlie epsilon']

But listing the generator like that would exhaust it before we're ready, so instead we build the set of all space-joined permutations of those words:

all_permutations = set(' '.join(i) for i in all_permutations_gen)
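Keep in mind that this set grows factorially with the number of words. The sketch below counts how many ordered selections exist, assuming all n words are distinct (with repeated words the set would be smaller); the helper name count_ordered_selections is made up for this illustration:

```python
import math

def count_ordered_selections(n):
    """Number of ordered selections of 1..n items out of n distinct words."""
    return sum(math.factorial(n) // math.factorial(n - k) for k in range(1, n + 1))

for n in (3, 6, 10):
    print(n, count_ordered_selections(n))
# 3 15
# 6 1956
# 10 9864100
```

Six words are cheap, but a sentence of ten distinct words already produces close to ten million strings, so this approach is only practical for short inputs.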

Checking for Membership of any Permutations in Target List

With this set in hand, we can search for an intersection with the target list:

>>> target_list = ["zero","omega virginia","apple beta charlie"]
>>> all_permutations.intersection(target_list)
set([])

For the examples given, we get the empty set (shown here in Python 2's repr; Python 3 prints it as set()). But if the target list contains a string that is in our set of permutations:

>>> target_list_2 = ["apple beta charlie", "foxtrot alpha beta charlie"]
>>> all_permutations.intersection(target_list_2)
set(['foxtrot alpha beta charlie'])
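Putting the steps together, a minimal Python 3 sketch might look like the following. The function names permutation_matches and counting_matches are hypothetical. The first function follows the approach above; the second is an alternative that is not part of the answer: it inspects each target's words instead of generating permutations, and should give the same result for inputs like these while avoiding the factorial blow-up.

```python
import itertools
import re
from collections import Counter

WORD_PATTERN = re.compile(r'\b\w+\b')

def permutation_matches(source, targets):
    """Approach from the answer: build every space-joined permutation, then intersect."""
    words = WORD_PATTERN.findall(source)
    per_length = (itertools.permutations(words, k) for k in range(1, len(words) + 1))
    all_permutations = set(' '.join(p) for p in itertools.chain.from_iterable(per_length))
    return all_permutations.intersection(targets)

def counting_matches(source, targets):
    """Alternative sketch: a target matches if each of its space-separated parts
    is a source word, used no more often than it occurs in the source."""
    available = Counter(WORD_PATTERN.findall(source))
    matches = set()
    for target in targets:
        needed = Counter(target.split(' '))
        if all(available[word] >= count for word, count in needed.items()):
            matches.add(target)
    return matches

silly_string = 'alpha beta charlie, delta&epsilon foxtrot'
target_list_2 = ["apple beta charlie", "foxtrot alpha beta charlie"]

print(permutation_matches(silly_string, target_list_2))  # {'foxtrot alpha beta charlie'}
print(counting_matches(silly_string, target_list_2))     # {'foxtrot alpha beta charlie'}
```

The counting version does work proportional to the size of the target list rather than to the number of permutations, which may matter once the source string has more than a handful of words.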


Source: https://stackoverflow.com/questions/14264163/search-for-any-word-or-combination-of-words-from-one-string-in-a-list-python
