Question
Given two lists of words, dictionary and sentence, I'm trying to create a binary representation based on the inclusion of words of dictionary in the sentence, such as [1,0,0,0,0,0,1,...,0], where 1 indicates that the ith word in the dictionary shows up in the sentence.
What's the fastest way I can do this?
Example data:
dictionary = ['aardvark', 'apple','eat','I','like','maize','man','to','zebra', 'zed']
sentence = ['I', 'like', 'to', 'eat', 'apples']
result = [0,0,1,1,1,0,0,1,0,0]
Is there something faster than the following considering that I'm working with very large lists of approximately 56'000 elements in size?
x = [int(i in sentence) for i in dictionary]
Answer 1:
I would suggest something like this:
words = set(['hello','there']) #have the words available as a set
sentance = ['hello','monkey','theres','there']
rep = [ 1 if w in words else 0 for w in sentance ]
>>> rep
[1, 0, 0, 1]
I would take this approach because sets have O(1) average lookup time; that is, checking whether w is in words takes constant time. This makes the list comprehension O(n), since it visits each word once. I believe this is as efficient as you will get, or close to it.
You also mentioned creating a 'Boolean' array. That would let you simply write the following instead:
rep = [ w in words for w in sentance ]
>>> rep
[True, False, False, True]
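To make the lookup-time argument concrete, here is a rough timing sketch using the standard timeit module. The data is made up (2,000 generated words, far smaller than the 56,000 in the question, so the slow version finishes quickly), and the function names with_list and with_set are only illustrative. Actual timings depend on the machine, so none are shown.

```python
import timeit

# Hypothetical data, kept small so the slow version finishes quickly.
dictionary = [f"word{i}" for i in range(2000)]
sentence = [f"word{i}" for i in range(0, 2000, 2)]  # every other word

def with_list(dictionary, sentence):
    # `in` on a list scans it element by element.
    return [int(w in sentence) for w in dictionary]

def with_set(dictionary, sentence):
    # Build the set once, then each lookup is constant time on average.
    lookup = set(sentence)
    return [int(w in lookup) for w in dictionary]

# Both versions must agree before their timings are worth comparing.
assert with_list(dictionary, sentence) == with_set(dictionary, sentence)

list_time = timeit.timeit(lambda: with_list(dictionary, sentence), number=5)
set_time = timeit.timeit(lambda: with_set(dictionary, sentence), number=5)
print(f"list membership: {list_time:.4f} s")
print(f"set membership:  {set_time:.4f} s")
```

The gap should widen as the lists grow, since the list version does work proportional to the product of the two lengths while the set version does work proportional to their sum.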
Answer 2:
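set2 = set(list2)  # list2: the list to search in (sentence in the question)
x = [int(i in set2) for i in list1]  # list1: the list to encode (dictionary in the question)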
Answer 3:
Use sets; the total time complexity is O(N):
>>> sentence = ['I', 'like', 'to', 'eat', 'apples']
>>> dictionary = ['aardvark', 'apple','eat','I','like','maize','man','to','zebra', 'zed']
>>> s= set(sentence)
>>> [int(word in s) for word in dictionary]
[0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
If your sentence list contains actual sentences rather than single words, try this:
>>> sentences= ["foobar foo", "spam eggs" ,"monty python"]
>>> words=["foo", "oof", "bar", "pyth" ,"spam"]
>>> from itertools import chain
# fetch words from each sentence and create a flattened set of all words
>>> s = set(chain(*(x.split() for x in sentences)))
>>> [int(x in s) for x in words]
[1, 0, 0, 0, 1]
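The same idea can be wrapped in a small function. This is only a sketch: the name encode is made up for illustration, and it swaps in chain.from_iterable for chain(*...), which avoids unpacking the generator into positional arguments. Matching is on whole whitespace-separated tokens, so "pyth" does not match "python".

```python
from itertools import chain

def encode(words, sentences):
    # Split each sentence on whitespace and pool all tokens into one set.
    tokens = set(chain.from_iterable(s.split() for s in sentences))
    # Whole-token match only: "pyth" does not match "python".
    return [int(w in tokens) for w in words]

sentences = ["foobar foo", "spam eggs", "monty python"]
words = ["foo", "oof", "bar", "pyth", "spam"]
print(encode(words, sentences))  # [1, 0, 0, 0, 1]
```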
Source: https://stackoverflow.com/questions/16393681/how-to-create-a-binary-list-based-on-inclusion-of-list-elements-in-another-list