问题
I have a list of over 100000 values and I am iterating over these values and checking if each one is contained in another list of random values (of the same size).
I am doing this by using if item[x] in randomList
.
How efficient is this? Does python do some sort of hashing for each container or is it internally doing a straight up search of the other container to find the element I am looking for?
Also, if it does this search linearly, then does it create a dictionary of the randomList and do the lookup with that?
回答1:
in
is implemented by the __contains__
magic method of the object it applies to, so the efficiency is dependent upon that. For instance, set
, dict
and frozenset
will be hash based lookups, while list
will require a linear search. However, xrange
(or range
in Python 3.x) has a __contains__
method that doesn't require a linear search, but instead can use the start/stop/step information to determine a truthy value. (eg: 7 in xrange(4, 1000000)
isn't done linearly).
Custom classes are free to implement __contains__
however they see fit but ideally should provide some information about how it does so in documentation if "not obvious".
回答2:
You will want to pre-convert your list to a set, where hashing can be used for O(1) lookup.
See http://wiki.python.org/moin/TimeComplexity
(Normally, you have to search every element in a 'classical' list to tell if something is in it (unless your data structure also keeps a set of all elements, but that would add a huge amount of time and space complexity, and the programmer can implement it themselves).)
回答3:
ninjagecko already answered the specific case of lists (and tuples FWIW), the general answer is : "it depends on the 'container' implementation" (I quote 'container' cause it can as well be a generator). For builtin types, you want sets or dicts for a fast lookup. Else you'd better check the doc or implementation for your container.
回答4:
You can do this without using a set if the two lists are sorted, as follows:
def gen(P1, P2):
i, j = 0, 0
I, J = len(P1), len(P2)
while i != I and j != J:
if P1[i] == P2[j]:
yield P1[i]
i, j = i + 1, j + 1
else:
if P1[i] < P2[j]:
i += 1
else:
j += 1
This returns the intersection of P1 and P2:
>>> P1 = [1, 2, 3, 5, 90]
>>> P2 = [2, 3, 5, 30, 48]
>>> list(gen(P1, P2))
[2, 3, 5]
The algorithm is described here.
来源:https://stackoverflow.com/questions/17220296/how-efficient-is-pythons-in-or-not-in-operators