How efficient is Python's 'in' or 'not in' operators?

问题

I have a list of over 100000 values and I am iterating over these values and checking if each one is contained in another list of random values (of the same size).

I am doing this by using if item[x] in randomList. How efficient is this? Does python do some sort of hashing for each container or is it internally doing a straight up search of the other container to find the element I am looking for?

Also, if it does this search linearly, then does it create a dictionary of the randomList and do the lookup with that?

回答1:

in is implemented by the __contains__ magic method of the object it applies to, so the efficiency is dependent upon that. For instance, set, dict and frozenset will be hash based lookups, while list will require a linear search. However, xrange (or range in Python 3.x) has a __contains__ method that doesn't require a linear search, but instead can use the start/stop/step information to determine a truthy value. (eg: 7 in xrange(4, 1000000) isn't done linearly).

Custom classes are free to implement __contains__ however they see fit but ideally should provide some information about how it does so in documentation if "not obvious".

回答2:

You will want to pre-convert your list to a set, where hashing can be used for O(1) lookup.

See http://wiki.python.org/moin/TimeComplexity

(Normally, you have to search every element in a 'classical' list to tell if something is in it (unless your data structure also keeps a set of all elements, but that would add a huge amount of time and space complexity, and the programmer can implement it themselves).)

回答3:

ninjagecko already answered the specific case of lists (and tuples FWIW), the general answer is : "it depends on the 'container' implementation" (I quote 'container' cause it can as well be a generator). For builtin types, you want sets or dicts for a fast lookup. Else you'd better check the doc or implementation for your container.

回答4:

You can do this without using a set if the two lists are sorted, as follows:

def gen(P1, P2):
    i, j = 0, 0
    I, J = len(P1), len(P2)
    while i != I and j != J:
        if P1[i] == P2[j]:
            yield P1[i]
            i, j = i + 1, j + 1
        else:
            if P1[i] < P2[j]:
                i += 1
            else:
                j += 1

This returns the intersection of P1 and P2:

>>> P1 = [1, 2, 3, 5, 90]
>>> P2 = [2, 3, 5, 30, 48]
>>> list(gen(P1, P2))
[2, 3, 5]

The algorithm is described here.

来源：https://stackoverflow.com/questions/17220296/how-efficient-is-pythons-in-or-not-in-operators

标签

python

iteration

complexity-theory