Given two lists, list1
and list2
list3 = filter(lambda x: x in list1,list2)
This returns the intersection of the
Your code does O(len(list1) * len(list2))
comparison operations for elements.
Your lambda function is executed O(len(list2))
times, once per each element filtered. See documentation for filter in Python 3 (Python 2):
filter(function, iterable)
Construct an
iterator
from those elements ofiterable
for which function returns true.iterable
may be either a sequence, a container which supports iteration, or an iterator
(emphasis mine)
Clearly the function is called at least 1 time for each (distinct) element in iterable - knowing when not to need call it would mean also solving the Halting problem in general case, which not even the Python core developers are yet to solve ;-). In practice in CPython 3 the filter builtin creates an iterator which when advanced, executes the function once for each element (distinct or not) in the iteration order.
The x in list1 does O(len(list1))
comparisons in average and worst case, as documented.
To speed it up, use a set
; also you do not need a lambda function at all (use __contains__
magic method)
list3 = filter(set(list1).__contains__, list2)
This builds a set
of the list1
once in O(len(list1))
time and runs the filter against it with O(len(list2))
average complexity for O(len(list1) + len(list2))
If the ordering of elements of list2
does not matter then you can also do
set(list1).intersection(list2)
which should have lower constants than doing the filter
above; and for really fast code, you should order the lists so that the smaller is turned into a set (since both intersection and set building have documented average complexity of O(n)
, but set building most probably will have larger constants due to resizing the set
, thus it will make sense to build the set from the smaller to decrease the weight of these constants):
smaller, larger = sorted([list1, list2], key=len)
result = set(smaller).intersection(larger)
Notice that Python 2 and 3 differ from each other. filter
in Python 3 returns a generator, and the actual running time depends on the number of elements consumed from the resulting generator, whereas in Python 2 a list will be generated upfront, which might be more costly if you need only the first values.