Let's say I have two lists, l1 and l2. I want to perform l1 - l2, which returns all elements of l1 not in l2.
Python has a language feature called list comprehensions that is well suited to this kind of task. The following statement does exactly what you want and stores the result in l3:
l3 = [x for x in l1 if x not in l2]
For the example lists l1 = [1, 2, 6, 8] and l2 = [2, 3, 5, 8], l3 will contain [1, 6].
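As a quick sanity check (the helper name list_difference and the sample lists are made up for illustration), the comprehension keeps the order of l1 and any repeated elements, because it simply tests each element of l1 in turn:

```python
def list_difference(l1, l2):
    # Hypothetical helper: keep every element of l1, in order and with
    # duplicates, that does not appear in l2.
    return [x for x in l1 if x not in l2]


l1 = [1, 2, 6, 8, 1, 6]
l2 = [2, 3, 5, 8]
l3 = list_difference(l1, l2)
print(l3)  # [1, 6, 1, 6]
```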
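Use a set comprehension, {x for x in l2}, or set(l2) to get a set, then use a list comprehension to build the result list. Each membership test against the set takes O(1) time on average, instead of a linear scan of l2: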
l2set = set(l2)
l3 = [x for x in l1 if x not in l2set]
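One detail worth stressing: the set should be built once, before the comprehension. A minimal sketch (difference_with_set is a hypothetical name used only here):

```python
def difference_with_set(l1, l2):
    # Build the set once; each `in` test against it is then O(1) on average.
    l2set = set(l2)
    return [x for x in l1 if x not in l2set]


# Pitfall: writing `x not in set(l2)` inside the comprehension would call
# set(l2) again on every iteration, throwing away the speedup.
l1 = [1, 2, 6, 8]
l2 = [2, 3, 5, 8]
print(difference_with_set(l1, l2))  # [1, 6]
```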
Benchmark code:
import time

l1 = list(range(1000 * 10 * 3))
l2 = list(range(1000 * 10 * 2))
l2set = {x for x in l2}

# perf_counter is a high-resolution clock intended for timing short intervals
tic = time.perf_counter()
l3 = [x for x in l1 if x not in l2set]
toc = time.perf_counter()
diffset = toc - tic
print(diffset)

tic = time.perf_counter()
l3 = [x for x in l1 if x not in l2]
toc = time.perf_counter()
difflist = toc - tic
print(difflist)

print("speedup %fx" % (difflist / diffset))
Benchmark results:
0.0015058517456054688
3.968189239501953
speedup 2635.179227x
One way is to use sets:
>>> set([1,2,6,8]) - set([2,3,5,8])
set([1, 6])
Note, however, that sets do not preserve the order of elements and remove any duplicate elements. The elements also need to be hashable. If these restrictions are tolerable, this may often be the simplest and highest-performance option.
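The three restrictions can be seen directly in a short sketch (sample lists chosen here for illustration), together with the order-preserving comprehension and a plain-list fallback for unhashable elements:

```python
l1 = [6, 1, 8, 2, 1]
l2 = [2, 3, 5, 8]

# Set difference: the duplicate 1 collapses into one element, and the result
# is a set, so the original order of l1 is not guaranteed.
set_result = set(l1) - set(l2)
print(set_result)

# Order- and duplicate-preserving alternative: use a set only for lookups.
l2set = set(l2)
ordered = [x for x in l1 if x not in l2set]
print(ordered)  # [6, 1, 1]

# Unhashable elements (here, inner lists) cannot be put into a set at all...
pairs = [[1, 2], [3, 4]]
exclude_pairs = [[3, 4]]
try:
    set(pairs)
    unhashable_raised = False
except TypeError:
    unhashable_raised = True

# ...but a plain list lookup still works, because `in` on a list compares with ==.
pair_result = [p for p in pairs if p not in exclude_pairs]
print(pair_result)  # [[1, 2]]
```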
Alternate solution:
from functools import reduce  # reduce is not a builtin in Python 3
list(reduce(lambda x, y: filter(lambda z: z != y, x), [2, 3, 5, 8], [1, 2, 6, 8]))
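For readers less used to reduce, the one-liner is roughly equivalent to the explicit loop below: starting from l1, it wraps one lazy filter per element of l2, each dropping elements equal to that value, so every element of l1 can go through up to len(l2) comparisons. This is a sketch; reduce_filter_equivalent is a hypothetical name.

```python
from functools import reduce


def reduce_filter_equivalent(l1, l2):
    # Hypothetical helper: an explicit-loop version of
    # reduce(lambda x, y: filter(lambda z: z != y, x), l2, l1)
    result = l1
    for y in l2:
        # Wrap one more lazy filter that drops elements equal to y.
        # y=y binds the current value; a plain closure would see only the
        # last y by the time the lazy filters are finally consumed.
        result = filter(lambda z, y=y: z != y, result)
    return list(result)


l1 = [1, 2, 6, 8]
l2 = [2, 3, 5, 8]
loop_result = reduce_filter_equivalent(l1, l2)
reduce_result = list(reduce(lambda x, y: filter(lambda z: z != y, x), l2, l1))
print(loop_result, reduce_result)  # [1, 6] [1, 6]
```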
As an alternative, you can also use filter with a lambda expression to get the desired result. For example:
>>> l1 = [1,2,6,8]
>>> l2 = set([2,3,5,8])
# v `filter` returns an iterator object. Here I'm converting
# v it to `list` in order to display the resultant value
>>> list(filter(lambda x: x not in l2, l1))
[1, 6]
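A closely related variant, offered as a sketch rather than part of the original answer, uses itertools.filterfalse with the set's bound __contains__ method in place of the lambda:

```python
from itertools import filterfalse

l1 = [1, 2, 6, 8]
l2 = {2, 3, 5, 8}

# filterfalse keeps the items for which the predicate is false;
# l2.__contains__ is the method behind `x in l2`, so no lambda is needed.
result = list(filterfalse(l2.__contains__, l1))
print(result)  # [1, 6]
```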
Performance Comparison
Here I am comparing the performance of all the answers mentioned here. As expected, Arkku's set-based operation is the fastest.
Arkku's Set Difference - First (0.124 usec per loop)
mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2"
10000000 loops, best of 3: 0.124 usec per loop
Daniel Pryden's List Comprehension with set lookup - Second (0.302 usec per loop)
mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]"
1000000 loops, best of 3: 0.302 usec per loop
Donut's List Comprehension on plain list - Third (0.552 usec per loop)
mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]"
1000000 loops, best of 3: 0.552 usec per loop
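Moinuddin Quadri's answer using filter - Fourth (0.972 usec per loop). Note that in Python 3, filter returns a lazy iterator, so this statement alone does not do the filtering; wrap it in list(...) for a fair comparison.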
mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "filter(lambda x: x not in l2, l1)"
1000000 loops, best of 3: 0.972 usec per loop
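Akshay Hazari's answer using a combination of reduce + filter - Fifth (3.97 usec per loop). Note that this command passes the list setup as a second statement rather than with -s, so building the lists is timed as well, and reduce(..., l1, l2) computes l2 - l1; l1 - l2 would be reduce(..., l2, l1).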
mquadri$ python -m timeit "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2)"
100000 loops, best of 3: 3.97 usec per loop
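To re-run a comparable measurement from inside Python 3, a sketch along these lines could be used. It forces the lazy filter and reduce results with list(...), puts all setup into the namespace passed to timeit so only the statements are timed, and uses the l1 - l2 argument order for reduce. The dictionary names are made up here, and timings will vary by machine and Python version, so none are shown.

```python
import timeit
from functools import reduce

# Namespace the timed statements run in; building it is not part of the timing.
setup_globals = {
    "l1": [1, 2, 6, 8],
    "l2": [2, 3, 5, 8],
    "s1": {1, 2, 6, 8},
    "s2": {2, 3, 5, 8},
    "reduce": reduce,
}

statements = {
    "set difference": "s1 - s2",
    "list comp, set lookup": "[x for x in l1 if x not in s2]",
    "list comp, list lookup": "[x for x in l1 if x not in l2]",
    "list(filter), set lookup": "list(filter(lambda x: x not in s2, l1))",
    "reduce + filter": "list(reduce(lambda x, y: filter(lambda z: z != y, x), l2, l1))",
}


def run_benchmarks(number=100_000):
    results = {}
    for name, stmt in statements.items():
        # timeit.timeit runs stmt `number` times and returns the total seconds.
        seconds = timeit.timeit(stmt, globals=setup_globals, number=number)
        results[name] = seconds / number * 1e6  # microseconds per loop
    return results


if __name__ == "__main__":
    for name, usec in sorted(run_benchmarks().items(), key=lambda item: item[1]):
        print(f"{name}: {usec:.3f} usec per loop")
```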
PS: a set does not maintain order and removes duplicate elements from the list. Hence, do not use set difference if you need to keep either of these.
Expanding on Donut's answer and the other answers here, you can avoid building an intermediate list by using a generator expression instead of a list comprehension, and speed up lookups by using a set data structure (the in operator is O(n) on a list but O(1) on average on a set).
So here's a function that would work for you:
def filter_list(full_list, excludes):
    s = set(excludes)
    return (x for x in full_list if x not in s)
The result will be a generator that lazily yields the filtered elements. If you need a real list object (e.g. to call len() on the result), you can easily build one like so:
filtered_list = list(filter_list(full_list, excludes))
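Two consequences of the lazy result, shown in a small sketch that repeats filter_list so it runs on its own: it works even on an endless iterator such as itertools.count(), and a generator can be consumed only once.

```python
from itertools import count, islice


def filter_list(full_list, excludes):
    # The exclusion set is built when filter_list is called;
    # the generator expression then filters lazily.
    s = set(excludes)
    return (x for x in full_list if x not in s)


# Lazy filtering of an endless iterator: islice pulls only the first five survivors.
first_five = list(islice(filter_list(count(), [2, 3, 5, 8]), 5))
print(first_five)  # [0, 1, 4, 6, 7]

# A generator is exhausted after one pass.
gen = filter_list([1, 2, 6, 8], [2, 3, 5, 8])
first_pass = list(gen)
second_pass = list(gen)
print(first_pass, second_pass)  # [1, 6] []
```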