Complexity analysis of filter() with a lambda function

后端 未结 2 1908
无人共我
无人共我 2021-01-13 13:13

Given two lists, list1 and list2

list3 = filter(lambda x: x in list1,list2)

This returns the intersection of the

2条回答
  •  执笔经年
    2021-01-13 13:44

    Your code does O(len(list1) * len(list2)) comparison operations for elements.

    • Your lambda function is executed O(len(list2)) times, once per each element filtered. See documentation for filter in Python 3 (Python 2):

      filter(function, iterable)

      Construct an iterator from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator

      (emphasis mine)

      Clearly the function is called at least 1 time for each (distinct) element in iterable - knowing when not to need call it would mean also solving the Halting problem in general case, which not even the Python core developers are yet to solve ;-). In practice in CPython 3 the filter builtin creates an iterator which when advanced, executes the function once for each element (distinct or not) in the iteration order.

    • The x in list1 does O(len(list1)) comparisons in average and worst case, as documented.


    To speed it up, use a set; also you do not need a lambda function at all (use __contains__ magic method)

    list3 = filter(set(list1).__contains__, list2)
    

    This builds a set of the list1 once in O(len(list1)) time and runs the filter against it with O(len(list2)) average complexity for O(len(list1) + len(list2))


    If the ordering of elements of list2 does not matter then you can also do

    set(list1).intersection(list2)
    

    which should have lower constants than doing the filter above; and for really fast code, you should order the lists so that the smaller is turned into a set (since both intersection and set building have documented average complexity of O(n), but set building most probably will have larger constants due to resizing the set, thus it will make sense to build the set from the smaller to decrease the weight of these constants):

    smaller, larger = sorted([list1, list2], key=len)
    result = set(smaller).intersection(larger)
    

    Notice that Python 2 and 3 differ from each other. filter in Python 3 returns a generator, and the actual running time depends on the number of elements consumed from the resulting generator, whereas in Python 2 a list will be generated upfront, which might be more costly if you need only the first values.

提交回复
热议问题