Why is converting a list to a set faster than using just a list to compute a list difference?

梦毁少年i 2021-01-02 05:36

Say, I wish to compute the difference of two lists C = A - B:

A = [1,2,3,4,5,6,7,8,9]
B = [1,3,5,8,9]
C = [2,4,6,7]          # Result

3 Answers
  • 2021-01-02 05:45

    According to the Python documentation on time complexity:

    • List membership (x in s) is, on average, a linear-time operation: O(n).
    • Set membership (x in s) is, on average, a constant-time operation: O(1).

    Building a set from a list is, on average, a linear-time operation, because every element of the list has to be scanned and inserted into a hash table, so it costs O(n). Here n is the number of elements in the collection.

    The key observation is that in Method 1, building the set with s = set(B) is a one-off operation. After that we perform n set-membership tests of the form x not in s, so the total cost is O(n) + n * O(1) = O(n).

    In Method 2, by contrast, the list-membership test x not in B is carried out for each element of A, so the total cost is n * O(n) = O(n^2).
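
    The two methods themselves are not shown in the question as captured here, so the following is only a sketch reconstructed from the description above. The function names diff_with_set and diff_with_list are hypothetical; the lists A and B are the ones from the question.

```python
def diff_with_set(a, b):
    """Method 1 (assumed form): build the set once, then test membership."""
    s = set(b)                            # one-off O(len(b)) construction on average
    return [x for x in a if x not in s]   # each test is O(1) on average


def diff_with_list(a, b):
    """Method 2 (assumed form): test membership against the list directly."""
    return [x for x in a if x not in b]   # each test scans b, so O(len(b))


A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 3, 5, 8, 9]

print(diff_with_set(A, B))   # [2, 4, 6, 7]
print(diff_with_list(A, B))  # [2, 4, 6, 7]
```

    Both versions keep the order of A and return the same result; they differ only in what each membership test costs.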

  • 2021-01-02 05:46

    There is overhead in converting a list to a set, but a set is substantially faster than a list for those membership (in) tests.

    You can tell almost immediately whether item x is in set y because a hash table is used underneath. On average, the lookup time stays the same no matter how large the set is; in Big-O notation this is O(1). For a list, each element has to be checked in turn to see whether item x is in list z. As the list grows, the check takes longer; this is O(n), meaning the duration of the operation is proportional to the length of the list.

    That increased speed can offset the set creation overhead, which is how your set check ends up being faster.
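
    One way to check this on your own machine is to time both approaches with the standard timeit module. This is a rough sketch rather than a rigorous benchmark: the list sizes and the function names with_set and with_list are assumptions chosen for illustration, and the numbers will vary between machines.

```python
import timeit

# Hypothetical benchmark data; the sizes are an assumption for illustration.
n = 2000
A = list(range(n))
B = list(range(0, n, 2))  # every even number below n


def with_set():
    s = set(B)  # the conversion overhead is paid inside the timed function
    return [x for x in A if x not in s]


def with_list():
    return [x for x in A if x not in B]


# Run each function a few times and report the total elapsed time.
set_time = timeit.timeit(with_set, number=5)
list_time = timeit.timeit(with_list, number=5)

print(f"set:  {set_time:.4f} s")
print(f"list: {list_time:.4f} s")
```

    Because set(B) is built inside with_set, the measured time includes the creation overhead, which is the comparison the paragraph above describes.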

    EDIT: To answer that other question: Python has no way of knowing that your list is sorted, at least not with a standard list object, so the membership test in a list comprehension can't achieve O(log n) performance. You could certainly write your own binary search that assumes the list is sorted, but O(1) beats O(log n) any day.
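
    For completeness, here is a minimal sketch of such a binary search, using bisect_left from the standard bisect module. The helper name contains_sorted is hypothetical, and the code assumes B is already sorted in ascending order; it does not check that.

```python
from bisect import bisect_left


def contains_sorted(sorted_items, x):
    """Return True if x is in sorted_items, using binary search (O(log n)).

    Assumes sorted_items is sorted in ascending order; this is not verified.
    """
    i = bisect_left(sorted_items, x)  # leftmost position where x could be inserted
    return i < len(sorted_items) and sorted_items[i] == x


A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 3, 5, 8, 9]  # already sorted

C = [x for x in A if not contains_sorted(B, x)]
print(C)  # [2, 4, 6, 7]
```

    Each lookup here is O(log n), so the whole difference costs O(n log n), which still falls between the list version and the set version.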

  • 2021-01-02 05:55

    Average time complexity for lookup (x in S) in a set is O(1) while the same for a list is O(n).

    You can check the details at https://wiki.python.org/moin/TimeComplexity
