Performance comparison: insert vs build Python set operations

时光说笑 2021-02-02 17:14

In Python, is it faster to (a) build a set from a list of n items, or (b) insert n items into a set one at a time?

I found this page (http://wiki.python.org/moin/TimeComplexity) but it did

4 answers
  • 2021-02-02 17:50

    In terms of big-O complexity, it's definitely the same, because both approaches do exactly the same thing: insert n items into a set.
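    A rough cost model makes the "same big-O, different constant factor" point concrete. The symbols are illustrative assumptions, not measured values: c_next is the per-item cost of fetching the next element from the list, c_insert the cost of hashing it and inserting it into the hash table, and c_interp the extra per-item interpreter overhead of a Python-level loop (bytecode dispatch, attribute lookup, method call). Fixed setup costs are ignored, and insertion is assumed to be O(1) on average:

```latex
\begin{aligned}
T_{\text{build}}(n) &\approx n\,(c_{\text{next}} + c_{\text{insert}}) \\
T_{\text{add}}(n)   &\approx n\,(c_{\text{next}} + c_{\text{insert}} + c_{\text{interp}}) \\
\frac{T_{\text{add}}(n)}{T_{\text{build}}(n)} &\approx 1 + \frac{c_{\text{interp}}}{c_{\text{next}} + c_{\text{insert}}}
\end{aligned}
```

    Both costs are Θ(n); under this model the loop's slowdown is a constant factor that does not grow with n.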

    The difference comes from the implementation. One clear advantage of initializing from an iterable is that you save a lot of Python-level function calls: initialization from an iterable is done entirely at the C level (**).
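    One way to see the Python-level work per item is to disassemble both approaches with the standard dis module. This is a sketch assuming Python 3.4+ (dis.get_instructions is not available in the Python 2 versions used in these answers); the function names build, insert and mentions_add are hypothetical, and the exact opcode names (LOAD_ATTR, LOAD_METHOD, CALL, ...) differ between interpreter versions. The loop body of insert looks up add and calls it on every iteration, while build is a single call to set:

```python
import dis


def build(lst):
    # A single call: set() consumes the whole iterable inside C code.
    return set(lst)


def insert(lst):
    s = set()
    for item in lst:
        # Executed once per item: look up the attribute "add", then call it.
        s.add(item)
    return s


def mentions_add(func):
    # True if any bytecode instruction of func refers to the name "add".
    return any(ins.argval == "add" for ins in dis.get_instructions(func))


if __name__ == "__main__":
    dis.dis(insert)  # the loop body shows the lookup of "add" and a call
    print("build refers to add:", mentions_add(build))
    print("insert refers to add:", mentions_add(insert))
```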

    Indeed, some tests on a list of 5,000,000 random floats show that adding items one by one is slower:

    import random

    lst = [random.random() for _ in range(5000000)]
    set1 = set(lst)    # takes 2.4 seconds

    set2 = set()       # the loop below takes 3.37 seconds
    for item in lst:
        set2.add(item)

    (**) Looking inside the CPython source for sets (Objects/setobject.c), item insertion eventually boils down to a call to set_add_key. When initializing from an iterable, this function is called in a tight C loop:

    while ((key = PyIter_Next(it)) != NULL) {
      if (set_add_key(so, key) == -1) {
        Py_DECREF(it);
        Py_DECREF(key);
        return -1;
      } 
      Py_DECREF(key);
    }
    

    On the other hand, each call to set.add from Python code involves an attribute lookup, which resolves to the C function set_add, followed by a Python-level method call; set_add in turn calls set_add_key. Since the item insertion itself is relatively cheap (Python's hash table implementation is very efficient), these extra per-item costs add up to a large share of the total time.
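    To check how much of the gap comes from the repeated attribute lookup rather than from the call itself, a common trick is to look up the bound method once (add = s.add) before the loop. The sketch below assumes Python 3 and uses hypothetical function names; it times that variant alongside the plain loop, set.update (which in CPython goes through the same C-level bulk insertion as the set constructor) and set(lst). If the explanation above holds, hoisting should close only part of the gap, since one Python-level call per item remains; actual numbers depend on the machine and interpreter version.

```python
import random
import timeit


def with_add(lst):
    s = set()
    for item in lst:
        s.add(item)        # attribute lookup + method call on every iteration
    return s


def with_hoisted_add(lst):
    s = set()
    add = s.add            # look up the bound method once
    for item in lst:
        add(item)          # still one Python-level call per item
    return s


def with_update(lst):
    s = set()
    s.update(lst)          # bulk insertion, looped in C by CPython
    return s


def with_constructor(lst):
    return set(lst)        # bulk insertion via the constructor


def compare(n=100000, number=10):
    lst = [random.random() for _ in range(n)]
    results = {}
    for func in (with_add, with_hoisted_add, with_update, with_constructor):
        # Best of 3 repeats; each repeat runs the call `number` times.
        results[func.__name__] = min(
            timeit.repeat(lambda: func(lst), number=number, repeat=3)
        )
    return results


if __name__ == "__main__":
    for name, seconds in sorted(compare().items(), key=lambda kv: kv[1]):
        print("%-18s %.4f s" % (name, seconds))
```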

  • 2021-02-02 17:53

    On my ThinkPad:

    In [37]: timeit.timeit('for a in x: y.add(a)',
                           'y=set(); x=range(10000)', number=10000)
    Out[37]: 12.18006706237793
    
    In [38]: timeit.timeit('y=set(x)', 'y=set(); x=range(10000)', number=10000)
    Out[38]: 3.8137960433959961
    
  • 2021-02-02 18:01
    $ python -V
    Python 2.5.2
    $ python -m timeit -s "l = range(1000)" "set(l)"
    10000 loops, best of 3: 64.6 usec per loop
    $ python -m timeit -s "l = range(1000)" "s = set()" "for i in l:s.add(i)"
    1000 loops, best of 3: 307 usec per loop
    
  • 2021-02-02 18:06

    Here are the results of running the comparison with timeit. Initializing the set from the list seems to be faster; I'm curious to know why:

    from timeit import timeit
    timeit("set(a)","a=range(10)")
    # 0.9944498532640864
    
    timeit("for i in a:x.add(i)","a=range(10);x=set()")
    # 1.6878826778265648
    

    Python version: 2.7
