Best and/or fastest way to create lists in python

既然无缘 2020-11-29 00:31

In Python, as far as I know, there are at least three or four ways to create and initialize a list of a given size:

Simple loop with append:
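For reference, the four approaches benchmarked in the answers below can be sketched in Python 3 as follows. The size `n` and the fill value `0` are assumptions chosen for illustration, and the function names simply mirror those used in the benchmark code further down.

```python
def append_loop(n):
    """Simple loop with append."""
    my_list = []
    for _ in range(n):
        my_list.append(0)
    return my_list


def add_loop(n):
    """Simple loop with +=."""
    my_list = []
    for _ in range(n):
        my_list += [0]
    return my_list


def list_comprehension(n):
    """List comprehension."""
    return [0 for _ in range(n)]


def integer_multiplication(n):
    """List and integer multiplication."""
    return [0] * n


n = 5  # illustrative size, not taken from the question
builders = (append_loop, add_loop, list_comprehension, integer_multiplication)
results = [build(n) for build in builders]
print(results[0])                               # [0, 0, 0, 0, 0]
print(all(r == results[0] for r in results))    # True
```

All four produce the same list; they differ only in how much work the interpreter does to get there.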

4 Answers
  • 2020-11-29 01:08

    To see how the timing depends on the list length n:

    Pure python

    [Plot: run time of the pure-Python methods versus list length n]

    I tested list lengths up to n=10000 and the behavior remains the same, so the integer multiplication method is the fastest by a wide margin.

    Numpy

    For lists with more than ~300 elements you should consider numpy.

    [Plot: run time of integer multiplication versus a numpy array for n up to 1000]

    Benchmark code:

    import time

    def timeit(f):
        """Return a wrapper that reports the total time of 100 calls to f."""
        def timed(*args, **kwargs):
            # perf_counter() replaces time.clock(), which was removed in Python 3.8
            start = time.perf_counter()
            for _ in range(100):
                f(*args, **kwargs)
            end = time.perf_counter()
            return end - start
        return timed

    @timeit
    def append_loop(n):
        """Simple loop with append"""
        my_list = []
        for i in range(n):  # range is the Python 3 spelling of xrange
            my_list.append(0)

    @timeit
    def add_loop(n):
        """Simple loop with +="""
        my_list = []
        for i in range(n):
            my_list += [0]

    @timeit
    def list_comprehension(n):
        """List comprehension"""
        my_list = [0 for i in range(n)]

    @timeit
    def integer_multiplication(n):
        """List and integer multiplication"""
        my_list = [0] * n


    import numpy as np

    @timeit
    def numpy_array(n):
        """Numpy array of zeros"""
        my_list = np.zeros(n)


    import pandas as pd

    # Time both methods for n = 0..999 and plot the two curves
    df = pd.DataFrame([(integer_multiplication(n), numpy_array(n)) for n in range(1000)],
                      columns=['Integer multiplication', 'Numpy array'])
    df.plot()
    

    Gist here.

  • 2020-11-29 01:11

    There is one more method which, while it sounds weird, is handy in the right circumstances. If you need to produce the same list many times (initializing a matrix for roguelike pathfinding and related stuff, in my case), you can store a copy of the list in a tuple, then turn it back into a list when you need it. It is noticeably quicker than generating the list via comprehensions and, unlike list multiplication, it gives you distinct rows in a nested data structure (though the copy is shallow, so those rows are shared with the stored tuple).

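    #  In class definition
    class PathGrid:  # hypothetical class name, for illustration only
        def __init__(self):
            self.l = [[1000 for x in range(1000)] for y in range(1000)]
            self.t = tuple(self.l)  # stored copy of the outer list

        def some_method(self):
            self.l = list(self.t)  # rebuild the working list from the tuple
            self._do_fancy_computation()
            #  self.l is changed by this method

        def _do_fancy_computation(self):
            pass  # placeholder for the real work

    #  Later in code:
    obj = PathGrid()
    for a in range(10):
        obj.some_method()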
    

    Voila, on every iteration you have a fresh copy of the same list in no time!

    Disclaimer:

    I do not have the slightest idea why this is so quick or whether it works anywhere outside CPython 3.4.
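
    A likely explanation, offered as an assumption rather than a measurement: `tuple(...)` and `list(...)` copy only the outer sequence of references, so no inner row is rebuilt. The sketch below uses small hypothetical names (`rows`, `stored`, `fresh`) to show what that implies for mutation, with `copy.deepcopy` as one way to get fully independent rows at a higher cost.

```python
import copy

rows = [[0 for _ in range(3)] for _ in range(2)]
stored = tuple(rows)            # copies the outer sequence only

fresh = list(stored)            # new outer list, same inner lists
fresh[0][0] = 99                # mutates a row that `stored` also references
print(stored[0][0])             # 99
print(fresh[0] is stored[0])    # True

independent = copy.deepcopy(list(stored))  # inner rows are copied too
independent[1][1] = -1
print(stored[1][1])             # 0
```

    Replacing a whole row (`fresh[0] = [...]`) leaves the stored tuple untouched; only in-place changes to an inner row leak through.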

  • 2020-11-29 01:15

    If you want to create a list of increasing values, i.e. adding 1 each time, use the range function. In range, the start argument is included and the end argument is excluded, as shown below:

    >>> list(range(10, 20))
    [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    

    If you want each element to be 2 more than the previous one, use this:

    >>> list(range(10, 20, 2))
    [10, 12, 14, 16, 18]
    

    Here the third argument is the step size. With a start, an end and a step size of your choice, you can create many such lists quickly and easily.
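
    As a further sketch of the same three arguments, `range` also accepts a negative step to count downward, and `len` reports how many elements a range holds without building the list. The specific numbers are illustrative only.

```python
# Counting down: start is included, stop is excluded, step is negative
countdown = list(range(20, 10, -2))
print(countdown)                  # [20, 18, 16, 14, 12]

# A range knows its length without materializing a list
size = len(range(10, 20, 2))
print(size)                       # 5

# The result is empty when the step points away from the stop value
empty = list(range(10, 20, -1))
print(empty)                      # []
```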

    Thank you..!

    Happy Learning.. :)

  • 2020-11-29 01:18

    Let's run some time tests* with timeit.timeit:

    >>> from timeit import timeit
    >>>
    >>> # Test 1
    >>> test = """
    ... my_list = []
    ... for i in xrange(50):
    ...     my_list.append(0)
    ... """
    >>> timeit(test)
    22.384258893239178
    >>>
    >>> # Test 2
    >>> test = """
    ... my_list = []
    ... for i in xrange(50):
    ...     my_list += [0]
    ... """
    >>> timeit(test)
    34.494779364416445
    >>>
    >>> # Test 3
    >>> test = "my_list = [0 for i in xrange(50)]"
    >>> timeit(test)
    9.490926919482774
    >>>
    >>> # Test 4
    >>> test = "my_list = [0] * 50"
    >>> timeit(test)
    1.5340533503559755
    >>>
    

    As you can see above, the last method is the fastest by far.


    However, it should only be used with immutable items (such as integers). This is because it will create a list with references to the same item.

    Below is a demonstration:

    >>> lst = [[]] * 3
    >>> lst
    [[], [], []]
    >>> # The ids of the items in `lst` are the same
    >>> id(lst[0])
    28734408
    >>> id(lst[1])
    28734408
    >>> id(lst[2])
    28734408
    >>>
    

    This behavior is very often undesirable and can lead to bugs in the code.
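
    A minimal sketch of how this bites in practice: appending through one slot of a multiplied list changes every slot, because all of them hold the same inner list. With immutable items such as integers, assignment replaces only the one slot, so the problem does not arise.

```python
lst = [[]] * 3            # three references to one list object
lst[0].append(1)          # mutate through the first reference
print(lst)                # [[1], [1], [1]]
print(lst[0] is lst[2])   # True

nums = [0] * 3            # integers are immutable
nums[0] = 5               # rebinding one slot leaves the others alone
print(nums)               # [5, 0, 0]
```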

    If you have mutable items (such as lists), then you should use the still very fast list comprehension:

    >>> lst = [[] for _ in xrange(3)]
    >>> lst
    [[], [], []]
    >>> # The ids of the items in `lst` are different
    >>> id(lst[0])
    28796688
    >>> id(lst[1])
    28796648
    >>> id(lst[2])
    28736168
    >>>
    

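    *Note: In all of the tests, I replaced range with xrange. In Python 2, xrange produces its values lazily instead of building a full list up front, so it should be at least as fast as range here. In Python 3, xrange no longer exists and range itself is lazy.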
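
    The session above is Python 2. A rough Python 3 equivalent, sketched with `range` in place of `xrange` and a smaller `number` so that it finishes quickly, might look like this. The `number` value and the dictionary of labeled statement strings are assumptions for illustration; absolute timings vary by machine and interpreter version, so no outputs are shown.

```python
from timeit import timeit

# Statement strings keyed by a short label (the labels are illustrative)
tests = {
    "append loop": "my_list = []\nfor i in range(50):\n    my_list.append(0)",
    "+= loop": "my_list = []\nfor i in range(50):\n    my_list += [0]",
    "comprehension": "my_list = [0 for i in range(50)]",
    "multiplication": "my_list = [0] * 50",
}

# number=100_000 keeps the run short; timeit's default is 1_000_000
timings = {label: timeit(stmt, number=100_000) for label, stmt in tests.items()}

# Print the methods from fastest to slowest
for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{label:15s} {seconds:.4f} s")
```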
