Why use numpy over list based on speed?

挽巷 2021-01-18 18:37

With reference to "Why NumPy instead of Python lists?",

tom10 said:

Speed: Here's a test on doing a sum over a list and a NumPy array, showing

1 answer
  • 2021-01-18 18:56

    To answer your question: yes. Appending to a numpy array is an expensive operation because every call allocates a new array and copies the existing data, while lists make appending relatively cheap (see "Internals of Python list, access and resizing runtimes" for why). However, that's no reason to abandon numpy. There are other ways to add data to a numpy array easily.
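
The cost difference comes from how each container grows. A minimal sketch, assuming only that `np.append` returns a new array rather than growing its argument in place (the variable names are illustrative):

```python
import numpy as np

# np.append never grows an array in place: it allocates a new array
# and copies every existing element into it.
a = np.arange(3)
b = np.append(a, 3)

print(b is a)                  # is the result the same object as a?
print(np.shares_memory(a, b))  # does the result reuse a's buffer?
print(a.size, b.size)          # size of the original and of the result

# list.append mutates the list in place; the list over-allocates,
# so most appends do not need to copy anything.
items = [0, 1, 2]
same = items
items.append(3)
print(same is items, len(items))
```

Calling `np.append` in a loop therefore repeats the full copy on every iteration, which is why the append-based numpy timing is so much worse than the others.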

    There are a surprising (to me, anyway) number of ways to do this. Skip to the bottom to see benchmarks for each of them.

    Probably the most common is to simply preallocate the array and index into it:

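    import time
    import numpy as np

    N = 100_000  # assumed size; the original post does not show its value

    # using preallocated numpy
    start = time.time()
    array = np.zeros(N)  # allocate the full array once, up front

    for i in range(N):
        array[i] = i  # fill in place, no reallocation

    end = time.time()
    print("Using preallocated numpy: ", round(end - start, 5))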
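
Preallocation assumes the final length is known. When it is not, one common workaround (not part of the original answer) is to over-allocate and grow the buffer geometrically whenever it fills, which is the same idea that keeps list appends cheap on average. The `GrowableArray` class below is a hypothetical helper written for illustration, not a numpy API:

```python
import numpy as np


class GrowableArray:
    """Hypothetical helper: a 1-D float64 buffer that doubles when full."""

    def __init__(self, capacity=16):
        self._data = np.empty(capacity, dtype="float64")
        self._size = 0

    def append(self, value):
        if self._size == self._data.size:
            # Full: allocate twice the capacity and copy once.
            # Doubling keeps the total copying cost linear overall.
            bigger = np.empty(self._data.size * 2, dtype="float64")
            bigger[: self._size] = self._data
            self._data = bigger
        self._data[self._size] = value
        self._size += 1

    def to_array(self):
        # Copy of the filled part only, without the unused tail.
        return self._data[: self._size].copy()


buf = GrowableArray()
for i in range(1000):
    buf.append(i)

result = buf.to_array()
print(result.shape, result[-1])
```

If the length is known, plain preallocation as shown above is simpler and should be preferred.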
    

    Of course, you can preallocate the memory for a list too, so let's include that in the benchmark for comparison.

    # using preallocated list
    start = time.time()
    res = [None] * N

    for i in range(N):
        res[i] = i

    res = np.array(res)  # convert to an array once, at the end
    end = time.time()
    print("Using preallocated list : ", round(end - start, 5))
    

    Depending on your code, it may also be helpful to use numpy's fromiter function, which uses the results of an iterator to initialize the array.

    # using numpy fromiter shortcut
    start = time.time()

    res = np.fromiter(range(N), dtype='float64')  # use same dtype as other tests

    end = time.time()
    print("Using fromiter : ", round(end - start, 5))
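
If the number of elements is known ahead of time, `np.fromiter` also accepts a `count` argument, which lets it allocate the output once instead of resizing it while it consumes the iterator. A small sketch, with `N` chosen arbitrarily for illustration:

```python
import numpy as np

N = 100_000  # illustrative size, not necessarily the value used in the benchmarks

# Without count, fromiter has to grow its buffer as items arrive.
a = np.fromiter(range(N), dtype="float64")

# With count, the output array can be allocated up front.
b = np.fromiter(range(N), dtype="float64", count=N)

print(np.array_equal(a, b))
```

Both calls build the same array; any speed difference depends on the size and the machine, so it is worth measuring rather than assuming.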
    

    Of course, using a built-in iterator isn't terribly flexible, so let's try a custom iterator (here, a generator function) as well:

    # using custom iterator
    start = time.time()
    def it(N):
        # generator that yields 0, 1, ..., N-1 one value at a time
        i = 0
        while i < N:
            yield i
            i += 1

    res = np.fromiter(it(N), dtype='float64')  # use same dtype as other tests

    end = time.time()
    print("Using custom iterator : ", round(end - start, 5))
    

    Those are two very flexible ways of using numpy; the first, a preallocated array, is the most flexible. Let's see how they compare (times in seconds; the first two rows are the append-based numpy and list baselines):

    Using numpy:  2.40017
    Using list :  0.0164
    Using preallocated numpy:  0.01604
    Using preallocated list :  0.01322
    Using fromiter :  0.00577
    Using custom iterator :  0.01458
    

    Right off, you can see that preallocating makes numpy far faster than appending to it (about 0.016 s instead of 2.4 s), which brings it level with a plain list; preallocating the list as well is slightly faster still. Using fromiter with a built-in iterator is the fastest of these (about 0.006 s), although performance drops back into the range of the preallocated array and list when a custom iterator is used.
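
Single `time.time()` measurements like these are noisy, so the ranking of the closely matched rows may vary between runs and machines. One way to re-run the comparison more robustly is the standard library's `timeit`. The sketch below wraps three of the strategies in functions (the function names and the value of `N` are assumptions made for illustration) and checks that they build the same array before timing them:

```python
import timeit

import numpy as np

N = 100_000  # illustrative size; the original value is not shown


def prealloc_numpy(n):
    out = np.zeros(n)
    for i in range(n):
        out[i] = i
    return out


def prealloc_list(n):
    out = [None] * n
    for i in range(n):
        out[i] = i
    # float64 here only so every strategy returns the same dtype
    return np.array(out, dtype="float64")


def from_iter(n):
    return np.fromiter(range(n), dtype="float64", count=n)


strategies = [prealloc_numpy, prealloc_list, from_iter]

# All strategies should produce the same array before we compare speed.
expected = np.arange(N, dtype="float64")
all_equal = all(np.array_equal(f(N), expected) for f in strategies)
print("results match:", all_equal)

for f in strategies:
    # Taking the best of 5 runs reduces noise from other processes.
    best = min(timeit.repeat(lambda: f(N), number=1, repeat=5))
    print(f"{f.__name__:15s} {best:.5f} s")
```

The absolute numbers will differ from the table above; only the relative ordering on a given machine is meaningful.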

    Converting code directly to numpy often performs poorly, as with append. Finding an approach that uses numpy's own methods can almost always give a substantial improvement. In this case, preallocate the array or express the calculation of each element as an iterator to get performance similar to Python lists. Or use a vanilla Python list, since the performance is about the same.

    EDITS: The original answer also included np.fromfunction. This was removed since it didn't fit the use case of adding one element at a time: fromfunction actually initializes the array and uses numpy's broadcasting to make a single function call. It is about a hundred times faster, so if you can solve your problem using broadcasting, don't bother with these other methods.
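
For reference, the vectorized route mentioned in the edit looks roughly like this. `np.fromfunction` calls the function once with an array of indices rather than once per element; the squaring function and the value of `N` are placeholders chosen for illustration:

```python
import numpy as np

N = 100_000  # illustrative size

# The lambda receives the whole index array [0, 1, ..., N-1] in one call.
a = np.fromfunction(lambda i: i ** 2, (N,), dtype="float64")

# Equivalent vectorized expression written directly.
b = np.arange(N, dtype="float64") ** 2

print(np.array_equal(a, b))
```

This only works when each element can be computed from its index with array operations; it does not help when elements genuinely arrive one at a time.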
