Most efficient way to map function over numpy array

庸人自扰 2020-11-22 02:13

What is the most efficient way to map a function over a NumPy array? The way I've been doing it in my current project is as follows:

import numpy as np 

x          


        
11 answers
  •  走了就别回头了
    2020-11-22 02:34

    TL;DR

    As noted by @user2357112, applying the function "directly" to the array is the fastest and simplest way to map it over a NumPy array, provided the function is built from operations that NumPy already applies elementwise:

    import numpy as np
    x = np.array([1, 2, 3, 4, 5])
    f = lambda x: x ** 2
    squares = f(x)
    
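To illustrate when the direct approach applies, here is a minimal sketch. The function names `direct_ok` and `scalar_only` are hypothetical and chosen only for this example. It assumes the usual distinction between NumPy-aware operations such as `np.sqrt`, which accept whole arrays, and scalar-only functions such as `math.sqrt`, which do not.

```python
import math
import numpy as np

x = np.array([1.0, 4.0, 9.0])

def direct_ok(a):
    # Built only from NumPy-aware operations, so it accepts a whole array.
    return np.sqrt(a) + 1

def scalar_only(a):
    # math.sqrt expects a single number, not a multi-element array.
    return math.sqrt(a) + 1

# Direct application: one call handles every element.
roots = direct_ok(x)

# The scalar-only version cannot be applied directly to the array.
try:
    scalar_only(x)
    scalar_failed = False
except TypeError:
    scalar_failed = True

# Element-by-element fallback for the scalar-only function.
looped = np.array([scalar_only(xi) for xi in x])
```

When a function can be rewritten with NumPy operations, as `direct_ok` is here, the direct call is usually the one to prefer. The loop-based fallbacks compared further down are for functions that cannot be rewritten that way.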

    Generally avoid np.vectorize, as it does not perform well, and has (or had) a number of issues. If you are handling other data types, you may want to investigate the other methods shown below.
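One documented pitfall of `np.vectorize` is that, unless `otypes` is given, the output dtype is inferred from the result of calling the function on the first input element. The sketch below is only an illustration and may not be one of the issues the answer has in mind. The function `step` is hypothetical.

```python
import numpy as np

def step(v):
    # Returns an int for small inputs and a float otherwise.
    return 1 if v < 2 else 2.5

x = np.array([1, 3])

# Output dtype inferred from the first result (an int), so later
# float results are cast to that integer dtype.
inferred = np.vectorize(step)(x)

# Stating the output type explicitly avoids the inference step.
explicit = np.vectorize(step, otypes=[float])(x)

print(inferred.dtype, inferred)
print(explicit.dtype, explicit)
```

If `np.vectorize` is used at all, passing `otypes` makes the result type independent of the order of the input data.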

    Comparison of methods

    Here are some simple tests that compare four ways of mapping a function: the direct call and three alternatives. These examples use Python 3.6 and NumPy 1.15.4. First, the set-up functions for testing:

    import timeit
    import numpy as np
    
    f = lambda x: x ** 2
    vf = np.vectorize(f)
    
    def test_array(x, n):
        t = timeit.timeit(
            'np.array([f(xi) for xi in x])',
            'from __main__ import np, x, f', number=n)
        print('array: {0:.3f}'.format(t))
    
    def test_fromiter(x, n):
        t = timeit.timeit(
            'np.fromiter((f(xi) for xi in x), x.dtype, count=len(x))',
            'from __main__ import np, x, f', number=n)
        print('fromiter: {0:.3f}'.format(t))
    
    def test_direct(x, n):
        t = timeit.timeit(
            'f(x)',
            'from __main__ import x, f', number=n)
        print('direct: {0:.3f}'.format(t))
    
    def test_vectorized(x, n):
        t = timeit.timeit(
            'vf(x)',
            'from __main__ import x, vf', number=n)
        print('vectorized: {0:.3f}'.format(t))
    

    Testing with five elements (sorted from fastest to slowest):

    x = np.array([1, 2, 3, 4, 5])
    n = 100000
    test_direct(x, n)      # 0.265
    test_fromiter(x, n)    # 0.479
    test_array(x, n)       # 0.865
    test_vectorized(x, n)  # 2.906
    

    With 100 elements:

    x = np.arange(100)
    n = 10000
    test_direct(x, n)      # 0.030
    test_array(x, n)       # 0.501
    test_vectorized(x, n)  # 0.670
    test_fromiter(x, n)    # 0.883
    

    And with 1,000 array elements:

    x = np.arange(1000)
    n = 1000
    test_direct(x, n)      # 0.007
    test_fromiter(x, n)    # 0.479
    test_array(x, n)       # 0.516
    test_vectorized(x, n)  # 0.945
    

    Results vary with the Python and NumPy versions and with how they were compiled, so run a similar test in your own environment before choosing a method.
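For rerunning the comparison in another environment, a compact harness along these lines may be convenient. It is a sketch rather than the code used for the numbers above: the `methods` dictionary and the `compare` helper are hypothetical names, and each method is timed through a small wrapper call, which adds a little overhead to every measurement.

```python
import timeit
import numpy as np

f = lambda v: v ** 2
vf = np.vectorize(f)

# Candidate ways to apply f to every element; each takes the array as input.
methods = {
    'direct': lambda a: f(a),
    'array': lambda a: np.array([f(ai) for ai in a]),
    'fromiter': lambda a: np.fromiter((f(ai) for ai in a), a.dtype, count=len(a)),
    'vectorized': lambda a: vf(a),
}

def compare(x, n):
    """Check that all methods agree, then return {name: seconds for n calls}."""
    expected = methods['direct'](x)
    timings = {}
    for name, method in methods.items():
        # Fail early if a method computes something different.
        assert np.array_equal(method(x), expected), name
        timings[name] = timeit.timeit(lambda: method(x), number=n)
    return timings

results = compare(np.arange(1000), n=100)

# Report from fastest to slowest.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print('{0}: {1:.3f}'.format(name, seconds))
```

Checking that every method returns the same array before timing guards against comparing the speed of functions that do different work.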
