Why is numpy.array so slow?

南笙 2020-11-27 21:54

I am baffled by this

def main():
    for i in xrange(2560000):
        a = [0.0, 0.0, 0.0]

main()

$ time python test.py

real     0m0.793s
4 Answers
  • 2020-11-27 22:44

    Of course numpy takes more time in this case: a = np.array([0.0, 0.0, 0.0]) is roughly equivalent to a = [0.0, 0.0, 0.0]; a = np.array(a), so it takes two steps. But numpy arrays have many good qualities; their speed shows in operations on them, not in their creation. Just my personal thoughts :).
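
    A minimal sketch of both points, assuming a 100000-element array as the "large" case (the names big_list, scale_list and scale_array are illustrative only). It checks that the one-line and two-step creations give the same array, then times an element-wise operation on a list and on an existing array:

```python
import timeit
import numpy as np

# Creating an array from a literal list is two steps:
# build the Python list, then copy its contents into a new array.
values = [0.0, 0.0, 0.0]
direct = np.array([0.0, 0.0, 0.0])
two_step = np.array(values)
print(np.array_equal(direct, two_step))

# Operating on an existing, larger array is where numpy pays off.
big_list = [float(i) for i in range(100000)]
big_array = np.array(big_list)

def scale_list():
    # One interpreted multiplication per element.
    return [2.0 * x for x in big_list]

def scale_array():
    # A single vectorized multiplication over the whole array.
    return 2.0 * big_array

list_seconds = timeit.timeit(scale_list, number=20)
array_seconds = timeit.timeit(scale_array, number=20)
print("list comprehension: %.4f s" % list_seconds)
print("numpy multiply:     %.4f s" % array_seconds)
```

    The absolute numbers depend on the machine and the numpy version, so none are quoted here.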

  • 2020-11-27 22:47

    Numpy is optimised for large amounts of data. Give it a tiny length-3 array and, unsurprisingly, it performs poorly.

    Consider a separate test

    import timeit
    
    reps = 100
    
    pythonTest = timeit.Timer('a = [0.] * 1000000')
    numpyTest = timeit.Timer('a = numpy.zeros(1000000)', setup='import numpy')
    uninitialised = timeit.Timer('a = numpy.empty(1000000)', setup='import numpy')
    # empty simply allocates the memory, so the initial contents of the array
    # are arbitrary (whatever happened to be in that memory)
    
    print 'python list:', pythonTest.timeit(reps), 'seconds'
    print 'numpy array:', numpyTest.timeit(reps), 'seconds'
    print 'uninitialised array:', uninitialised.timeit(reps), 'seconds'
    

    And the output is

    python list: 1.22042918205 seconds
    numpy array: 1.05412316322 seconds
    uninitialised array: 0.0016028881073 seconds
    

    It would seem that zeroing the array is what takes nearly all of numpy's time here. So unless you need the array to be initialised, try using empty.
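
    Because numpy.empty leaves the contents arbitrary, it is only safe when every element is written before it is read. A minimal sketch of that pattern, using the array as an output buffer (the names src and buf are illustrative):

```python
import numpy as np

n = 1000000
src = np.arange(n, dtype=float)

# Allocate without initialising; the contents are arbitrary at this point.
buf = np.empty(n)

# Write every element before reading any of them. The out= argument makes
# numpy store the result directly in buf instead of allocating a new array.
np.multiply(src, 2.0, out=buf)

print(buf[:3])
```

    If some elements might be read before they are written, numpy.zeros is the safer choice despite the extra cost.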

  • 2020-11-27 22:52

    Late answer, but could be important for other viewers.

    This problem has been considered in the kwant project as well. Indeed, numpy is not optimized for small arrays, and quite frequently small arrays are exactly what you need.

    For this reason they created a substitute for small arrays that behaves like, and co-exists with, numpy arrays (any operation not implemented in the new data type is handled by numpy).

    You should look into this project:
    https://pypi.python.org/pypi/tinyarray/1.0.5
    whose main purpose is to behave nicely for small arrays. Of course, some of the fancier things you can do with numpy are not supported by it. But plain numerics seems to be what you are asking for.

    I have made some small tests:

    python

    I added the numpy import here as well, so that the import time is included in every test:

    import numpy
    
    def main():
        for i in xrange(2560000):
            a = [0.0, 0.0, 0.0]
    
    main()
    

    numpy

    import numpy
    
    def main():
        for i in xrange(2560000):
            a = numpy.array([0.0, 0.0, 0.0])
    
    main()
    

    numpy-zero

    import numpy
    
    def main():
        for i in xrange(2560000):
            a = numpy.zeros((3,1))
    
    main()
    

    tinyarray

    import numpy,tinyarray
    
    def main():
        for i in xrange(2560000):
            a = tinyarray.array([0.0, 0.0, 0.0])
    
    main()
    

    tinyarray-zero

    import numpy,tinyarray
    
    def main():
        for i in xrange(2560000):
            a = tinyarray.zeros((3,1))
    
    main()
    

    I ran this:

    for f in python numpy numpy_zero tiny tiny_zero ; do
        echo $f
        for i in `seq 5` ; do
            time python ${f}_test.py
        done
    done
    

    And got:

    python
    python ${f}_test.py  0.31s user 0.02s system 99% cpu 0.339 total
    python ${f}_test.py  0.29s user 0.03s system 98% cpu 0.328 total
    python ${f}_test.py  0.33s user 0.01s system 98% cpu 0.345 total
    python ${f}_test.py  0.31s user 0.01s system 98% cpu 0.325 total
    python ${f}_test.py  0.32s user 0.00s system 98% cpu 0.326 total
    numpy
    python ${f}_test.py  2.79s user 0.01s system 99% cpu 2.812 total
    python ${f}_test.py  2.80s user 0.02s system 99% cpu 2.832 total
    python ${f}_test.py  3.01s user 0.02s system 99% cpu 3.033 total
    python ${f}_test.py  2.99s user 0.01s system 99% cpu 3.012 total
    python ${f}_test.py  3.20s user 0.01s system 99% cpu 3.221 total
    numpy_zero
    python ${f}_test.py  1.04s user 0.02s system 99% cpu 1.075 total
    python ${f}_test.py  1.08s user 0.02s system 99% cpu 1.106 total
    python ${f}_test.py  1.04s user 0.02s system 99% cpu 1.065 total
    python ${f}_test.py  1.03s user 0.02s system 99% cpu 1.059 total
    python ${f}_test.py  1.05s user 0.01s system 99% cpu 1.064 total
    tiny
    python ${f}_test.py  0.93s user 0.02s system 99% cpu 0.955 total
    python ${f}_test.py  0.98s user 0.01s system 99% cpu 0.993 total
    python ${f}_test.py  0.93s user 0.02s system 99% cpu 0.953 total
    python ${f}_test.py  0.92s user 0.02s system 99% cpu 0.944 total
    python ${f}_test.py  0.96s user 0.01s system 99% cpu 0.978 total
    tiny_zero
    python ${f}_test.py  0.71s user 0.03s system 99% cpu 0.739 total
    python ${f}_test.py  0.68s user 0.02s system 99% cpu 0.711 total
    python ${f}_test.py  0.70s user 0.01s system 99% cpu 0.721 total
    python ${f}_test.py  0.70s user 0.02s system 99% cpu 0.721 total
    python ${f}_test.py  0.67s user 0.01s system 99% cpu 0.687 total
    

    Now, these tests are (as already pointed out) not the best tests. However, they still show that tinyarray is better suited to small arrays.
    Another point is that the most common operations should also be faster with tinyarray, so its benefits may go beyond array creation.

    I have never tried it in a fully fledged project, but the kwant project uses it.
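
    One reason the tests above are rough is that `time python` also measures interpreter start-up and the numpy import. A sketch of an in-process alternative using the standard timeit module, which times only the creation calls; the call count of 100000 and the helper names are my own choices, and tinyarray is left out so the script runs without it:

```python
import timeit
import numpy as np

def make_list():
    return [0.0, 0.0, 0.0]

def make_array():
    return np.array([0.0, 0.0, 0.0])

def make_zeros():
    return np.zeros(3)

calls = 100000
results = {}
for name, func in [("list", make_list),
                   ("numpy.array", make_array),
                   ("numpy.zeros", make_zeros)]:
    # timeit returns the total time for all calls; divide for a per-call cost.
    total = timeit.timeit(func, number=calls)
    results[name] = total / calls
    print("%-12s %.3f microseconds per call" % (name, 1e6 * results[name]))
```

    A tinyarray variant could be added to the list in the same way if the package is installed.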

  • 2020-11-27 22:54

    Holy CPU cycles, Batman! Indeed.

    But please consider something more fundamental about numpy: its sophisticated functionality for linear algebra and related tasks (like random numbers or singular value decomposition). Now, consider these seemingly simple calculations:

    In []: A= rand(2560000, 3)
    In []: %timeit rand(2560000, 3)
    1 loops, best of 3: 296 ms per loop
    In []: %timeit u, s, v= svd(A, full_matrices= False)
    1 loops, best of 3: 571 ms per loop
    

    and please trust me that this kind of performance will not be beaten significantly by any package currently available.
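
    The session above uses the bare names rand and svd, which presumably come from a pylab-style namespace. A standalone sketch of the same calculation with explicit numpy calls, assuming a ten times smaller matrix to keep it light, plus a check that the factors reproduce the input:

```python
import numpy as np

rows = 256000  # assumption: ten times fewer rows than in the session above
A = np.random.rand(rows, 3)

# Thin SVD: u is (rows, 3), s holds the 3 singular values, vt is (3, 3).
u, s, vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the factors back together should reproduce A.
reconstructed = np.dot(u * s, vt)
print(np.allclose(A, reconstructed))
```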

    So, please describe your real problem, and I'll try to figure out a decent numpy-based solution for it.

    Update:
    Here is some simple code for ray-sphere intersection:

    import numpy as np
    
    def mag(X):
        # magnitude
        return (X** 2).sum(0)** .5
    
    def closest(R, c):
        # closest point on ray to center and its distance
        P= np.dot(c.T, R)* R
        return P, mag(P- c)
    
    def intersect(R, P, h, r):
        # intersection of rays and sphere
        return P- (h* (2* r- h))** .5* R
    
    # set up
    c, r= np.array([10, 10, 10])[:, None], 2. # center, radius
    n= 500000 # number of rays; must be an integer to be used as an array dimension
    R= np.random.rand(3, n) # some random rays in first octant
    R= R/ mag(R) # normalized to unit length
    
    # find rays which will intersect sphere
    P, b= closest(R, c)
    wi= b<= r
    
    # and for those which will, find the intersection
    X= intersect(R[:, wi], P[:, wi], r- b[wi], r)
    

    Apparently we calculated correctly:

    In []: allclose(mag(X- c), r)
    Out[]: True
    

    And some timings:

    In []: %timeit P, b= closest(R, c)
    10 loops, best of 3: 93.4 ms per loop
    In []: n/ 0.0934
    Out[]: 5353319 #=> more than 5 million detections of possible intersections/ s
    In []: %timeit X= intersect(R[:, wi], P[:, wi], r- b[wi], r)
    10 loops, best of 3: 32.7 ms per loop
    In []: X.shape[1]/ 0.0327
    Out[]: 874037 #=> almost 1 million actual intersections/ s
    

    These timings were measured on a very modest machine. On a modern machine, a significant speed-up can still be expected.

    Anyway, this is only a short demonstration of how to code with numpy.
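
    As a sanity check on the functions above, here is a small worked case with two hand-picked rays (my own choice, not from the original timings). One ray points straight at the sphere center, so it should hit the surface at distance 10*sqrt(3) - 2 from the origin; the other runs along the x axis and should miss:

```python
import numpy as np

def mag(X):
    # magnitude of each column
    return (X ** 2).sum(0) ** .5

def closest(R, c):
    # closest point on each ray to the center, and its distance from the center
    P = np.dot(c.T, R) * R
    return P, mag(P - c)

def intersect(R, P, h, r):
    # step back from the closest point to the sphere surface
    return P - (h * (2 * r - h)) ** .5 * R

c, r = np.array([10., 10., 10.])[:, None], 2.  # center, radius

# Column 0 points straight at the center; column 1 runs along the x axis.
R = np.array([[1., 1.],
              [1., 0.],
              [1., 0.]])
R = R / mag(R)  # normalized to unit length

P, b = closest(R, c)
wi = b <= r  # only the first ray passes within r of the center
X = intersect(R[:, wi], P[:, wi], r - b[wi], r)

print(wi)
print(mag(X))      # distance from the origin to the hit point
print(mag(X - c))  # distance from the hit point to the center
```

    The last value printed should equal the radius, which is the same check that allclose performs above for the random rays.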
