I am baffled by this
def main():
    for i in xrange(2560000):
        a = [0.0, 0.0, 0.0]

main()
$ time python test.py
real 0m0.793s
Of course numpy takes more time in this case, since a = np.array([0.0, 0.0, 0.0]) is roughly equivalent to a = [0.0, 0.0, 0.0]; a = np.array(a), which is two steps instead of one. But a numpy array has many good qualities: its speed shows in the operations on arrays, not in their creation. Just my personal thoughts :).
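To make the two-step point concrete, here is a minimal sketch, assuming Python 3 and an installed numpy (the thread itself uses Python 2). The repetition count and variable names are my own choices, and the timings it prints will differ from machine to machine.

```python
import timeit

import numpy as np

# Step 1: build the Python list; step 2: convert it to an ndarray.
values = [0.0, 0.0, 0.0]
a = np.array(values)

# Creation cost measured in-process (no interpreter start-up included).
n = 100000
t_list = timeit.timeit("a = [0.0, 0.0, 0.0]", number=n)
t_array = timeit.timeit("a = np.array([0.0, 0.0, 0.0])",
                        globals={"np": np}, number=n)
print("list literal:", t_list, "s")
print("np.array:    ", t_array, "s")

# Where the array pays off: elementwise arithmetic without a Python loop.
b = a + 1.0
```

The last line is the kind of operation a plain list cannot do directly, which is where the array is meant to earn back its creation cost.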
Numpy is optimised for large amounts of data. Give it a tiny length-3 array and, unsurprisingly, it performs poorly.
Consider a separate test:
import timeit
reps = 100
pythonTest = timeit.Timer('a = [0.] * 1000000')
numpyTest = timeit.Timer('a = numpy.zeros(1000000)', setup='import numpy')
uninitialised = timeit.Timer('a = numpy.empty(1000000)', setup='import numpy')
# empty simply allocates the memory, so the initial contents of the array
# are arbitrary leftover values
print 'python list:', pythonTest.timeit(reps), 'seconds'
print 'numpy array:', numpyTest.timeit(reps), 'seconds'
print 'uninitialised array:', uninitialised.timeit(reps), 'seconds'
And the output is
python list: 1.22042918205 seconds
numpy array: 1.05412316322 seconds
uninitialised array: 0.0016028881073 seconds
It would seem that zeroing the array is what takes nearly all the time for numpy. So unless you need the array to be initialised, try using empty.
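If you do go with empty, the usual pattern is to write every element before reading any of them. A minimal sketch, assuming Python 3 and numpy; the fill values here are only an illustration:

```python
import numpy as np

n = 1000000

zeros = np.zeros(n)    # allocated and guaranteed to be 0.0 everywhere
scratch = np.empty(n)  # allocated only; contents are arbitrary until written

# Safe pattern: overwrite the whole buffer before anything reads from it.
scratch[:] = np.arange(n) * 0.5
```

Reading scratch before the assignment would give whatever happened to be in that memory, so empty only makes sense when the array is about to be fully overwritten anyway.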
Late answer, but could be important for other viewers.
This problem has been considered in the kwant project as well. Indeed, small arrays are not optimized in numpy, and quite frequently small arrays are exactly what you need.
For this reason they created a substitute for small arrays which behaves like, and co-exists with, numpy arrays (any operation not implemented in the new data type is handled by numpy).
You should look into this project:
https://pypi.python.org/pypi/tinyarray/1.0.5
whose main purpose is to behave nicely for small arrays. Of course, some of the fancier things you can do with numpy are not supported by it. But numerics seems to be what you are asking for.
I have made some small tests.
Every script imports numpy, even the pure Python one, so that the load time is the same in all of them:
# python_test.py
import numpy
def main():
    for i in xrange(2560000):
        a = [0.0, 0.0, 0.0]
main()

# numpy_test.py
import numpy
def main():
    for i in xrange(2560000):
        a = numpy.array([0.0, 0.0, 0.0])
main()

# numpy_zero_test.py
import numpy
def main():
    for i in xrange(2560000):
        a = numpy.zeros((3,1))
main()

# tiny_test.py
import numpy,tinyarray
def main():
    for i in xrange(2560000):
        a = tinyarray.array([0.0, 0.0, 0.0])
main()

# tiny_zero_test.py
import numpy,tinyarray
def main():
    for i in xrange(2560000):
        a = tinyarray.zeros((3,1))
main()
I ran this:
for f in python numpy numpy_zero tiny tiny_zero ; do
    echo $f
    for i in `seq 5` ; do
        time python ${f}_test.py
    done
done
And got:
python
python ${f}_test.py 0.31s user 0.02s system 99% cpu 0.339 total
python ${f}_test.py 0.29s user 0.03s system 98% cpu 0.328 total
python ${f}_test.py 0.33s user 0.01s system 98% cpu 0.345 total
python ${f}_test.py 0.31s user 0.01s system 98% cpu 0.325 total
python ${f}_test.py 0.32s user 0.00s system 98% cpu 0.326 total
numpy
python ${f}_test.py 2.79s user 0.01s system 99% cpu 2.812 total
python ${f}_test.py 2.80s user 0.02s system 99% cpu 2.832 total
python ${f}_test.py 3.01s user 0.02s system 99% cpu 3.033 total
python ${f}_test.py 2.99s user 0.01s system 99% cpu 3.012 total
python ${f}_test.py 3.20s user 0.01s system 99% cpu 3.221 total
numpy_zero
python ${f}_test.py 1.04s user 0.02s system 99% cpu 1.075 total
python ${f}_test.py 1.08s user 0.02s system 99% cpu 1.106 total
python ${f}_test.py 1.04s user 0.02s system 99% cpu 1.065 total
python ${f}_test.py 1.03s user 0.02s system 99% cpu 1.059 total
python ${f}_test.py 1.05s user 0.01s system 99% cpu 1.064 total
tiny
python ${f}_test.py 0.93s user 0.02s system 99% cpu 0.955 total
python ${f}_test.py 0.98s user 0.01s system 99% cpu 0.993 total
python ${f}_test.py 0.93s user 0.02s system 99% cpu 0.953 total
python ${f}_test.py 0.92s user 0.02s system 99% cpu 0.944 total
python ${f}_test.py 0.96s user 0.01s system 99% cpu 0.978 total
tiny_zero
python ${f}_test.py 0.71s user 0.03s system 99% cpu 0.739 total
python ${f}_test.py 0.68s user 0.02s system 99% cpu 0.711 total
python ${f}_test.py 0.70s user 0.01s system 99% cpu 0.721 total
python ${f}_test.py 0.70s user 0.02s system 99% cpu 0.721 total
python ${f}_test.py 0.67s user 0.01s system 99% cpu 0.687 total
Now these tests are (as already pointed out) not the best tests. However, they still show that tinyarray is better suited for small arrays.
In addition, the most common operations should also be faster with tinyarray, so the benefit may go beyond array creation.
I have never tried it in a fully fledged project, but the kwant project is using it.
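Since the scripts above also time interpreter start-up and the imports, an in-process measurement is a useful complement. Here is a rough sketch, assuming Python 3 and numpy; the repetition count and the labels are my own, and tinyarray is left out so that the snippet runs without it.

```python
import timeit

import numpy as np

REPS = 20000  # far fewer than the 2,560,000 iterations above, to keep it short

statements = {
    "python list": "a = [0.0, 0.0, 0.0]",
    "numpy.array": "a = np.array([0.0, 0.0, 0.0])",
    "numpy.zeros": "a = np.zeros((3, 1))",
}

results = {}
for label, stmt in statements.items():
    # repeat() returns one total per run; the minimum is the least noisy.
    best = min(timeit.repeat(stmt, globals={"np": np}, repeat=5, number=REPS))
    results[label] = best / REPS  # seconds per creation

for label, per_call in results.items():
    print("%-12s %.1f ns per creation" % (label, per_call * 1e9))
```

If tinyarray is installed, an entry such as tinyarray.array([0.0, 0.0, 0.0]) could be added to the dictionary in the same way, with the module passed in through globals.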
"Holy CPU cycles batman!", indeed.
But please consider instead something very fundamental to numpy: sophisticated linear algebra based functionality (like random numbers or singular value decomposition). Now, consider these seemingly simple calculations:
In []: A= rand(2560000, 3)
In []: %timeit rand(2560000, 3)
1 loops, best of 3: 296 ms per loop
In []: %timeit u, s, v= svd(A, full_matrices= False)
1 loops, best of 3: 571 ms per loop
and please trust me that this kind of performance will not be beaten significantly by any package currently available.
So, please describe your real problem, and I'll try to figure out a decent numpy based solution for it.
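The session above relies on rand and svd already being in the namespace. For anyone running it as a script, here is a sketch with explicit names, assuming Python 3 and numpy; I use a much smaller matrix than the 2,560,000 rows timed above so that it finishes quickly.

```python
import numpy as np

# Smaller than the 2,560,000 x 3 matrix above; the shape is my own choice.
A = np.random.rand(10000, 3)

# Thin SVD: u is (10000, 3), s is (3,), v is (3, 3).
u, s, v = np.linalg.svd(A, full_matrices=False)

# The factors reproduce A up to floating-point error.
reconstructed = np.dot(u * s, v)
print(np.allclose(reconstructed, A))
```

Multiplying u by s scales each column of u by the matching singular value, which is the same as inserting a diagonal matrix between u and v.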
Update:
Here is some simple code for ray sphere intersection:
import numpy as np

def mag(X):
    # magnitude of each column
    return (X ** 2).sum(0) ** .5

def closest(R, c):
    # closest point on each ray to the center, and its distance from the center
    P = np.dot(c.T, R) * R
    return P, mag(P - c)

def intersect(R, P, h, r):
    # intersection of rays and sphere
    return P - (h * (2 * r - h)) ** .5 * R

# set up
c, r = np.array([10, 10, 10])[:, None], 2.  # center, radius
n = 500000  # an integer; recent numpy versions reject a float such as 5e5 here
R = np.random.rand(3, n)  # some random rays in first octant
R = R / mag(R)  # normalized to unit length

# find rays which will intersect sphere
P, b = closest(R, c)
wi = b <= r

# and for those which will, find the intersection
X = intersect(R[:, wi], P[:, wi], r - b[wi], r)
Apparently we calculated correctly:
In []: allclose(mag(X- c), r)
Out[]: True
And some timings:
In []: %timeit P, b= closest(R, c)
10 loops, best of 3: 93.4 ms per loop
In []: n/ 0.0934
Out[]: 5353319 #=> more than 5 million detections of possible intersections/ s
In []: %timeit X= intersect(R[:, wi], P[:, wi], r- b[wi], r)
10 loops, best of 3: 32.7 ms per loop
In []: X.shape[1]/ 0.0327
Out[]: 874037 #=> almost 1 million actual intersections/ s
These timings were done on a very modest machine. On a modern machine, a significant speed-up can still be expected.
Anyway, this is only a short demonstration of how to code with numpy.
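As a cross-check of the vectorised closest-point step, here is a small sketch that compares it with a plain per-ray loop on a handful of rays. It assumes Python 3 and numpy; the helper names closest_scalar and closest_vectorized, the fixed seed and the number of rays are mine, not part of the code above.

```python
import math

import numpy as np


def closest_scalar(ray, center):
    """Closest point on one unit-length ray (from the origin) to center."""
    t = sum(ri * ci for ri, ci in zip(ray, center))  # projection onto the ray
    point = [t * ri for ri in ray]
    dist = math.sqrt(sum((pi - ci) ** 2 for pi, ci in zip(point, center)))
    return point, dist


def closest_vectorized(R, c):
    """Same computation for all rays at once; R is (3, n), c is (3, 1)."""
    P = np.dot(c.T, R) * R
    return P, ((P - c) ** 2).sum(0) ** 0.5


rng = np.random.RandomState(0)       # fixed seed, only for repeatability
R = rng.rand(3, 8)                   # eight random rays in the first octant
R = R / (R ** 2).sum(0) ** 0.5       # normalise each column to unit length
c = np.array([10.0, 10.0, 10.0])[:, None]

P, b = closest_vectorized(R, c)
for j in range(R.shape[1]):
    point, dist = closest_scalar(R[:, j], c[:, 0])
    assert np.allclose(point, P[:, j]) and np.isclose(dist, b[j])
```

The loop version does per ray what the array expression does for all columns in one go, which is the whole point of writing it the numpy way.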