I've got a numpy script that spends about 50% of its runtime in the following code:
s = numpy.dot(v1, v1)
where
v1 = v[1:]
Perhaps the culprit is copying of the arrays passed to dot.
As Sven said, the dot product relies on BLAS operations. These operations require arrays stored in contiguous C order. If both arrays passed to dot are C_CONTIGUOUS, you ought to see better performance.
Of course, if the two arrays you pass to dot are indeed 1D, with shape (8,), then you should see both the C_CONTIGUOUS and F_CONTIGUOUS flags set to True; but if they are (1, 8), then you can see mixed order.
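To inspect this directly, here is a short sketch (one caveat: recent NumPy releases relax the contiguity check for size-1 dimensions, so a (1, 8) array may report both flags as True there; a transposed view is used below to show mixed order unambiguously):

>>> import numpy as NP
>>> a = NP.arange(8.)                    # 1D, shape (8,)
>>> a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS']
(True, True)
>>> b = NP.arange(16.).reshape(8, 2).T   # transposed view, shape (2, 8)
>>> b.flags['C_CONTIGUOUS'], b.flags['F_CONTIGUOUS']
(False, True)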
>>> w = NP.random.randint(0, 10, 100).reshape(100, 1)
>>> w.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
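If the flags come out wrong, one option (an illustrative sketch continuing the session above, not from the original post) is to normalize the layout once with numpy.ascontiguousarray, which copies only when the input is not already C-ordered:

>>> x = NP.arange(16.).reshape(8, 2).T    # F-ordered view, shape (2, 8)
>>> x.flags['C_CONTIGUOUS']
False
>>> xc = NP.ascontiguousarray(x)          # C-ordered copy, made only when needed
>>> xc.flags['C_CONTIGUOUS']
True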
An alternative: use _gemm from BLAS, which is exposed through the module scipy.linalg.fblas. (The two arrays, A and B, are obviously in Fortran order because fblas is used.)
from scipy.linalg import fblas as FB

# Computes 1.0 * A @ B.T in a single BLAS call (B is transposed via trans_b=True)
X = FB.dgemm(alpha=1., a=A, b=B, trans_b=True)
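Note that on SciPy versions from roughly 0.12 onward, these wrappers were folded into scipy.linalg.blas and scipy.linalg.fblas was eventually removed, so the import above will fail there. A minimal sketch of the same call against the newer module (the arrays here are placeholder data, not from the original script):

import numpy as np
from scipy.linalg import blas

# Placeholder inputs; Fortran order lets dgemm use them without an internal copy
A = np.asfortranarray(np.random.rand(3, 4))
B = np.asfortranarray(np.random.rand(5, 4))

# Computes 1.0 * A @ B.T -> result of shape (3, 5)
X = blas.dgemm(alpha=1.0, a=A, b=B, trans_b=True)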