Matrix multiplication using hdf5

I use hdf5 (PyTables) for big matrix multiplication, but I was surprised, because using hdf5 it works even faster than plain NumPy.
np.dot dispatches to BLAS when NumPy is linked against a BLAS implementation, the operands have one of the dtypes float32, float64, complex64 or complex128, and the data is suitably aligned in memory; otherwise, it falls back to its own, slow, matrix multiplication routine.
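A quick way to see which case you're hitting is to check the operand dtypes; nothing below goes beyond standard NumPy, it's just a minimal sketch:

>>> import numpy as np
>>> X = np.random.randn(100, 100)    # float64: eligible for the BLAS fast path
>>> X.dtype
dtype('float64')
>>> Xi = (X * 100).astype(np.int64)  # integer dtype: np.dot falls back to the slow routine
>>> Xi.dtype
dtype('int64')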
Checking your BLAS linkage is described here. In short, check whether there's a file _dotblas.so or similar in your NumPy installation. If there is, check which BLAS library it's linked against; the reference BLAS is slow, ATLAS is fast, and OpenBLAS and vendor-specific versions such as Intel MKL are even faster. Watch out with multithreaded BLAS implementations, though, as they don't play nicely with Python's multiprocessing.
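If you'd rather not hunt for the file by hand, np.show_config() summarizes the BLAS/LAPACK setup NumPy was built with, and np.__file__ tells you where to start looking; the exact output varies per installation, so treat this as a sketch:

>>> import numpy as np
>>> np.show_config()                # prints the BLAS/LAPACK libraries NumPy was built against
>>> import os.path
>>> os.path.dirname(np.__file__)    # the directory to search for _dotblas.so (or similar)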
Next, check your data alignment by inspecting the flags of your arrays. In versions of NumPy before 1.7.2, both arguments to np.dot should be C-ordered. In NumPy >= 1.7.2, this doesn't matter as much anymore, as special cases for Fortran arrays have been introduced.
>>> X = np.random.randn(10, 4)
>>> Y = np.random.randn(7, 4).T
>>> X.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
>>> Y.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
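If you're stuck on an older NumPy, you can force C order yourself before calling np.dot; np.ascontiguousarray is standard NumPy and copies only when needed (a minimal sketch, reusing X and Y from above):

>>> Y_c = np.ascontiguousarray(Y)    # copies Y into C order only if it isn't already
>>> Y_c.flags['C_CONTIGUOUS']
True
>>> np.allclose(np.dot(X, Y_c), np.dot(X, Y))
True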
If your NumPy is not linked against BLAS, either (easy) re-install it, or (hard) use the BLAS gemm (general matrix multiply) function from SciPy:
>>> from scipy.linalg import get_blas_funcs
>>> gemm = get_blas_funcs("gemm", [X, Y])
>>> np.all(gemm(1, X, Y) == np.dot(X, Y))
True
This looks easy, but it does hardly any error checking, so you must really know what you're doing.
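gemm can also apply transposes inside BLAS, so you don't need to materialize a transposed copy yourself. The trans_a/trans_b keywords below belong to SciPy's BLAS wrappers; a minimal sketch, assuming the same X and Y as above:

>>> Yt = np.ascontiguousarray(Y.T)        # C-ordered (7, 4) array
>>> Z = gemm(1.0, X, Yt, trans_b=True)    # computes X @ Yt.T, i.e. X @ Y, inside BLAS
>>> np.allclose(Z, np.dot(X, Y))
True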