How to get faster code than numpy.dot for matrix multiplication?

前端 未结 1 590
名媛妹妹
名媛妹妹 2020-11-30 04:25

Here Matrix multiplication using hdf5 I use hdf5 (pytables) for big matrix multiplication, but I was suprised because using hdf5 it works even faster then using plain numpy.

相关标签:
1条回答
  • 2020-11-30 05:00

    np.dot dispatches to BLAS when

    • NumPy has been compiled to use BLAS,
    • a BLAS implementation is available at run-time,
    • your data has one of the dtypes float32, float64, complex32 or complex64, and
    • the data is suitably aligned in memory.

    Otherwise, it defaults to using its own, slow, matrix multiplication routine.

    Checking your BLAS linkage is described here. In short, check whether there's a file _dotblas.so or similar in your NumPy installation. When there is, check which BLAS library it's linked against; the reference BLAS is slow, ATLAS is fast, OpenBLAS and vendor-specific versions such as Intel MKL are even faster. Watch out with multithreaded BLAS implementations as they don't play nicely with Python's multiprocessing.

    Next, check your data alignment by inspecting the flags of your arrays. In versions of NumPy before 1.7.2, both arguments to np.dot should be C-ordered. In NumPy >= 1.7.2, this doesn't matter as much anymore as special cases for Fortran arrays have been introduced.

    >>> X = np.random.randn(10, 4)
    >>> Y = np.random.randn(7, 4).T
    >>> X.flags
      C_CONTIGUOUS : True
      F_CONTIGUOUS : False
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False
    >>> Y.flags
      C_CONTIGUOUS : False
      F_CONTIGUOUS : True
      OWNDATA : False
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False
    

    If your NumPy is not linked against BLAS, either (easy) re-install it, or (hard) use the BLAS gemm (generalized matrix multiply) function from SciPy:

    >>> from scipy.linalg import get_blas_funcs
    >>> gemm = get_blas_funcs("gemm", [X, Y])
    >>> np.all(gemm(1, X, Y) == np.dot(X, Y))
    True
    

    This looks easy, but it does hardly any error checking, so you must really know what you're doing.

    0 讨论(0)
提交回复
热议问题