This has to deal with a similar problem here: Calling BLAS / LAPACK directly using the SciPy interface and Cython but is different because I'm using the actual code in the
If I read it correctly, you are trying to use Fortran routines on arrays with C memory layout.
Even if this is already known to you, I would first like to elaborate on row-major order (C memory layout) and column-major order (Fortran memory layout), because my answer follows from them.
So if we have a 2x3 matrix A (i.e. 2 rows and 3 columns) and store it in contiguous memory, we get:
row-major-order(A) = A11, A12, A13, A21, A22, A23
col-major-order(A) = A11, A21, A12, A22, A13, A23
That means if we take a contiguous block of memory that represents a matrix in row-major order and interpret it as a matrix in column-major order, we get quite a different matrix!
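To make the two layouts concrete, here is a minimal pure-Python sketch. The helper names `row_major`, `col_major` and `from_col_major` are made up for this illustration and are not part of any library; matrices are plain lists of rows.

```python
# Hypothetical helpers for illustration only (not a library API).

def row_major(mat):
    """Flatten a list-of-rows matrix row by row (C layout)."""
    return [x for row in mat for x in row]

def col_major(mat):
    """Flatten a list-of-rows matrix column by column (Fortran layout)."""
    rows, cols = len(mat), len(mat[0])
    return [mat[i][j] for j in range(cols) for i in range(rows)]

def from_col_major(buf, rows, cols):
    """Read a flat buffer as a rows x cols matrix stored column by column."""
    return [[buf[i + j * rows] for j in range(cols)] for i in range(rows)]

A = [["A11", "A12", "A13"],
     ["A21", "A22", "A23"]]

row_buf = row_major(A)   # A11, A12, A13, A21, A22, A23
col_buf = col_major(A)   # A11, A21, A12, A22, A13, A23

# Reading the row-major buffer as if it were a column-major 2x3 matrix
# scrambles the entries:
misread = from_col_major(row_buf, 2, 3)
print(misread)
```

The last line prints a 2x3 matrix whose first row is A11, A13, A22, which is clearly not A.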
However, if we take a look at the transposed matrix A^t, we can easily see:
row-major-order(A) = col-major-order(A^t)
col-major-order(A) = row-major-order(A^t)
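These two identities can be checked with a few lines of plain Python. This is only a sketch; `row_major`, `col_major` and `transpose` are illustrative helpers defined here, not library functions.

```python
# Hypothetical helpers for illustration only (not a library API).

def row_major(mat):
    """Flatten a list-of-rows matrix row by row (C layout)."""
    return [x for row in mat for x in row]

def col_major(mat):
    """Flatten a list-of-rows matrix column by column (Fortran layout)."""
    rows, cols = len(mat), len(mat[0])
    return [mat[i][j] for j in range(cols) for i in range(rows)]

def transpose(mat):
    """Return the transposed matrix as a new list of rows."""
    return [list(col) for col in zip(*mat)]

A = [[1, 2, 3],
     [4, 5, 6]]
At = transpose(A)        # 3x2 matrix

same_1 = row_major(A) == col_major(At)
same_2 = col_major(A) == row_major(At)
print(same_1, same_2)
```

In other words, the same bytes in memory can be read either as A in row-major order or as A^t in column-major order, with no copying involved.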
That means that if we would like to get the matrix C in row-major order as the result, the BLAS routine should write the transposed matrix C^t in column-major order (this, after all, we cannot change) into this very memory. However, C^t=(AB)^t=B^t*A^t, and B^t and A^t are the original matrices reinterpreted in column-major order.
Now, let A be an n x k matrix and B a k x m matrix. The call to the dgemm routine should then be as follows:
dgemm(transa, transb, &m, &n, &k, &alpha, b0, &m, a0, &k, &beta, c0, &m)
As you can see, you switched some n and m in your code.
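The argument order above can be sanity-checked without BLAS at all. The sketch below uses a pure-Python stand-in, `dgemm_colmajor`, which is a hypothetical reference routine written for this answer: it mimics the column-major semantics of dgemm for the no-transpose case only, and takes plain Python values instead of pointers. The buffers `a0`, `b0` and `c0` hold A, B and C in row-major order, as in the call above.

```python
def dgemm_colmajor(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Reference C := alpha*A*B + beta*C with all matrices column-major.

    A is m x k, B is k x n, C is m x n (no transposition supported).
    This is an illustrative stand-in, not the real BLAS routine.
    """
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i + p * lda] * b[p + j * ldb]
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]

# A is n x k, B is k x m, both stored in row-major order.
n, k, m = 2, 3, 4
a0 = [1, 2, 3,
      4, 5, 6]
b0 = [1, 0, 2, 1,
      0, 1, 1, 0,
      3, 1, 0, 2]
c0 = [0.0] * (n * m)

# Same argument order as in the answer: B first, then A, dimensions m, n, k.
dgemm_colmajor(m, n, k, 1.0, b0, m, a0, k, 0.0, c0, m)

# Naive row-major product A*B for comparison.
expected = [float(sum(a0[i * k + p] * b0[p * m + j] for p in range(k)))
            for i in range(n) for j in range(m)]
print(c0 == expected)
```

The routine believes it is computing B^t*A^t in column-major order, but the buffer it fills is exactly C = A*B in row-major order.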