I\'m writing a function that I\'d like to translate to C as much as possible using Cython. To do this, I\'ll need to use linear algebra operations. Here is my function.
A few of small points with respect to your optimized Cython/BLAS functions:
ipiv_a = np.zeros(n).astype(np.int32)
cdef int[::1] ipiv = ipiv_a
can have two easy improvements: it doesn't has to go through a temporary variable, and you can directly create an array of type np.int32
rather than create an array of a different type then casting:
cdef int[::1] ipiv = np.zeros(n,dtype=np.int32)
Simiarly, in both fucntions you can initialize B
in a fewer steps by doing
cdef double[:, ::1] B = A.copy()
rather than creating a zero array and copying
The second (more significant) change would be to use C arrays for temporary variables such as the Fortran workspaces. I'd still keep things like return values as numpy arrays since the reference counting and ability to send them back to Python is very useful.
cdef double* work = <double*>malloc(n*n*sizeof(double))
try:
# rest of function
finally:
free(work)
You need to cimport malloc
and free
from libc.stdlib
. The try: ... finally:
ensures the memory is correctly deallocated. Don't go too over the top with this - for example, if it isn't obvious where to deallocate a C array then just use numpy.
The final option to look at is to not have a return value but to modify an input:
cdef void inv_c(double[:,::1] A, double[:,::1] B):
# check that A and B are the right size, then just write into B
# ...
The advantage of this is that if you need to call this in a loop with identical sized inputs then you only need to do one allocation for the whole loop. You could extend this to include the working arrays too, although that might be a little more complicated.