Question
I'm trying to write Cython code that dumps a dense feature matrix and its target vector to libsvm format faster than sklearn's built-in code. Compilation fails with a type error about passing the target vector (a NumPy array of ints) to the relevant C function.
Here's the code:
import numpy as np
cimport numpy as np
cimport cython

cdef extern from "cdump.h":
    int filedump(double features[], int numexemplars, int numfeats, int target[], char* outfname)

@cython.boundscheck(False)
@cython.wraparound(False)
def fastdumpdense_libsvmformat(np.ndarray[np.double_t, ndim=2] X, y, outfname):
    if X.shape[0] != len(y):
        raise ValueError("X and y need to have the same number of points")
    cdef int numexemplars = X.shape[0]
    cdef int numfeats = X.shape[1]
    cdef bytes py_bytes = outfname.encode()
    cdef char* outfnamestr = py_bytes
    cdef np.ndarray[np.double_t, ndim=2, mode="c"] X_c
    cdef np.ndarray[np.int_t, ndim=1, mode="c"] y_c
    X_c = np.ascontiguousarray(X, dtype=np.double)
    y_c = np.ascontiguousarray(y, dtype=np.int)
    retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)
    return retval
When I attempt to compile this code using distutils, I get the following error:

cythoning fastdump_svm.pyx to fastdump_svm.cpp

Error compiling Cython file:
------------------------------------------------------------
...
    cdef np.ndarray[np.double_t, ndim=2, mode="c"] X_c
    cdef np.ndarray[np.int_t, ndim=1, mode="c"] y_c
    X_c = np.ascontiguousarray(X, dtype=np.double)
    y_c = np.ascontiguousarray(y, dtype=np.int)
    retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)
                                                          ^
------------------------------------------------------------

fastdump_svm.pyx:24:58: Cannot assign type 'int_t *' to 'int *'
Any idea how to fix this error? I originally passed y_c.data instead, which works, but that is apparently not the recommended way.
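For reference when checking the C routine's output, here is a minimal pure-Python sketch of the text format it is meant to write. It assumes the usual libsvm conventions (label first, then index:value pairs with 1-based feature indices, zero entries omitted) and writes labels as integers to match the int target[] parameter. The function name dump_libsvm_reference is hypothetical, and whether the real filedump in cdump.h also skips zeros is not shown in the question.

```python
import io
import numpy as np


def dump_libsvm_reference(X, y, f):
    """Write (X, y) to the text stream f, one libsvm-format line per row.

    Hypothetical reference helper: integer label first, then idx:value pairs
    with 1-based feature indices; zero-valued features are omitted.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    if X.ndim != 2 or X.shape[0] != len(y):
        raise ValueError("X and y need to have the same number of points")
    for row, label in zip(X, y):
        parts = [str(int(label))]
        # flatnonzero gives the 0-based column indices of the nonzero entries.
        parts.extend(f"{j + 1}:{float(row[j]):.17g}" for j in np.flatnonzero(row))
        f.write(" ".join(parts) + "\n")


buf = io.StringIO()
dump_libsvm_reference([[0.0, 1.5], [2.0, 0.0]], [1, -1], buf)
print(buf.getvalue(), end="")
# 1 2:1.5
# -1 1:2
```

Comparing this output with the file produced by the Cython/C version on a small random matrix is a cheap way to catch indexing or type mistakes like the one in this question.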
Answer 1:
The problem is that numpy.int_t is not the same type as int. You can check this by having your program print sizeof(numpy.int_t) and sizeof(int).
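The same check can be done from Python with ctypes. This sketch prints the sizes of the C types involved, then shows what goes wrong when a buffer of 64-bit integers (int64 is an assumption chosen for illustration) is read through an int pointer, which is effectively what happens if y_c.data is handed to a function declared with int target[] while the dtypes disagree. The printed sizes depend on your platform.

```python
import ctypes
import numpy as np

# Sizes of the C types involved (platform dependent).
print("sizeof(int)      =", ctypes.sizeof(ctypes.c_int))
print("sizeof(long)     =", ctypes.sizeof(ctypes.c_long))
print("np.int_ itemsize =", np.dtype(np.int_).itemsize)

# Reinterpret a buffer of 64-bit integers as an array of C ints.
y64 = np.array([1, 2, 3], dtype=np.int64)
as_int_ptr = y64.ctypes.data_as(ctypes.POINTER(ctypes.c_int))
n = y64.nbytes // ctypes.sizeof(ctypes.c_int)  # stay inside the buffer
misread = [as_int_ptr[i] for i in range(n)]
print(misread)
# on a little-endian machine with a 4-byte int: [1, 0, 2, 0, 3, 0]
```

The C side sees twice as many "ints" as there are labels, half of them zero, which is why Cython refuses the pointer conversion instead of letting it through silently.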
int is a C int, which the C standard only requires to be at least 16 bits; on my machine it is 32 bits. numpy.int_t, on the other hand, is declared in NumPy's Cython header as the C long type (in NumPy 1.x), so it is 64 bits on 64-bit Linux and macOS but 32 bits on Windows and on 32-bit platforms. Because int and long are distinct C types, Cython rejects the int_t * to int * conversion even where the sizes happen to match. If you want to know which NumPy dtype matches your C int, use np.dtype(ctypes.c_int).
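A short Python sketch of that lookup: np.dtype(ctypes.c_int) names the same dtype as np.intc and np.dtype("i"), and passing it to np.ascontiguousarray yields a contiguous array whose elements are C ints. The sample target values are made up.

```python
import ctypes
import numpy as np

# The NumPy dtype that matches a C int on this platform.
c_int_dtype = np.dtype(ctypes.c_int)
print(c_int_dtype, "itemsize:", c_int_dtype.itemsize)

# np.intc and the "i" type code are other spellings of the same dtype.
print(c_int_dtype == np.dtype(np.intc) == np.dtype("i"))

# Converting the targets gives a C-contiguous buffer of C ints.
y = [3, -1, 7]
y_c = np.ascontiguousarray(y, dtype=ctypes.c_int)
print(y_c.dtype == c_int_dtype, y_c.flags["C_CONTIGUOUS"])
```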
So, to pass your NumPy array to C code, you can write:
import ctypes

# Buffer of C ints, matching the int target[] parameter of filedump
cdef np.ndarray[int, ndim=1, mode="c"] y_c
y_c = np.ascontiguousarray(y, dtype=ctypes.c_int)
retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)
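One caveat with this conversion: np.ascontiguousarray casts unsafely, so int64 targets outside the range of a C int would wrap around and float labels would be truncated, without any error. A minimal Python-side guard, assuming such values should be rejected rather than converted (the helper name as_c_int_array is hypothetical), could look like this:

```python
import numpy as np


def as_c_int_array(y):
    """Hypothetical helper: return y as a C-contiguous np.intc array.

    Raises ValueError instead of silently wrapping or truncating values
    that cannot be represented exactly as a C int.
    """
    y_arr = np.asarray(y)
    # Same conversion as above; np.intc is the dtype of a C int.
    y_c = np.ascontiguousarray(y_arr, dtype=np.intc)
    # The cast is unsafe (int64 -> int32 wraps, 1.5 -> 1), so check that
    # every value survived before the buffer is handed to C code.
    if not np.array_equal(y_c, y_arr):
        raise ValueError("y contains values that do not fit in a C int")
    return y_c


y_c = as_c_int_array(np.arange(6)[::2])  # strided input becomes contiguous
print(y_c, y_c.dtype, y_c.flags["C_CONTIGUOUS"])
```

The returned array can then be passed to the Cython function in place of y, so the C code only ever sees values that were representable.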
Answer 2:
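You can also use dtype=np.dtype("i") when creating the NumPy array, so that its element type matches the C int on your machine, and hold it in a contiguous typed memoryview:
# int[::1] is a C-contiguous memoryview of C ints
cdef int[::1] y_c
y_c = np.ascontiguousarray(y, dtype=np.dtype("i"))
retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)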
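From plain Python you can inspect the buffer properties that a typed memoryview such as int[::1] relies on: the element format and size, and C-contiguity. This uses the built-in memoryview and is only an approximation of the checks Cython performs when it acquires the buffer; the sample arrays are made up.

```python
import ctypes
import numpy as np

y_c = np.ascontiguousarray([4, 5, 6], dtype=np.dtype("i"))
mv = memoryview(y_c)
# Element format code, element size in bytes, and contiguity of the buffer.
print(mv.format, mv.itemsize, mv.c_contiguous)
print(mv.itemsize == ctypes.sizeof(ctypes.c_int))

# Taking every other element gives a strided view that is not C-contiguous,
# so it would not satisfy the ::1 (contiguous) axis specification.
strided = memoryview(np.arange(6, dtype=np.dtype("i"))[::2])
print(strided.c_contiguous)
```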
Source: https://stackoverflow.com/questions/23435756/passing-numpy-integer-array-to-c-code