I am doing some performance test on a variant of the prime numbers generator from http://docs.cython.org/src/tutorial/numpy.html. The below performance measures are with kma
cdef DTYPE_t [:] p_view = p
Using this instead of p in the calculations. reduced the runtime from 580 ms down to 2.8 ms for me. About the exact same runtime as the implementation using *int. And that's about the max you can expect from this.
DTYPE = np.int
ctypedef np.int_t DTYPE_t
@cython.boundscheck(False)
def primes(DTYPE_t kmax):
cdef DTYPE_t n, k, i
cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
cdef DTYPE_t [:] p_view = p
k = 0
n = 2
while k < kmax:
i = 0
while i < k and n % p_view[i] != 0:
i = i + 1
if i == k:
p_view[k] = n
k = k + 1
n = n + 1
return p
why is the numpy array so incredibly slower than a python list, when running on CPython?
Because you didn't fully type it. Use
cdef np.ndarray[dtype=np.int, ndim=1] p = np.empty(kmax, dtype=DTYPE)
how do I cast a numpy array to a int*?
By using np.intc
as the dtype, not np.int
(which is a C long
). That's
cdef np.ndarray[dtype=int, ndim=1] p = np.empty(kmax, dtype=np.intc)
(But really, use a memoryview, they're much cleaner and the Cython folks want to get rid of the NumPy array syntax in the long run.)
Best syntax I found so far:
import numpy
cimport numpy
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def primes(int kmax):
cdef int n, k, i
cdef numpy.ndarray[int] p = numpy.empty(kmax, dtype=numpy.int32)
k = 0
n = 2
while k < kmax:
i = 0
while i < k and n % p[i] != 0:
i = i + 1
if i == k:
p[k] = n
k = k + 1
n = n + 1
return p
Note where I used numpy.int32 instead of int. Anything on the left side of a cdef is a C type (thus int = int32 and float = float32), while anything on the RIGHT side of it (or outside of a cdef) is a python type (int = int64 and float = float64)