Cython: Create memoryview without NumPy array?

Since I found memory-views handy and fast, I try to avoid creating NumPy arrays in cython and work with the views of the given arrays. However, sometimes it cannot be avoided, not to alter an existing array but create a new one. In upper functions this is not noticeable, but in often called subroutines it is. Consider the following function

#@cython.profile(False)
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cdef double [:] vec_eq(double [:] v1, int [:] v2, int cond):
    ''' Function output corresponds to v1[v2 == cond]'''
    cdef unsigned int n = v1.shape[0]
    cdef unsigned int n_ = 0
    # Size of array to create
    cdef size_t i
    for i in range(n):
        if v2[i] == cond:
            n_ += 1
    # Create array for selection
    cdef double [:] s = np.empty(n_, dtype=np_float) # Slow line
    # Copy selection to new array
    n_ = 0
    for i in range(n):
        if v2[i] == cond:
            s[n_] = v1[i]
            n_ += 1
    return s

Profiling tells me, there is some speed to gain here

What I could do is adapting the function, cause sometimes, for instance the mean of this vector is calculated, sometimes the sum. So I could rewrite it, for summing or taking average. But isn't there a way to create memory-view with very little overhead directly, defining size dynamically. Something like first creating a c buffer using malloc etc and at the end of the function convert the buffer to a view, passing the pointer and strides or so..

Edit 1: Maybe for simple cases, adapting the function e. g. like this is an acceptable approach. I only added an argument and summing/taking average. This way I dont have to create an array and can take an easy to handle inside function malloc. This won't get any faster, will it?

# ...
cdef double vec_eq(double [:] v1, int [:] v2, int cond, opt=0):
    # additional option argument
    ''' Function output corresponds to v1[v2 == cond].sum() / .mean()'''
    cdef unsigned int n = v1.shape[0]
    cdef int n_ = 0
    # Size of array to create
    cdef Py_ssize_t i
    for i in prange(n, nogil=True):
        if v2[i] == cond:
            n_ += 1
    # Create array for selection
    cdef double s = 0
    cdef double * v3 = <double *> malloc(sizeof(double) * n_)
    if v3 == NULL:
        abort()
    # Copy selection to new array
    n_ = 0
    for i in range(n):
        if v2[i] == cond:
            v3[n_] = v1[i]
            n_ += 1
    # Do further computation here, according to option
    # Option 0 for the sum
    if opt == 0:
        for i in prange(n_, nogil=True):
            s += v3[i]
        free(v3)
        return s
    # Option 1 for the mean
    else:
        for i in prange(n_, nogil=True):
            s += v3[i]
        free(v3)
        return s / n_
    # Since in the end there is always only a single double value, 
    # the memory can be freed right here

embert

Didn't know, how to deal with cpython arrays, so I solved this finally by a self made 'memory view', as proposed by fabrizioM. Wouldn't have thought that this would work. Creating a new np.array in a tight loop is pretty expensive, so this gave me a significant speed up. Since I only need a 1 dimensional array, I didn't even had to bother with strides. But even for a higher dimensional arrays, I think this could go well.

cdef class Vector:
    cdef double *data
    cdef public int n_ax0

    def __init__(Vector self, int n_ax0):
        self.data = <double*> malloc (sizeof(double) * n_ax0)
        self.n_ax0 = n_ax0

    def __dealloc__(Vector self):
        free(self.data)

...
#@cython.profile(False)
@cython.boundscheck(False)
cdef Vector my_vec_func(double [:, ::1] a, int [:] v, int cond, int opt):
    # function returning a Vector, which can be hopefully freed by del Vector
    cdef int vecsize
    cdef size_t i
    # defs..
    # more stuff...
    vecsize = n
    cdef Vector v = Vector(vecsize)

    for i in range(vecsize):
        # computation
        v[i] = ...

    return v

...
vec = my_vec_func(...
ptr_to_data = vec.data
length_of_vec = vec.n_ax0

The following thread on the Cython mailing list would probably be of interest to you:

https://groups.google.com/forum/#!topic/cython-users/CwtU_jYADgM

It looks like there are some decent options presented if you are fine with returning a memoryview from your function that gets coerced at some different level where perfomance isn't as much of an issue.

From http://docs.cython.org/src/userguide/memoryviews.html it follows that memory for cython memory views can be allocated via:

cimport cython
cdef type [:] cview = cython.view.array(size = size, 
              itemsize = sizeof(type), format = "type", allocate_buffer = True)

or by

from libc.stdlib import malloc, free
cdef type [:] cview = <type[:size]> malloc(sizeof(type)*size)

Both case works, but in first i have an issues if introduce own type (ctypedef some mytype) because there is no suitable format for it. In second case there is problem with deallocation of memory.

From manual it should work as follows: