I have a list list_of_arrays of 3D numpy arrays that I want to pass to a C function with the template

int my_func_c(double **data, int **shape, int n_arrays)
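For concreteness, such a list might look like this (the shapes here are just placeholders for illustration):

import numpy as np

# a few C-contiguous float64 arrays of differing 3D shapes
list_of_arrays = [np.zeros((2, 3, 4)),
                  np.ones((5, 6, 7)),
                  np.arange(24, dtype=np.float64).reshape(2, 3, 4)]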
One alternative would be to let numpy manage your memory for you. You can do this by using numpy arrays of np.uintp, which is an unsigned integer the same size as a pointer.
Unfortunately, this does require some type-casting (between "pointer-sized int" and pointers), which is a good way of hiding logic errors, so I'm not 100% happy with it.
import numpy as np
cimport numpy as np

# my_func_c is assumed to be declared elsewhere via a "cdef extern from" block
# (see the extern declaration in the std::vector answer below)

def my_func(list list_of_arrays):
    cdef int n_arrays = len(list_of_arrays)
    # pointer-sized integer arrays that will hold the addresses
    cdef np.uintp_t[::1] data = np.empty(n_arrays, dtype=np.uintp)
    cdef np.uintp_t[::1] shape = np.empty(n_arrays, dtype=np.uintp)
    cdef double x
    cdef np.ndarray[double, ndim=3, mode="c"] temp

    for i in range(n_arrays):
        temp = list_of_arrays[i]
        data[i] = <np.uintp_t>&temp[0, 0, 0]
        shape[i] = <np.uintp_t>&(temp.shape[0])

    x = my_func_c(<double**>(&data[0]), <np.intp_t**>&shape[0], n_arrays)
    return x
(I should point out that I've only confirmed that this compiles, not tested it further, but the basic idea should be OK.)
The way you've done it is probably a pretty sensible way. One slight simplification to your original code that should work is

shape[i] = <np.uintp_t>&(temp.shape[0])

instead of the malloc and copy. I'd also recommend putting the frees in a finally block to ensure they get run.
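A rough sketch of that structure, with hypothetical malloc-based buffers standing in for whatever your original code allocates:

from libc.stdlib cimport malloc, free

def my_func_malloc_version(list list_of_arrays):
    cdef int n_arrays = len(list_of_arrays)
    # hypothetical pointer arrays, as in a malloc-based version of your code
    cdef double **data = <double**>malloc(n_arrays * sizeof(double*))
    cdef int **shape = <int**>malloc(n_arrays * sizeof(int*))
    try:
        # ... fill data and shape, then call my_func_c ...
        pass
    finally:
        # the frees always run, even if an exception is raised above
        free(data)
        free(shape)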
Edit: @ead has helpfully pointed out that the numpy shape is stored as np.intp_t - i.e. a signed integer big enough to hold a pointer, which is usually 64-bit - while int is usually 32-bit. Therefore, to pass the shape without copying you'd need to change your C API. The casting makes that mistake harder to spot ("a good way of hiding logic errors").
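A minimal sketch of what that change might look like, assuming you are free to edit the C function's signature, is to take the shape pointers as np.intp_t (roughly npy_intp or intptr_t on the C side) instead of int:

cimport numpy as np

cdef extern from "my_func.c":
    # hypothetical adjusted signature: shape is now an array of np.intp_t pointers,
    # so <np.intp_t**>&shape[0] matches it without copying
    int my_func_c(double **data, np.intp_t **shape, int n_arrays)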
To solve your problem you could use std::vector. I think this is a good pattern for consuming C functionality from C++ code, and it can also be used here:
import numpy as np
cimport numpy as np
from libcpp.vector cimport vector

cdef extern from "my_func.c":
    double my_func_c(double **data, int **shape, int n_arrays)

def my_func(list list_of_arrays):
    cdef int n_arrays = len(list_of_arrays)
    cdef vector[double *] data
    cdef vector[vector[int]] shape_mem  # for storing casted shapes
    cdef vector[int *] shape            # pointers to stored shapes
    cdef double x
    cdef np.ndarray[double, ndim=3, mode="c"] temp

    shape_mem.resize(n_arrays)
    for i in range(n_arrays):
        print("i:", i)
        temp = list_of_arrays[i]
        data.push_back(&temp[0, 0, 0])
        for j in range(3):
            shape_mem[i].push_back(temp.shape[j])
        shape.push_back(shape_mem[i].data())

    x = my_func_c(data.data(), shape.data(), n_arrays)
    return x
Also your setup would need a modification:
# setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize
import numpy as np

setup(ext_modules=cythonize(Extension(
            name='my_func_c',
            language='c++',
            extra_compile_args=['-std=c++11'],
            sources=["my_func_c.pyx", "my_func.c"],
            include_dirs=[np.get_include()]
)))
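Assuming the build succeeds (for example via python setup.py build_ext --inplace), usage could then look roughly like this; the array shapes are only placeholders:

import numpy as np
from my_func_c import my_func

# placeholder input: a few C-contiguous float64 arrays with arbitrary 3D shapes
list_of_arrays = [np.zeros((2, 3, 4)), np.ones((5, 6, 7))]
x = my_func(list_of_arrays)
print(x)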
I prefer to use std::vector.data() over &data[0], because the latter would mean undefined behavior for an empty data, and that is the reason we need the -std=c++11 flag.
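A small illustration of the difference (the names here are just for demonstration):

from libcpp.vector cimport vector

cdef void demo():
    cdef vector[double *] v           # empty vector
    cdef double **p = v.data()        # OK in C++11: returns a (possibly null) pointer
    # cdef double **q = &v[0]         # undefined behaviour: there is no element 0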
But in the end, it is for you to decide which trade-off to make: the additional complexity of C++ (it has its own pitfalls) vs. hand-made memory management vs. letting go of type safety for a short moment.