How to handle C++ return type std::vector in Python ctypes?

半阙折子戏 2020-12-03 03:54

I cannot find how ctypes bridges the gap between std::vector and Python; nowhere on the internet is the combination mentioned. Is this bad practice, does i

3 Answers
  • 2020-12-03 04:15

    Basically, returning a C++ object from a dynamically loaded library is not a good idea. To use the C++ vector in Python code, you must teach Python to deal with C++ objects (and this includes the binary representation of the objects, which can change with a new version of the C++ compiler or STL).

    ctypes allows you to interact with a library using C types. Not C++.

    Maybe the problem is solvable via boost::python, but it looks more reliable to use plain C for the interaction.
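
    As a rough sketch of what "plain C for the interaction" can look like from the Python side: the library fills a buffer that the caller allocated, so no C++ object ever crosses the boundary. The function name `fill_counts` and its signature are hypothetical, and a Python callback wrapped with CFUNCTYPE stands in for the compiled library so the snippet runs without building anything.

```python
import ctypes

# Hypothetical C signature: int fill_counts(int* out, int capacity);
# The callee writes up to `capacity` ints into `out` and returns how many it wrote.
FILL_COUNTS = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.c_int)

def _fake_fill_counts(out, capacity):
    # Stand-in for the C++ side, which could copy out of a std::vector internally.
    data = [3, 1, 4, 1, 5]
    count = min(len(data), capacity)
    for i in range(count):
        out[i] = data[i]
    return count

# With a real library this would be lib.fill_counts with restype/argtypes set.
fill_counts = FILL_COUNTS(_fake_fill_counts)

def read_counts(capacity=16):
    buf = (ctypes.c_int * capacity)()   # caller-owned buffer of C ints
    written = fill_counts(buf, capacity)
    return list(buf[:written])          # copy into an ordinary Python list

counts = read_counts()
```

    The C++ code is free to use std::vector internally; only plain ints and a pointer appear in the exported signature.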

  • 2020-12-03 04:23

    Whether or not this approach actually provides faster execution time, I'll explain a bit about how you could go about doing it. Basically, create a pointer to a C++ vector which can interface with Python through C functions. You can then wrap the C++ code in a Python class, hiding the implementation details of ctypes.

    I included the magic methods I thought would be helpful in the Python class. You can remove them or add more to suit your needs. Keep the destructor, though: without it the C++ vector is never freed.

    C++

    // vector_python.cpp
    #include <vector>
    #include <iostream>
    #include <fstream>
    #include <string>
    
    using namespace std;
    
    extern "C" void foo(vector<int>* v, const char* FILE_NAME){
        string line;
        ifstream myfile(FILE_NAME);
        while (getline(myfile, line)) v->push_back(1);
    }
    
    extern "C" {
        vector<int>* new_vector(){
            return new vector<int>;
        }
        void delete_vector(vector<int>* v){
            cout << "destructor called in C++ for " << v << endl;
            delete v;
        }
        int vector_size(vector<int>* v){
            return v->size();
        }
        int vector_get(vector<int>* v, int i){
            return v->at(i);
        }
        void vector_push_back(vector<int>* v, int i){
            v->push_back(i);
        }
    }
    

    Compile it as a shared library. On Mac OS X this might look like,

    g++ -c -fPIC vector_python.cpp -o vector_python.o
    g++ -shared -Wl,-install_name,vector_python_lib.so -o vector_python_lib.so vector_python.o
    

    Python

    from ctypes import *
    
    class Vector(object):
        lib = cdll.LoadLibrary('vector_python_lib.so') # class level loading lib
        lib.new_vector.restype = c_void_p
        lib.new_vector.argtypes = []
        lib.delete_vector.restype = None
        lib.delete_vector.argtypes = [c_void_p]
        lib.vector_size.restype = c_int
        lib.vector_size.argtypes = [c_void_p]
        lib.vector_get.restype = c_int
        lib.vector_get.argtypes = [c_void_p, c_int]
        lib.vector_push_back.restype = None
        lib.vector_push_back.argtypes = [c_void_p, c_int]
        lib.foo.restype = None
        lib.foo.argtypes = [c_void_p, c_char_p]
    
        def __init__(self):
            self.vector = Vector.lib.new_vector()  # pointer to new vector
    
        def __del__(self):  # when reference count hits 0 in Python,
            Vector.lib.delete_vector(self.vector)  # call C++ vector destructor
    
        def __len__(self):
            return Vector.lib.vector_size(self.vector)
    
        def __getitem__(self, i):  # access elements in vector at index
            if 0 <= i < len(self):
                return Vector.lib.vector_get(self.vector, c_int(i))
            raise IndexError('Vector index out of range')
    
        def __repr__(self):
            return '[{}]'.format(', '.join(str(self[i]) for i in range(len(self))))
    
        def push(self, i):  # push calls vector's push_back
            Vector.lib.vector_push_back(self.vector, c_int(i))
    
        def foo(self, filename):  # foo in Python calls foo in C++
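            if not isinstance(filename, bytes):  # Python 3 str must be encoded for c_char_p
                filename = filename.encode()
            Vector.lib.foo(self.vector, c_char_p(filename))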
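
    One detail worth checking if this wrapper is run under Python 3 (an assumption; the answer does not say which version it targets): `c_char_p` accepts `bytes`, not `str`, so a filename has to be encoded before it is passed to `foo`. A minimal sketch, where `to_c_path` is a hypothetical helper name:

```python
import ctypes
import os

def to_c_path(filename):
    """Return a c_char_p for a path given as str or bytes."""
    # os.fsencode turns str into bytes using the filesystem encoding
    # and passes bytes through unchanged.
    return ctypes.c_char_p(os.fsencode(filename))

arg = to_c_path('file.txt')
raw = arg.value  # the bytes the C side would read
```

    Passing a plain `str` to `c_char_p` under Python 3 raises a TypeError instead.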
    

    You can then test the class in the interpreter (file.txt just consists of three lines of gibberish).

    >>> from vector import Vector
    >>> a = Vector()
    >>> a.push(22)
    >>> a.push(88)
    >>> a
    [22, 88]
    >>> a[1]
    88
    >>> a[2]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "vector.py", line 30, in __getitem__
        raise IndexError('Vector index out of range')
    IndexError: Vector index out of range
    >>> a.foo('file.txt')
    >>> a
    [22, 88, 1, 1, 1]
    >>> b = Vector()
    >>> ^D
    destructor called in C++ for 0x1003884d0
    destructor called in C++ for 0x10039df10
    
  • 2020-12-03 04:32

    "The particular reason is that speed is important. I'm creating an application that should be able to handle big data. On 200,000 rows the missings have to be counted on 300 values (200k by 300 matrix). I believe, but correct me if I'm wrong, that C++ will be significantly faster."

    Well, if you're reading from a large file, your process is going to be mostly IO-bound, so the timings between Python and C probably won't be significantly different.

    The following code...

    result = []
    for line in open('test.txt'):
        result.append(line.count('NA'))
    

    ...seems to run just as fast as anything I can hack together in C, although it's using some optimized algorithm I'm not really familiar with.

    It takes less than a second to process 200,000 lines, although I'd be interested to see if you can write a C function which is significantly faster.
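
    One way to check this on your own machine is to generate a test file and time the loop. The file layout below (tab-separated, roughly one field in ten set to 'NA') is made up for the sketch and is not the asker's data. ROWS is scaled down to 20,000 so it runs quickly; raise it to 200,000 to match the question. Timings will vary by machine.

```python
import os
import random
import tempfile
import time

ROWS, COLS = 20_000, 300

def write_test_file(path, rows=ROWS, cols=COLS, seed=0):
    rng = random.Random(seed)
    # Reuse a small pool of rows so generating the file stays quick.
    pool = ['\t'.join('NA' if rng.random() < 0.1 else '1' for _ in range(cols))
            for _ in range(100)]
    with open(path, 'w') as f:
        for i in range(rows):
            f.write(pool[i % len(pool)] + '\n')

def count_missing(path):
    # Same counting approach as the loop above, one count per line.
    with open(path) as f:
        return [line.count('NA') for line in f]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.txt')
    write_test_file(path)
    start = time.perf_counter()
    result = count_missing(path)
    elapsed = time.perf_counter() - start

print('{} rows in {:.2f} s'.format(len(result), elapsed))
```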


    Update

    If you want to do it in C, and end up with a Python list, it's probably more efficient to use the Python/C API to build the list yourself, rather than building a std::vector then converting to a Python list later on.

    An example which just returns a list of integers from 0 to 99...

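    // hack.c
    
    #include <python2.7/Python.h>
    
    PyObject* foo(const char* filename)
    {
        PyObject* result = PyList_New(0);
        int i;
    
        for (i = 0; i < 100; ++i)
        {
            // PyList_Append does not steal the reference, so release ours afterwards
            PyObject* item = PyInt_FromLong(i);
            PyList_Append(result, item);
            Py_DECREF(item);
        }
    
        return result;
    }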
    

    Compiled with...

    $ gcc -c hack.c -fPIC
    $ ld -o hack.so -shared hack.o -lpython2.7
    

    Example of usage...

    >>> from ctypes import *
    >>> dll = PyDLL('./hack.so')  # PyDLL keeps the GIL held, which Python C API calls require
    >>> dll.foo.restype = py_object
    >>> dll.foo('foo')
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...]
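
    The same idea can be tried without compiling anything, because ctypes exposes the interpreter's own C API as `ctypes.pythonapi` (a `PyDLL` instance, so the GIL stays held during calls). The sketch below assumes CPython 3 and builds the 0 to 99 list through `PyList_New` and `PyList_Append`; `build_list` is a hypothetical name.

```python
import ctypes

api = ctypes.pythonapi

# PyObject* PyList_New(Py_ssize_t len): returns a new reference
api.PyList_New.restype = ctypes.py_object
api.PyList_New.argtypes = [ctypes.c_ssize_t]

# int PyList_Append(PyObject* list, PyObject* item): returns 0 on success
api.PyList_Append.restype = ctypes.c_int
api.PyList_Append.argtypes = [ctypes.py_object, ctypes.py_object]

def build_list(n):
    result = api.PyList_New(0)
    for i in range(n):
        # Python ints are passed directly as PyObject* via py_object
        if api.PyList_Append(result, i) != 0:
            raise RuntimeError('PyList_Append failed')
    return result

numbers = build_list(100)
```

    This is only a way to experiment with `py_object`; in pure Python, `list(range(100))` does the same job.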
    