I cannot find how ctypes can bridge the gap between std::vector and Python; nowhere on the internet is the combination mentioned. Is this bad practice?
Basically, returning a C++ object from a dynamically loaded library is not a good idea. To use a C++ vector in Python code, you would have to teach Python to deal with C++ objects (including their binary representation, which can change with a new version of the C++ compiler or STL).
ctypes allows you to interact with a library using C types, not C++. Maybe the problem is solvable via boost::python, but it looks more reliable to use plain C for the interaction.
Whether or not this approach actually provides faster execution time, I'll explain a bit about how you could go about doing it. Basically, you create a pointer to a C++ vector which can be manipulated through C functions, then wrap that in a Python class which hides the ctypes implementation details.
I included the magic methods I thought would be most helpful in the Python class. You can remove them or add more to suit your needs, but the destructor is important to keep.
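To see why the destructor matters, here is a pure-Python illustration (a toy stand-in, not the ctypes wrapper below): in CPython, __del__ runs as soon as an object's reference count hits zero, which is what lets the wrapper release the C++ vector deterministically.

```python
class Handle(object):
    """Toy stand-in for an object that owns a C++ resource."""
    freed = 0  # counts how many handles have been released

    def __del__(self):
        # the real wrapper calls delete_vector here
        Handle.freed += 1
        print('destructor called')

h = Handle()
del h  # reference count hits 0; CPython runs __del__ immediately
```

Without __del__, the C++ vector allocated by new_vector would simply leak every time the Python object was garbage collected.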
// vector_python.cpp
#include <vector>
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

extern "C" {

// appends a 1 to the vector for every line in the file
void foo(vector<int>* v, const char* FILE_NAME){
    string line;
    ifstream myfile(FILE_NAME);
    while (getline(myfile, line)) v->push_back(1);
}

vector<int>* new_vector(){
    return new vector<int>;
}

void delete_vector(vector<int>* v){
    cout << "destructor called in C++ for " << v << endl;
    delete v;
}

int vector_size(vector<int>* v){
    return v->size();
}

int vector_get(vector<int>* v, int i){
    return v->at(i);
}

void vector_push_back(vector<int>* v, int i){
    v->push_back(i);
}

}
Compile it as a shared library. On Mac OS X this might look like:
g++ -c -fPIC vector_python.cpp -o vector_python.o
g++ -shared -Wl,-install_name,vector_python_lib.so -o vector_python_lib.so vector_python.o
# vector.py
from ctypes import *

class Vector(object):
    lib = cdll.LoadLibrary('vector_python_lib.so')  # class level loading lib
    lib.new_vector.restype = c_void_p
    lib.new_vector.argtypes = []
    lib.delete_vector.restype = None
    lib.delete_vector.argtypes = [c_void_p]
    lib.vector_size.restype = c_int
    lib.vector_size.argtypes = [c_void_p]
    lib.vector_get.restype = c_int
    lib.vector_get.argtypes = [c_void_p, c_int]
    lib.vector_push_back.restype = None
    lib.vector_push_back.argtypes = [c_void_p, c_int]
    lib.foo.restype = None
    lib.foo.argtypes = [c_void_p, c_char_p]  # vector pointer and filename

    def __init__(self):
        self.vector = Vector.lib.new_vector()  # pointer to new vector

    def __del__(self):  # when reference count hits 0 in Python,
        Vector.lib.delete_vector(self.vector)  # call C++ vector destructor

    def __len__(self):
        return Vector.lib.vector_size(self.vector)

    def __getitem__(self, i):  # access elements in vector at index
        if 0 <= i < len(self):
            return Vector.lib.vector_get(self.vector, c_int(i))
        raise IndexError('Vector index out of range')

    def __repr__(self):
        return '[{}]'.format(', '.join(str(self[i]) for i in range(len(self))))

    def push(self, i):  # push calls vector's push_back
        Vector.lib.vector_push_back(self.vector, c_int(i))

    def foo(self, filename):  # foo in Python calls foo in C++
        Vector.lib.foo(self.vector, c_char_p(filename))
You can then test it out in the interpreter (file.txt just consists of three lines of gibberish).
>>> from vector import Vector
>>> a = Vector()
>>> a.push(22)
>>> a.push(88)
>>> a
[22, 88]
>>> a[1]
88
>>> a[2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "vector.py", line 30, in __getitem__
raise IndexError('Vector index out of range')
IndexError: Vector index out of range
>>> a.foo('file.txt')
>>> a
[22, 88, 1, 1, 1]
>>> b = Vector()
>>> ^D
destructor called in C++ for 0x1003884d0
destructor called in C++ for 0x10039df10
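Incidentally, because the class defines __len__ and __getitem__ (raising IndexError past the end), it already supports iteration, list() conversion, and the in operator through Python's old sequence protocol. A quick sketch with a pure-Python stand-in (Toy is a hypothetical class, used here so the example runs without the shared library):

```python
class Toy(object):
    """Pure-Python stand-in mimicking Vector's __len__/__getitem__."""
    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        if 0 <= i < len(self._data):
            return self._data[i]
        raise IndexError('index out of range')

t = Toy([22, 88])
print(list(t))   # [22, 88] -- iteration stops at the IndexError
print(88 in t)   # True
```

The same calls work on the ctypes-backed Vector, since Python only sees the magic methods.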
The particular reason is that speed is important. I'm creating an application that should be able to handle big data. On 200,000 rows the missing values have to be counted across 300 columns (a 200k-by-300 matrix). I believe, but correct me if I'm wrong, that C++ will be significantly faster.
Well, if you're reading from a large file, your process is going to be mostly IO-bound, so the timings between Python and C probably won't be significantly different.
The following code...
result = []
for line in open('test.txt'):
    result.append(line.count('NA'))
...seems to run just as fast as anything I can hack together in C, although it's using some optimized algorithm I'm not really familiar with.
It takes less than a second to process 200,000 lines, although I'd be interested to see if you can write a C function which is significantly faster.
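To connect this to the 200,000-by-300 case from the question: the same one-liner scales to a per-line count plus an overall total. A sketch with a small in-memory stand-in for the file (the sample lines are made up for illustration):

```python
# stand-in for the data file: each line holds comma-separated values,
# with 'NA' marking a missing entry
lines = ['NA,1,NA', '4,NA,6', '7,8,9']

per_line = [line.count('NA') for line in lines]  # missing values per row
total = sum(per_line)                            # missing values overall
print(per_line, total)  # [2, 1, 0] 3
```

str.count is implemented in C with an optimized substring search, which is why it is hard to beat with a hand-rolled C loop.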
Update
If you want to do it in C and end up with a Python list, it's probably more efficient to use the Python/C API to build the list yourself, rather than building a std::vector and then converting it to a Python list later on.
An example which just returns a list of integers from 0 to 99...
// hack.c
#include <python2.7/Python.h>

PyObject* foo(const char* filename)
{
    PyObject* result = PyList_New(0);
    int i;
    for (i = 0; i < 100; ++i)
    {
        PyList_Append(result, PyInt_FromLong(i));
    }
    return result;
}
Compiled with...
$ gcc -c hack.c -fPIC
$ ld -o hack.so -shared hack.o -lpython2.7
Example of usage...
>>> from ctypes import *
>>> dll = CDLL('./hack.so')
>>> dll.foo.restype = py_object
>>> dll.foo('foo')
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...]