I have a C++ function computing a large tensor which I would like to return to Python as a NumPy array via pybind11.
From the documentation of pybind11, it seems li
A few comments (then a working implementation).
- pybind11's C++ wrapper types (pybind11::object, pybind11::list, and, in this case, pybind11::array_t<T>) are really just wrappers around an underlying Python object pointer. In this respect they are already taking on the role of a shared pointer wrapper, so there's no point in wrapping one in a unique_ptr: returning the py::array_t<T> object directly is already essentially just returning a glorified pointer.

- pybind11::array_t can be constructed directly from a data pointer, so you can skip the py::buffer_info intermediate step and just give the shape and strides directly to the pybind11::array_t constructor. A numpy array constructed this way won't own its data; it will just reference it (that is, the numpy owndata flag will be set to false).

- Something still has to free that memory once the array is gone, and pybind11 provides a py::capsule class to help you do exactly this. What you want to do is make the numpy array depend on this capsule as its parent object by specifying it as the base argument to array_t. That will make the numpy array reference it, keeping it alive as long as the array itself is alive, and invoke the cleanup function when it is no longer referenced.

- The c_style flag in the older (pre-2.2) releases only had an effect on new arrays, i.e. when not passing a value pointer. That was fixed in the 2.2 release to also affect the automatic strides if you specify only shapes but not strides. It has no effect at all if you specify the strides directly yourself (as I do in the example below).

So, putting the pieces together, this code is a complete pybind11 module that demonstrates how you can accomplish what you're looking for (and includes some C++ output to demonstrate that it is indeed working correctly):
#include <iostream>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

PYBIND11_MODULE(numpywrap, m) {
    m.def("f", []() {
        // Allocate and initialize some data; make this big so
        // we can see the impact on the process memory use:
        constexpr size_t size = 100*1000*1000;
        double *foo = new double[size];
        for (size_t i = 0; i < size; i++) {
            foo[i] = (double) i;
        }

        // Create a Python object that will free the allocated
        // memory when destroyed:
        py::capsule free_when_done(foo, [](void *f) {
            double *foo = static_cast<double *>(f);
            std::cerr << "Element [0] = " << foo[0] << "\n";
            std::cerr << "freeing memory @ " << f << "\n";
            delete[] foo;
        });

        return py::array_t<double>(
            {100, 1000, 1000},        // shape
            {1000*1000*8, 1000*8, 8}, // C-style contiguous strides for double
            foo,                      // the data pointer
            free_when_done);          // numpy array references this parent
    });
}
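The strides in the example are written out by hand. As a minimal sketch of where those numbers come from, the helper below computes C-style (row-major) byte strides from a shape and an element size. The name c_contiguous_strides is hypothetical and not part of pybind11; it uses only the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a pybind11 function): byte strides for a
// C-style (row-major) contiguous array with the given shape.
std::vector<std::ptrdiff_t> c_contiguous_strides(
        const std::vector<std::ptrdiff_t> &shape, std::ptrdiff_t itemsize) {
    assert(itemsize > 0);
    // Start with every axis stepping by one element; this is already
    // correct for the last axis.
    std::vector<std::ptrdiff_t> strides(shape.size(), itemsize);
    // Walk from the last axis towards the first: each earlier axis has
    // to step over everything spanned by the axes after it.
    for (std::size_t i = shape.size(); i-- > 1; ) {
        strides[i - 1] = strides[i] * shape[i];
    }
    return strides;
}
```

For the shape {100, 1000, 1000} and sizeof(double) == 8 this gives {8000000, 8000, 8}, the same values as the hand-written 1000*1000*8, 1000*8, 8.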
Compiling the numpywrap module and invoking it from Python shows it working:
>>> import numpywrap
>>> z = numpywrap.f()
>>> # the python process is now taking up a bit more than 800MB memory
>>> z[1,1,1]
1001001.0
>>> z[0,0,100]
100.0
>>> z[99,999,999]
99999999.0
>>> z[0,0,0] = 3.141592
>>> del z
Element [0] = 3.14159
freeing memory @ 0x7fd769f12010
>>> # python process memory size has dropped back down
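The values printed in the session can be checked by hand: each element was initialized to its own flat index, so z[i,j,k] should equal the row-major offset of (i, j, k). The sketch below spells out that arithmetic in plain C++; flat_index is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: flat element index of (i, j, k) in a row-major
// array whose second and third dimensions are dim1 and dim2.
std::size_t flat_index(std::size_t i, std::size_t j, std::size_t k,
                       std::size_t dim1, std::size_t dim2) {
    assert(j < dim1 && k < dim2);
    // One step along the first axis skips dim1 * dim2 elements, one
    // step along the second axis skips dim2 elements.
    return (i * dim1 + j) * dim2 + k;
}
```

With dim1 = dim2 = 1000, index (1, 1, 1) maps to 1001001, (0, 0, 100) to 100, and (99, 999, 999) to 99999999, matching the session. The memory figure follows the same way: 100*1000*1000 doubles at 8 bytes each is 800,000,000 bytes.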