According to this post, I should be able to access the names of columns in an ndarray as a.dtype.names
Howevever, if I convert a pandas DataFrame to an ndarray with df.a
Pandas dataframe also has a handy to_records
method. Demo:
X = pd.DataFrame(dict(age=[40., 50., 60.],
sys_blood_pressure=[140.,150.,160.]))
m = X.to_records(index=False)
print repr(m)
Returns:
rec.array([(40.0, 140.0), (50.0, 150.0), (60.0, 160.0)],
dtype=[('age', '
This is a "record array", which is an ndarray subclass that allows field access using attributes, e.g. m.age
in addition to m['age']
.
You can pass this to a cython function as a regular float array by constructing a view:
m_float = m.view(float).reshape(m.shape + (-1,))
print repr(m_float)
Which gives:
rec.array([[ 40., 140.],
[ 50., 150.],
[ 60., 160.]],
dtype=float64)
Note in order for this to work, the original Dataframe must have a float dtype for every column. To make sure use m = X.astype(float, copy=False).to_records(index=False)
.