I am interested in knowing how to convert a pandas dataframe into a NumPy array.
dataframe:
import numpy as np
import pandas as pd
index = [1, 2, 3,
You can use the to_records() method, but you have to play around a bit with the dtypes if they are not what you want from the get-go. In my case, having copied your DF from a string, the index type is string (represented by an object dtype in pandas):
In [102]: df
Out[102]:
label A B C
ID
1 NaN 0.2 NaN
2 NaN NaN 0.5
3 NaN 0.2 0.5
4 0.1 0.2 NaN
5 0.1 0.2 0.5
6 0.1 NaN 0.5
7 0.1 NaN NaN
In [103]: df.index.dtype
Out[103]: dtype('object')
In [104]: df.to_records()
Out[104]:
rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
(4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
(7, 0.1, nan, nan)],
dtype=[('index', '|O8'), ('A', '
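For reference, here is a minimal self-contained sketch of the same call. The small frame is made up for illustration (it is not the DF from the question), and the exact dtype strings you see will depend on your platform and pandas version. The to_numpy() call assumes a reasonably recent pandas release.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: string labels as the
# index, float columns with some missing values.
df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, 0.2], "C": [np.nan, 0.5]},
    index=["1", "2"],
)

# Structured record array: the index becomes the first field,
# followed by one field per column.
rec = df.to_records()

# Plain 2-D float array: the index is dropped entirely.
plain = df.to_numpy()

print(rec.dtype.names)
print(plain.shape)
```

Use the record array when you want to keep the index and per-column names; use the plain array when you only need the values.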
Converting the recarray dtype does not work for me, but one can do this in Pandas already:
In [109]: df.index = df.index.astype('i8')
In [111]: df.to_records().view([('ID', '
Note that Pandas does not set the name of the index properly (to ID) in the exported record array (a bug?), so we profit from the type conversion to also correct for that.
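As an alternative sketch, the same result can be reached on the pandas side alone, assuming a reasonably recent pandas release: there, as far as I know, to_records() takes the field name from a named index, so no view() is needed. The frame below is again a made-up stand-in, not the original DF.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are strings, as they would be
# after parsing the table from text.
df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, 0.2]},
    index=["1", "2"],
)

df.index = df.index.astype("i8")  # string labels -> 8-byte integers
df.index.name = "ID"              # name the index explicitly

rec = df.to_records()
print(rec.dtype.names)
```

If your pandas version still exports the field as 'index', the view() approach above remains the fallback.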
At the moment Pandas has only 8-byte integers, i8, and floats, f8 (see this issue).
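If you need narrower types in the exported array, newer pandas releases accept column_dtypes and index_dtypes arguments on to_records(). A hedged sketch follows; the frame and the target types are illustrative assumptions, and this will not work on the pandas version used in the session above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an integer index named ID.
df = pd.DataFrame(
    {"A": [0.1, 0.2], "B": [1, 2]},
    index=pd.Index([1, 2], name="ID"),
)

# Request specific field types at export time instead of
# casting the record array afterwards.
rec = df.to_records(
    index_dtypes="i4",                      # 4-byte integer index field
    column_dtypes={"A": "f4", "B": "i2"},   # per-column target types
)
print(rec.dtype)
```

The cast happens while the record array is built, so values that do not fit the narrower type are not protected; check ranges first if that matters.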