I have several 2D numpy arrays (matrices), and for each one I would like to convert it to a vector containing the values of the array, plus vectors containing the corresponding row and column index of each value.
I don't know if it's most efficient, but numpy.meshgrid is designed for this:
import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
XX,YY = np.meshgrid(np.arange(x.shape[1]),np.arange(x.shape[0]))
table = np.vstack((x.ravel(),XX.ravel(),YY.ravel())).T
print(table)
This produces:
[[3 0 0]
 [1 1 0]
 [4 2 0]
 [1 0 1]
 [5 1 1]
 [9 2 1]
 [2 0 2]
 [6 1 2]
 [5 2 2]]
Then I think df = pandas.DataFrame(table) will give you your desired data frame.
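To make that last step concrete, here is a minimal sketch that builds the table and wraps it in a data frame. The column names "value", "col" and "row" are my own illustrative choices, not something the question asked for.

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# With the default indexing, XX holds column indices and YY holds row indices,
# both laid out in the same row-major order as x.ravel().
XX, YY = np.meshgrid(np.arange(x.shape[1]), np.arange(x.shape[0]))
table = np.vstack((x.ravel(), XX.ravel(), YY.ravel())).T

# Column names are assumptions chosen for readability.
df = pd.DataFrame(table, columns=["value", "col", "row"])
print(df)
```

Naming the columns up front avoids having to remember that column 1 is the column index and column 2 is the row index.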
You can simply use loops.
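import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])
values = []
coordinates = []
data_frame = []
for v in range(len(x)):            # v: row index
    for h in range(len(x[v])):     # h: column index
        values.append(x[v][h])
        coordinates.append((h, v))
        # list.append takes a single argument, so append one tuple
        data_frame.append((x[v][h], h, v))
        print('%s | %s | %s' % (x[v][h], v, h))  # value | row | column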
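If you like the loop approach but want less index bookkeeping, one possible variant (not what the code above does) is np.ndenumerate, which walks the array and hands you the index tuple together with the value. A rough sketch, where the list name rows is just an illustrative choice:

```python
import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# np.ndenumerate yields ((row, col), value) pairs in row-major order.
# Each record is stored as (value, column index, row index).
rows = [(int(value), col, row) for (row, col), value in np.ndenumerate(x)]

for value, col, row in rows:
    print(f"{value} | {col} | {row}")
```

This is still a Python-level loop, so for large arrays the vectorized answers are likely to be faster.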
Update November 2020 (tested on pandas v1.1.3 and numpy v1.19):
This should be a no-brainer by using np.meshgrid and .reshape(-1).
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9]])
x_coor, y_coor = np.meshgrid(range(x.shape[1]), range(x.shape[0]))
df = pd.DataFrame({"V": x.reshape(-1), "x": x_coor.reshape(-1), "y": y_coor.reshape(-1)})
For 2-dimensional cases, you don't even need a meshgrid. Just np.tile the range of the column axis and np.repeat for the row axis.
df = pd.DataFrame({
    "V": x.reshape(-1),
    "x": np.tile(np.arange(x.shape[1]), x.shape[0]),
    "y": np.repeat(np.arange(x.shape[0]), x.shape[1])
})
The sample data is trimmed to shape=(2, 3) to make it clearer which axis is which.
Result
print(df)
   V  x  y
0  3  0  0
1  1  1  0
2  4  2  0
3  1  0  1
4  5  1  1
5  9  2  1
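Since the question mentions several arrays, the tile/repeat version can be wrapped in a small helper and applied to each one. This is only a sketch: the function name to_long_frame and the second sample array are made up for illustration.

```python
import numpy as np
import pandas as pd

def to_long_frame(arr):
    """Flatten a 2D array into columns V (value), x (column index), y (row index)."""
    arr = np.asarray(arr)
    if arr.ndim != 2:
        raise ValueError("expected a 2D array")
    n_rows, n_cols = arr.shape
    return pd.DataFrame({
        "V": arr.reshape(-1),
        # Column indices cycle 0..n_cols-1 once per row.
        "x": np.tile(np.arange(n_cols), n_rows),
        # Each row index is repeated once per column.
        "y": np.repeat(np.arange(n_rows), n_cols),
    })

# The second array is hypothetical sample data with a different shape.
arrays = [np.array([[3, 1, 4], [1, 5, 9]]),
          np.array([[2, 6], [5, 3], [5, 8]])]
frames = [to_long_frame(a) for a in arrays]
print(frames[1])
```

Because the helper reads the shape from each array, the inputs do not need to share dimensions.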