I have several 2D numpy arrays (matrices), and for each one I would like to convert it to a vector containing the values of the array, plus vectors containing the row and column index of each value.
You can do this using itertools:
import itertools
import numpy as np
import pandas as pd
def convert2dataframe(array):
    a, b = array.shape
    # every (row, column) pair in row-major order, split into two tuples
    x, y = zip(*list(itertools.product(range(a), range(b))))
    df = pd.DataFrame(data={'V':array.ravel(), 'x':x, 'y':y})
    return df
This works for 2D arrays of any shape, not just square matrices.
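To illustrate the non-square case, here is a small sketch that runs the same function on a 2x3 array. The input values and the names rect and rect_df are made up for the example.

```python
import itertools

import numpy as np
import pandas as pd


def convert2dataframe(array):
    # Same approach as above: pair every (row, column) index with its value.
    a, b = array.shape
    x, y = zip(*itertools.product(range(a), range(b)))
    return pd.DataFrame({'V': array.ravel(), 'x': x, 'y': y})


# Hypothetical 2x3 input, chosen only to show a non-square case.
rect = np.array([[10, 20, 30],
                 [40, 50, 60]])
rect_df = convert2dataframe(rect)
print(rect_df)
```

The frame has one row per element, with x running over the 2 rows and y over the 3 columns.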
Like @miguel-capllonch, I would suggest using np.ndindex, which allows you to create the desired output like this:
np.array([(v, *i) for (i, v) in zip(np.ndindex(x.shape), x.ravel())])
which results in an array that looks like this:
array([[ 3. 0. 0.]
[ 1. 0. 1.]
[ 4. 0. 2.]
[ 1. 1. 0.]
[ 5. 1. 1.]
[ 9. 1. 2.]
[ 2. 2. 0.]
[ 6. 2. 1.]
[ 5. 2. 2.]])
Alternatively, using only numpy commands (note that here the index columns come first and the value comes last):
np.hstack((list(np.ndindex(x.shape)), x.reshape((-1, 1))))
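If the value should sit in the first column, as in the list-comprehension version, something along these lines should work. This is only a sketch: np.column_stack is swapped in for np.hstack here, and the names idx and table are illustrative.

```python
import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# All (row, column) index pairs in row-major order, shape (9, 2).
idx = np.array(list(np.ndindex(x.shape)))
# column_stack turns the 1D value vector into a column and appends the indices.
table = np.column_stack((x.ravel(), idx))
print(table)
```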
You could also let pandas do the work for you since you'll be using it in a dataframe:
x = np.array([[3, 1, 4],
[1, 5, 9],
[2, 6, 5]])
df = pd.DataFrame(x)
# stack the columns so that they become a second index level (this walks
# the frame row by row), then reset the index so that both levels become columns.
df = df.stack().reset_index()
df
   level_0  level_1  0
0        0        0  3
1        0        1  1
2        0        2  4
3        1        0  1
4        1        1  5
5        1        2  9
6        2        0  2
7        2        1  6
8        2        2  5
# name the columns and move V to the front
df.columns = ['x', 'y', 'V']
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]
df = df[cols]
df
   V  x  y
0  3  0  0
1  1  0  1
2  4  0  2
3  1  1  0
4  5  1  1
5  9  1  2
6  2  2  0
7  6  2  1
8  5  2  2
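The naming and reordering steps can also be folded into one chain. The following is a sketch rather than the original answer's code: it uses stack, which walks the frame row by row so that x is the row index and y the column index, and the name long_df is made up.

```python
import numpy as np
import pandas as pd

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

long_df = (
    pd.DataFrame(x)
    .rename_axis(index='x', columns='y')  # name the row and column axes
    .stack()                              # Series indexed by (x, y), row-major
    .rename('V')                          # name the value column
    .reset_index()                        # index levels become columns
)[['V', 'x', 'y']]                        # put V first
print(long_df)
```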
The class np.ndindex is meant for exactly this and easily does the trick. It has similar efficiency to the np.meshgrid method above, but requires less code:
indices = np.array(list(np.ndindex(x.shape)))
For the dataframe, do:
df = pd.DataFrame({'V': x.flatten(), 'x': indices[:, 0], 'y': indices[:, 1]})
If you don't need the dataframe, just do list(np.ndindex(x.shape)).
Note: don't confuse x (the array at hand) with 'x' (the name of the second column).
I know this question was posted a very long time ago, but I'm adding this in case it's useful to anyone, as I didn't see np.ndindex being mentioned.
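Because np.ndindex accepts any shape, the same idea carries over to arrays with more than two dimensions. A minimal sketch, where the helper name to_long_frame and the column names dim0, dim1, ... are invented for illustration:

```python
import numpy as np
import pandas as pd


def to_long_frame(arr):
    # Hypothetical helper: one row per element, one index column per axis.
    indices = np.array(list(np.ndindex(arr.shape)))
    data = {'V': arr.ravel()}
    for axis in range(arr.ndim):
        data[f'dim{axis}'] = indices[:, axis]
    return pd.DataFrame(data)


# Example 3D input with shape (2, 3, 4).
cube = np.arange(24).reshape(2, 3, 4)
long_df = to_long_frame(cube)
print(long_df.head())
```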
I am resurrecting this because I think I know a different answer that is way easier to understand. Here is how I do it:
# one output row per element: the value followed by its indices
xn = np.zeros((np.size(x), np.ndim(x)+1), dtype=np.float32)
row = 0
for ind, data in np.ndenumerate(x):
    xn[row, 0] = data
    xn[row, 1:] = np.asarray(ind)
    row += 1
In xn we have:
[[ 3. 0. 0.]
[ 1. 0. 1.]
[ 4. 0. 2.]
[ 1. 1. 0.]
[ 5. 1. 1.]
[ 9. 1. 2.]
[ 2. 2. 0.]
[ 6. 2. 1.]
[ 5. 2. 2.]]
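Note that the float32 buffer turns integer input into floats. If you would rather keep the input's dtype, a variant of the same np.ndenumerate idea could look like this (a sketch only; the name xn_int is made up):

```python
import numpy as np

x = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

# One (value, row, column) tuple per element; integer input stays integer.
xn_int = np.array([(value, *index) for index, value in np.ndenumerate(x)])
print(xn_int.dtype, xn_int.shape)
```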
Another way:
arr = np.array([[3, 1, 4],
[1, 5, 9],
[2, 6, 5]])
# build out rows array
x = np.arange(arr.shape[0]).reshape(arr.shape[0],1).repeat(arr.shape[1],axis=1)
# build out columns array
y = np.arange(arr.shape[1]).reshape(1,arr.shape[1]).repeat(arr.shape[0],axis=0)
# combine into table
table = np.vstack((arr.reshape(arr.size),x.reshape(arr.size),y.reshape(arr.size))).T
print(table)
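The two repeat-based index grids can also be produced in one call with np.indices. This is a sketch of that alternative, not the code above; the 2x3 input is chosen to show that non-square arrays work.

```python
import numpy as np

arr = np.array([[3, 1, 4],
                [1, 5, 9]])  # 2x3, deliberately non-square

# np.indices returns one index grid per axis, each with the shape of arr.
rows, cols = np.indices(arr.shape)
# Flatten everything in row-major order and place the vectors side by side.
table = np.column_stack((arr.ravel(), rows.ravel(), cols.ravel()))
print(table)
```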