I have a pandas dataframe of shape (75, 9).
Only one of those columns holds numpy arrays, each of shape (100, 4, 3).
I have a
In [42]: some_df = pd.DataFrame(columns=['A'])
    ...: for i in range(4):
    ...:     some_df.loc[i] = [np.random.randint(0,10,(1,3))]
    ...:
In [43]: some_df
Out[43]:
             A
0  [[7, 0, 9]]
1  [[3, 6, 8]]
2  [[9, 7, 6]]
3  [[1, 6, 3]]
The numpy values of the column are an object dtype array, containing arrays:
In [44]: some_df['A'].to_numpy()
Out[44]:
array([array([[7, 0, 9]]), array([[3, 6, 8]]), array([[9, 7, 6]]),
       array([[1, 6, 3]])], dtype=object)
If those arrays all have the same shape, np.stack does a nice job of concatenating them along a new dimension:
In [45]: np.stack(some_df['A'].to_numpy())
Out[45]:
array([[[7, 0, 9]],

       [[3, 6, 8]],

       [[9, 7, 6]],

       [[1, 6, 3]]])
In [46]: _.shape
Out[46]: (4, 1, 3)
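Applied to the shapes described in the question, the same call would look roughly like the sketch below. The column name 'arr' and the random data are placeholders rather than the asker's actual frame, and only the array-valued column is modelled here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: 75 rows, with one column
# ('arr' is an assumed name) holding a (100, 4, 3) array in every row.
rng = np.random.default_rng(0)
col = np.empty(75, dtype=object)
for i in range(75):
    col[i] = rng.integers(0, 10, (100, 4, 3))
df = pd.DataFrame({'arr': col})

# The column is an object-dtype array of arrays; np.stack joins the
# per-row arrays along a new leading axis.
stacked = np.stack(df['arr'].to_numpy())
print(stacked.shape)  # (75, 100, 4, 3)
```

The result is an ordinary numeric array, so the usual slicing applies, e.g. stacked[0] is the array from the first row.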
This only works with one column. stack, like all the concatenate functions, treats its input argument as an iterable, effectively a list of arrays.
In [48]: some_df['A'].to_list()
Out[48]:
[array([[7, 0, 9]]),
array([[3, 6, 8]]),
array([[9, 7, 6]]),
array([[1, 6, 3]])]
In [50]: np.stack(some_df['A'].to_list()).shape
Out[50]: (4, 1, 3)
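The same-shape condition matters: if even one element has a different shape, np.stack raises a ValueError. A small sketch with made-up arrays (the names same and ragged are purely illustrative):

```python
import numpy as np

# Two lists of arrays: one with matching shapes, one without.
same = [np.zeros((1, 3)), np.ones((1, 3))]
ragged = [np.zeros((1, 3)), np.ones((2, 3))]

print(np.stack(same).shape)  # (2, 1, 3)

# Mismatched shapes cannot be joined along a new axis.
try:
    np.stack(ragged)
    stacked_ok = True
except ValueError as err:
    stacked_ok = False
    print("cannot stack:", err)
```

If that happens with a real column, checking the set of shapes first, for example {a.shape for a in some_df['A']}, shows which rows are the odd ones out.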