I have a pandas DataFrame of shape (75, 9). Only one of those columns holds NumPy arrays, each of which has shape (100, 4, 3).
I have a
In [42]: some_df = pd.DataFrame(columns=['A'])
    ...: for i in range(4):
    ...:     some_df.loc[i] = [np.random.randint(0,10,(1,3))]
    ...:
In [43]: some_df
Out[43]:
             A
0  [[7, 0, 9]]
1  [[3, 6, 8]]
2  [[9, 7, 6]]
3  [[1, 6, 3]]
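Growing the frame row by row with .loc is fine for a small demo. One alternative sketch (my assumption, not the only way) is to fill a 1D object array first and pass it to the DataFrame constructor, which makes the object dtype explicit. The seeded generator and the column name A are just for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so reruns give the same values

# Fill a 1D object array; each element is a separate (1, 3) integer array.
cells = np.empty(4, dtype=object)
for i in range(len(cells)):
    cells[i] = rng.integers(0, 10, size=(1, 3))

# Passing the object array as the column keeps one array per cell.
some_df = pd.DataFrame({'A': cells})

print(some_df['A'].dtype)          # object
print(some_df['A'].iloc[0].shape)  # (1, 3)
```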
The NumPy values of the column form an object-dtype array whose elements are arrays:
In [44]: some_df['A'].to_numpy()
Out[44]:
array([array([[7, 0, 9]]), array([[3, 6, 8]]), array([[9, 7, 6]]),
       array([[1, 6, 3]])], dtype=object)
If those arrays all have the same shape, np.stack does a nice job of concatenating them on a new dimension:
In [45]: np.stack(some_df['A'].to_numpy())
Out[45]:
array([[[7, 0, 9]],

       [[3, 6, 8]],

       [[9, 7, 6]],

       [[1, 6, 3]]])
In [46]: _.shape
Out[46]: (4, 1, 3)
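Applied to the shapes in the question, the same call should give a 4D array: 75 rows from the frame, followed by the (100, 4, 3) shape of each cell. A minimal sketch with made-up data; the column names arr and c0..c7 are hypothetical stand-ins for the real ones:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 75

# One object column holding (100, 4, 3) arrays, as described in the question.
cells = np.empty(n_rows, dtype=object)
for i in range(n_rows):
    cells[i] = rng.standard_normal((100, 4, 3))
df = pd.DataFrame({'arr': cells})

# Eight hypothetical scalar columns to reach the (75, 9) frame shape.
for k in range(8):
    df[f'c{k}'] = np.arange(n_rows)

print(df.shape)                       # (75, 9)
stacked = np.stack(df['arr'].to_numpy())
print(stacked.shape)                  # (75, 100, 4, 3)
```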
This only works with one column. np.stack, like np.concatenate and its relatives, treats the input argument as an iterable, effectively a list of arrays, so the list from to_list() gives the same result:
In [48]: some_df['A'].to_list()
Out[48]:
 array([[7, 0, 9]]),
 array([[3, 6, 8]]),
 array([[9, 7, 6]]),
 array([[1, 6, 3]])]
In [50]: np.stack(some_df['A'].to_list()).shape
Out[50]: (4, 1, 3)
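Because np.stack needs one list of equal-shaped arrays, a frame with several array columns has to be handled column by column. A sketch using a hypothetical helper, stack_array_columns, that checks the shapes first so a mismatch is reported by column name rather than as a bare np.stack error:

```python
import numpy as np
import pandas as pd

def stack_array_columns(df, columns):
    """Hypothetical helper: np.stack each listed object column separately.

    Returns a dict mapping column name -> ndarray of shape (len(df), *cell_shape).
    """
    out = {}
    for col in columns:
        arrays = df[col].to_list()
        shapes = {a.shape for a in arrays}
        if len(shapes) > 1:
            raise ValueError(f"column {col!r} mixes shapes {sorted(shapes)}")
        out[col] = np.stack(arrays)
    return out

# Two array columns with different cell shapes.
a_cells = np.empty(4, dtype=object)
b_cells = np.empty(4, dtype=object)
for i in range(4):
    a_cells[i] = np.full((1, 3), i)
    b_cells[i] = np.full((2, 2), i)
df = pd.DataFrame({'A': a_cells, 'B': b_cells})

stacked = stack_array_columns(df, ['A', 'B'])
print({col: arr.shape for col, arr in stacked.items()})
# {'A': (4, 1, 3), 'B': (4, 2, 2)}
```

Each column keeps its own shape; trying to force both into one array would only work if the cell shapes matched.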
What you're asking for is not quite possible. Pandas DataFrames are 2D. Yes, you can store NumPy arrays as objects (references) inside DataFrame cells, but this is not really well supported, and expecting the DataFrame itself to report a shape that takes one dimension from the frame and the remaining dimensions from the arrays inside is not possible at all.
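A quick way to see the 2D limit: to_numpy() on a frame that mixes an array column with a scalar column still returns a (rows, columns) object array, with each NumPy array sitting inside as a single element. A sketch with made-up data; the column names arr and x are hypothetical:

```python
import numpy as np
import pandas as pd

cells = np.empty(3, dtype=object)
for i in range(3):
    cells[i] = np.ones((100, 4, 3))

df = pd.DataFrame({'arr': cells, 'x': [1.0, 2.0, 3.0]})

values = df.to_numpy()
print(values.shape, values.dtype)  # (3, 2) object
print(values[0, 0].shape)          # (100, 4, 3): one element, not extra axes
```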
You should consider storing your data either entirely in NumPy arrays of the appropriate shape, or in a single, properly 2D DataFrame with a MultiIndex. For example, you can "pivot" a column of 1D arrays into a column of scalars by moving the extra dimension to a new level of a MultiIndex on the rows:
        A
x  [2, 3]
y  [5, 6]
becomes:
     A
x 0  2
  1  3
y 0  5
  1  6
or pivot to the columns:
   A
   0  1
x  2  3
y  5  6
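Both pivots can be produced from the small x/y example by stacking the column first and then attaching a MultiIndex built with pd.MultiIndex.from_product. A minimal sketch, assuming every array in the column has the same length; for the (100, 4, 3) arrays in the question you would pass one range per extra dimension to from_product:

```python
import numpy as np
import pandas as pd

# The x/y example: a column of equal-length 1D arrays.
cells = np.empty(2, dtype=object)
cells[0] = np.array([2, 3])
cells[1] = np.array([5, 6])
df = pd.DataFrame({'A': cells}, index=['x', 'y'])

stacked = np.stack(df['A'].to_list())  # shape (2, 2): frame row, array position

# Pivot to the rows: array position becomes a second index level.
long = pd.DataFrame(
    {'A': stacked.ravel()},
    index=pd.MultiIndex.from_product([df.index, range(stacked.shape[1])]),
)

# Pivot to the columns: array position becomes a second column level.
wide = pd.DataFrame(
    stacked,
    index=df.index,
    columns=pd.MultiIndex.from_product([['A'], range(stacked.shape[1])]),
)

print(long)
print(wide)
```

ravel() flattens in C order, which matches the order from_product generates the (row, position) pairs, so values line up with their labels.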