Convert pandas column of numpy arrays to numpy array of higher dimension

独厮守ぢ 2021-01-29 04:52

I have a pandas dataframe of shape (75,9).

Only one of those columns holds NumPy arrays, each of shape (100, 4, 3).

I have a

2 Answers
  • 2021-01-29 05:18
    In [41]: import numpy as np, pandas as pd
    In [42]: some_df = pd.DataFrame(columns=['A'])
        ...: for i in range(4):
        ...:     some_df.loc[i] = [np.random.randint(0,10,(1,3))]
        ...:
    In [43]: some_df                                                                          
    Out[43]: 
                 A
    0  [[7, 0, 9]]
    1  [[3, 6, 8]]
    2  [[9, 7, 6]]
    3  [[1, 6, 3]]
    

    The column's values form a 1-D object-dtype array whose elements are themselves arrays:

    In [44]: some_df['A'].to_numpy()                                                          
    Out[44]: 
    array([array([[7, 0, 9]]), array([[3, 6, 8]]), array([[9, 7, 6]]),
           array([[1, 6, 3]])], dtype=object)
    

    If those arrays all have the same shape, np.stack joins them along a new leading dimension:

    In [45]: np.stack(some_df['A'].to_numpy())                                                
    Out[45]: 
    array([[[7, 0, 9]],
    
           [[3, 6, 8]],
    
           [[9, 7, 6]],
    
           [[1, 6, 3]]])
    In [46]: _.shape                                                                          
    Out[46]: (4, 1, 3)
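
Applied to the shapes described in the question (75 rows, each cell an array of shape (100, 4, 3)), the same call should produce a 4-D array. The following is only a sketch: the column name `arr` and the random placeholder data are assumptions, since the question does not show the actual frame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the question's frame: 75 rows, one column
# ("arr", an assumed name) whose cells each hold a (100, 4, 3) array.
df = pd.DataFrame({"arr": [rng.random((100, 4, 3)) for _ in range(75)]})

# The column itself is a 1-D object array of 75 array references.
col = df["arr"].to_numpy()
print(col.dtype, col.shape)   # object (75,)

# np.stack joins the 75 arrays along a new leading axis.
stacked = np.stack(col)
print(stacked.shape)          # (75, 100, 4, 3)
```

The result is an ordinary numeric array, so `stacked[i]` is the array that was stored in row `i` of the column.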
    

    This only works on one column at a time. np.stack, like the rest of the concatenate family, treats its input argument as an iterable, effectively a list of arrays, so passing a list gives the same result:

    In [48]: some_df['A'].to_list()                                                           
    Out[48]: 
    [array([[7, 0, 9]]),
     array([[3, 6, 8]]),
     array([[9, 7, 6]]),
     array([[1, 6, 3]])]
    In [50]: np.stack(some_df['A'].to_list()).shape                                           
    Out[50]: (4, 1, 3)
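
If more than one column holds arrays, one possible approach is to stack each column separately and then stack the per-column results along a second axis. This is a sketch under assumptions: the second column `B` is hypothetical, and every cell in both columns is assumed to have the same shape.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Two hypothetical array-valued columns; every cell has shape (1, 3).
df = pd.DataFrame({
    "A": [rng.integers(0, 10, (1, 3)) for _ in range(4)],
    "B": [rng.integers(0, 10, (1, 3)) for _ in range(4)],
})

# Stack each column on its own: each result has shape (4, 1, 3).
per_column = [np.stack(df[name].to_list()) for name in ["A", "B"]]

# Stack the per-column results on a new axis 1, giving
# (n_rows, n_columns, 1, 3).
both = np.stack(per_column, axis=1)
print(both.shape)   # (4, 2, 1, 3)
```

If the cells do not all share one shape, `np.stack` raises a `ValueError` rather than producing an array.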
    
  • 2021-01-29 05:39

    What you're asking for is not possible within pandas itself. Pandas DataFrames are 2D. Yes, you can store NumPy arrays as objects (references) inside DataFrame cells, but this is not well supported, and a DataFrame cannot report a shape that takes one dimension from its rows and the remaining ones from the arrays inside its cells.

    You should consider storing your data either entirely in NumPy arrays of the appropriate shape, or in a single, properly 2D DataFrame with a MultiIndex. For example, you can "pivot" a column of 1D arrays into a column of scalars by moving the extra dimension to a new level of a MultiIndex on the rows:

      A
    x [2, 3]
    y [5, 6]
    

    becomes:

        A
    x 0 2
      1 3
    y 0 5
      1 6
    

    or pivot to the columns:

      A
      0 1
    x 2 3
    y 5 6
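
Both layouts above can be built from the array-valued column. The sketch below is one way to do it, not necessarily what the answer's author had in mind; the variable names `long` and `wide` are illustrative, and it assumes every cell holds a 1D array of the same length.

```python
import numpy as np
import pandas as pd

# The small example frame: one column of 1D arrays.
df = pd.DataFrame({"A": [np.array([2, 3]), np.array([5, 6])]},
                  index=["x", "y"])

# Turn the object column into a regular 2D array of shape (2, 2).
arr = np.stack(df["A"].to_numpy())
positions = range(arr.shape[1])

# Row layout: the position inside each array becomes a second index level.
row_index = pd.MultiIndex.from_product([df.index, positions])
long = pd.DataFrame({"A": arr.ravel()}, index=row_index)

# Column layout: the position becomes a second level under "A".
col_index = pd.MultiIndex.from_product([["A"], positions])
wide = pd.DataFrame(arr, index=df.index, columns=col_index)

print(long)
print(wide)
```

Either frame holds only scalars, so ordinary pandas operations (selection, grouping, arithmetic) work without unpacking cells.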
    