Python pandas pivot from long to wide

醉梦人生 2021-01-20 10:51

My data is currently in a long format. Below is a sample:

     Stock         Date      Time     Price     Year
       AAA   2001-01-05  15:20:09     2.380            


        
1 Answer
  • 2021-01-20 11:46

    I think you can use pivot_table, but it needs an aggfunc. I chose 'first' because the default, np.mean, does not work with datetime values.

    A better explanation with a sample is here and in the docs.
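
To make the aggfunc point concrete, here is a minimal sketch on a made-up two-row frame. The data and the names `toy`, `first` and `mean_worked` are assumptions for illustration, not part of the question. Dates stored as strings cannot be averaged, so the default mean is attempted inside a try block, while 'first' simply returns the one value found in each cell. The exact error raised by the mean call depends on the pandas version.

```python
import pandas as pd

# Hypothetical toy data: dates stored as strings, as in the question.
toy = pd.DataFrame({
    "Stock": ["AAA", "BBB"],
    "Year": [2001, 2006],
    "Date": ["2001-01-05", "2006-05-13"],
})

# The default aggfunc is the mean, which cannot average date strings.
try:
    toy.pivot_table(index="Stock", columns="Year", values="Date")
    mean_worked = True
except Exception as exc:  # the exception type varies across pandas versions
    mean_worked = False
    print("default mean failed:", type(exc).__name__)

# 'first' takes the first value in each cell, so the dtype does not matter.
first = toy.pivot_table(index="Stock", columns="Year", values="Date",
                        aggfunc="first")
print(first)
```

Any aggregation that works on non-numeric data (for example 'last') would probably do here as well, since each cell holds a single value once the rows are numbered.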

    Solution 1:

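    # Number the rows within each (Stock, Year) group: '1', '2', ...
    df['idx'] = (df.groupby(['Stock', 'Year']).cumcount() + 1).astype(str)

    # 'first' keeps the single value in each cell; the default mean fails on dates
    df1 = (df.pivot_table(index=['Stock', 'Year'],
                          columns=['idx'],
                          values=['Date', 'Time', 'Price'],
                          aggfunc='first'))
    # Flatten the (value, idx) column pairs into names such as 'Date1'
    df1.columns = [''.join(col) for col in df1.columns]
    df1 = df1.reset_index()
    print(df1)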
      Stock  Year       Date1       Date2     Time1     Time2 Price1 Price2
    0   AAA  2001  2001-01-05        None  15:20:09      None   2.38   None
    1   AAA  2002  2002-02-23  2002-02-27  10:13:24  17:17:55   2.44   2.46
    2   BBB  2006  2006-05-13  2006-10-04  16:03:49  10:33:10   2.78    2.8
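
Since the sample in the question is cut off, here is a self-contained sketch of Solution 1 that can be run as is. The five input rows are reconstructed from the printed output above, and storing Price as floats is an assumption (the output above suggests it was an object column in the original data). The name `wide` is used here only to avoid clashing with `df1`.

```python
import pandas as pd

# Long-format rows reconstructed from the output printed in the answer.
df = pd.DataFrame({
    "Stock": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Date": ["2001-01-05", "2002-02-23", "2002-02-27",
             "2006-05-13", "2006-10-04"],
    "Time": ["15:20:09", "10:13:24", "17:17:55", "16:03:49", "10:33:10"],
    "Price": [2.38, 2.44, 2.46, 2.78, 2.80],  # assumption: floats
    "Year": [2001, 2002, 2002, 2006, 2006],
})

# cumcount() numbers rows within each (Stock, Year) group starting at 0,
# so adding 1 gives 1, 2, ...; the string form is needed for ''.join below.
df["idx"] = (df.groupby(["Stock", "Year"]).cumcount() + 1).astype(str)
print(df["idx"].tolist())

wide = df.pivot_table(index=["Stock", "Year"], columns="idx",
                      values=["Date", "Time", "Price"], aggfunc="first")
# Columns are (value, idx) pairs; join each pair into one flat name.
wide.columns = ["".join(col) for col in wide.columns]
wide = wide.reset_index()
print(wide)
```

The order of the value columns in the result may differ from the output shown above, depending on the pandas version, so select columns by name rather than by position.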
    

    Then you can convert the price columns to float and the date columns to datetime with to_datetime:

    # Cast every Price* column to float
    cols = df1.columns[df1.columns.str.contains('Price')]
    df1[cols] = df1[cols].astype(float)

    # Parse every Date* column into datetime64
    cols = df1.columns[df1.columns.str.contains('Date')]
    df1[cols] = df1[cols].apply(pd.to_datetime)

    print(df1)
      Stock  Year      Date1      Date2     Time1     Time2  Price1  Price2
    0   AAA  2001 2001-01-05        NaT  15:20:09      None    2.38     NaN
    1   AAA  2002 2002-02-23 2002-02-27  10:13:24  17:17:55    2.44    2.46
    2   BBB  2006 2006-05-13 2006-10-04  16:03:49  10:33:10    2.78    2.80
    
    print(df1.dtypes)
    Stock             object
    Year               int64
    Date1     datetime64[ns]
    Date2     datetime64[ns]
    Time1             object
    Time2             object
    Price1           float64
    Price2           float64
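
The conversion step can also be tried in isolation. The sketch below builds a small wide frame by hand, with values copied from the output above and prices deliberately stored as strings (an assumption, to show the parsing). It swaps in `pd.to_numeric` for `astype(float)` and assigns one column at a time; both are illustrative choices, not the answer's exact code.

```python
import pandas as pd

# Hand-built wide frame; None marks a missing second observation.
wide = pd.DataFrame({
    "Stock": ["AAA", "AAA"],
    "Year": [2001, 2002],
    "Date1": ["2001-01-05", "2002-02-23"],
    "Date2": [None, "2002-02-27"],
    "Price1": ["2.38", "2.44"],
    "Price2": [None, "2.46"],
})

# Parse price strings into floats; missing values become NaN.
for col in wide.columns[wide.columns.str.startswith("Price")]:
    wide[col] = pd.to_numeric(wide[col])

# Parse date strings into datetime64; missing values become NaT.
for col in wide.columns[wide.columns.str.startswith("Date")]:
    wide[col] = pd.to_datetime(wide[col])

print(wide.dtypes)
```

The Time columns are left as strings here, which matches the object dtype they keep in the output above.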
    

    Solution 2:

    # Number the rows within each (Stock, Year) group: 1, 2, ...
    df['idx'] = df.groupby(['Stock', 'Year']).cumcount() + 1

    # Build one set of column labels per value column
    df['date_idx'] = 'date_' + df.idx.astype(str)
    df['time_idx'] = 'time_' + df.idx.astype(str)
    df['price_idx'] = 'price_' + df.idx.astype(str)

    # Pivot each value column separately; all three share the (Stock, Year) index
    date = df.pivot_table(index=['Stock', 'Year'], columns='date_idx', values='Date', aggfunc='first')
    time = df.pivot_table(index=['Stock', 'Year'], columns='time_idx', values='Time', aggfunc='first')
    price = df.pivot_table(index=['Stock', 'Year'], columns='price_idx', values='Price', aggfunc='first')

    # Join the three pieces side by side on that shared index
    reshape = pd.concat([date, time, price], axis=1).reset_index()
    print(reshape)
      Stock  Year      date_1      date_2    time_1    time_2  price_1  price_2
    0   AAA  2001  2001-01-05        None  15:20:09      None     2.38      NaN
    1   AAA  2002  2002-02-23  2002-02-27  10:13:24  17:17:55     2.44     2.46
    2   BBB  2006  2006-05-13  2006-10-04  16:03:49  10:33:10     2.78     2.80
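
The three near-identical pivots in Solution 2 could be folded into a small function. The sketch below is one way to do that, not the answer's own code: `pivot_one` is a hypothetical helper, the input rows are reconstructed from the printed output, and Price is assumed to be stored as floats.

```python
import pandas as pd

# Long-format rows reconstructed from the output printed in the answer.
df = pd.DataFrame({
    "Stock": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Date": ["2001-01-05", "2002-02-23", "2002-02-27",
             "2006-05-13", "2006-10-04"],
    "Time": ["15:20:09", "10:13:24", "17:17:55", "16:03:49", "10:33:10"],
    "Price": [2.38, 2.44, 2.46, 2.78, 2.80],  # assumption: floats
    "Year": [2001, 2002, 2002, 2006, 2006],
})
df["idx"] = df.groupby(["Stock", "Year"]).cumcount() + 1


def pivot_one(frame, value, prefix):
    """Hypothetical helper: pivot one value column to prefix_1, prefix_2, ..."""
    wide = frame.pivot_table(index=["Stock", "Year"], columns="idx",
                             values=value, aggfunc="first")
    wide.columns = [f"{prefix}_{i}" for i in wide.columns]
    return wide


# Each piece shares the (Stock, Year) index, so concat aligns them row by row.
pieces = [pivot_one(df, "Date", "date"),
          pivot_one(df, "Time", "time"),
          pivot_one(df, "Price", "price")]
reshape = pd.concat(pieces, axis=1).reset_index()
print(reshape)
```

Because every value column is pivoted on its own, the column order is controlled by the order of the list passed to `pd.concat`, and each column keeps its own dtype.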
    