Pandas pivot or groupby for dynamically generated columns

别跟我提以往 2021-01-14 05:07

I have a dataframe with sales information in a supermarket. Each row in the dataframe represents an item, with several characteristics as columns. The original DataFrame is

1 answer
  • 2021-01-14 05:28

    One possible way is to use groupby to collect the items of each ticket into a list, and then expand those lists into columns:

    In [24]: res = df.groupby(['ticket_number', 'ticket_price'])['item'].apply(list).apply(pd.Series)
    
    In [25]: res
    Out[25]:
                                     0       1     2
    ticket_number ticket_price
    001           21            tomato   candy  soup
    002           12              soup    cola   NaN
    003           56              beef  tomato  pork
    

    Then, after cleaning up this result a bit:

    In [27]: res.columns = ['item' + str(i + 1) for i in res.columns]
    
    In [29]: res.reset_index()
    Out[29]:
      ticket_number ticket_price   item1   item2 item3
    0           001           21  tomato   candy  soup
    1           002           12    soup    cola   NaN
    2           003           56    beef  tomato  pork
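
For reference, here is a self-contained sketch of the same idea. The input frame is reconstructed from the outputs shown in this answer (the question's original DataFrame is not reproduced here), so treat it as an assumption. It also builds the wide frame directly from the lists rather than with apply(pd.Series), which is a substitution on my part: apply(pd.Series) tends to be slow on large frames.

```python
import pandas as pd

# Sample data reconstructed from the outputs in this answer (an assumption,
# since the question's original DataFrame is not shown).
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})

# Collect the items of each ticket into one list per group.
lists = df.groupby(["ticket_number", "ticket_price"])["item"].apply(list)

# Expand the lists into columns; shorter lists are padded with missing values.
res = pd.DataFrame(lists.tolist(), index=lists.index)

# Rename the positional columns 0, 1, 2 to item1, item2, item3.
res.columns = [f"item{i + 1}" for i in res.columns]
res = res.reset_index()
print(res)
```

The number of item columns follows from the longest ticket, so nothing needs to be hard-coded.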
    

    Another possible way is to create a new column that numbers the items within each group, using groupby.cumcount:

    In [38]: df['item_number'] = df.groupby('ticket_number').cumcount()
    
    In [39]: df
    Out[39]:
         item ticket_number ticket_price  item_number
    0  tomato           001           21            0
    1   candy           001           21            1
    2    soup           001           21            2
    3    soup           002           12            0
    4    cola           002           12            1
    5    beef           003           56            0
    6  tomato           003           56            1
    7    pork           003           56            2
    

    And then do some reshaping:

    In [40]: df.set_index(['ticket_number', 'ticket_price', 'item_number']).unstack(-1)
    Out[40]:
                                  item
    item_number                      0       1     2
    ticket_number ticket_price
    001           21            tomato   candy  soup
    002           12              soup    cola   NaN
    003           56              beef  tomato  pork
    

    From here, with some cleaning of the column names, you can achieve the same result as above.
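
A possible end-to-end version of this second approach, including the column cleanup, could look like the sketch below. The sample frame is again reconstructed from the outputs above (an assumption). Selecting the 'item' column before unstacking is my own shortcut: it avoids the extra 'item' level in the resulting columns.

```python
import pandas as pd

# Sample data reconstructed from the outputs in this answer (an assumption).
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})

# Number the items within each ticket: 0, 1, 2, ...
df["item_number"] = df.groupby("ticket_number").cumcount()

# Move item_number from the index into the columns. Working on the 'item'
# Series keeps the result's columns flat (no MultiIndex).
wide = (
    df.set_index(["ticket_number", "ticket_price", "item_number"])["item"]
      .unstack()
)

# Rename 0, 1, 2 to item1, item2, item3 and restore the key columns.
wide.columns = [f"item{i + 1}" for i in wide.columns]
wide = wide.reset_index()
print(wide)
```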

    The reshaping step with set_index and unstack could also be done with pivot_table: df.pivot_table(columns=['item_number'], index=['ticket_number', 'ticket_price'], values='item', aggfunc='first')
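
As a rough, runnable illustration of the pivot_table variant (same reconstructed sample data, which is an assumption): aggfunc='first' is needed because the values are strings, and the default aggregation (mean) would not apply to them.

```python
import pandas as pd

# Sample data reconstructed from the outputs in this answer (an assumption).
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})
df["item_number"] = df.groupby("ticket_number").cumcount()

# Each (ticket, item_number) cell holds exactly one item, so 'first'
# simply passes that value through.
pivoted = df.pivot_table(
    index=["ticket_number", "ticket_price"],
    columns="item_number",
    values="item",
    aggfunc="first",
)
print(pivoted)
```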
