Pandas pivot or groupby for dynamically generated columns

前端未结

关注

 1  750

I have a dataframe with sales information in a supermarket. Each row in the dataframe represents an item, with several characteristics as columns. The original DataFrame is

相关标签:

1条回答

花落未央

2021-01-14 05:28

One possible way to use groupby to make lists of it that can then be turned into columns:

In [24]: res = df.groupby(['ticket_number', 'ticket_price'])['item'].apply(list).apply(pd.Series)

In [25]: res
Out[25]:
                                 0       1     2
ticket_number ticket_price
001           21            tomato   candy  soup
002           12              soup    cola   NaN
003           56              beef  tomato  pork

Then, after cleaning up this result a bit:

In [27]: res.columns = ['item' + str(i + 1) for i in res.columns]

In [29]: res.reset_index()
Out[29]:
  ticket_number ticket_price   item1   item2 item3
0           001           21  tomato   candy  soup
1           002           12    soup    cola   NaN
2           003           56    beef  tomato  pork

Another possible way to create a new column which numbers the items in each group with groupby.cumcount:

In [38]: df['item_number'] = df.groupby('ticket_number').cumcount()

In [39]: df
Out[39]:
     item ticket_number ticket_price  item_number
0  tomato           001           21            0
1   candy           001           21            1
2    soup           001           21            2
3    soup           002           12            0
4    cola           002           12            1
5    beef           003           56            0
6  tomato           003           56            1
7    pork           003           56            2

And then do some reshaping:

In [40]: df.set_index(['ticket_number', 'ticket_price', 'item_number']).unstack(-1)
Out[40]:
                              item
item_number                      0       1     2
ticket_number ticket_price
001           21            tomato   candy  soup
002           12              soup    cola   NaN
003           56              beef  tomato  pork

From here, with some cleaning of the columns names, you can achieve the same as above.

The reshaping step with set_index and untack could also be done with pivot_table: df.pivot_table(columns=['item_number'], index=['ticket_number', 'ticket _price'], values='item', aggfunc='first')

0 讨论(0)