Using cumsum in pandas on group()

难免孤独 2020-12-28 21:33

From a pandas newbie: I have data that looks essentially like this:

 data1=pd.DataFrame({'Dir':['E','E','W','W','E','W','W','E'], 'Bool'


        
2 Answers
  •  有刺的猬
    2020-12-28 21:42

    Try this:

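    data2 = data1.reset_index()                          # move the date index into a regular column
    data3 = data2.set_index(["Bool", "Dir", "index"])   # index is the new column created by reset_index
    # sum rows that share (Bool, Dir, index), then take the running total within each (Bool, Dir) group
    running_sum = data3.groupby(level=[0,1,2]).sum().groupby(level=[0,1]).cumsum()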
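A self-contained sketch of the steps above, so they can be run end to end. The question's DataFrame is cut off after the Dir column, so the Bool and Data values and the dates below are hypothetical stand-ins chosen for illustration; only the Dir values come from the question.

```python
import pandas as pd

# Dir values come from the question; Bool, Data, and the dates are
# hypothetical placeholders because the original frame is truncated.
data1 = pd.DataFrame(
    {
        "Dir": ["E", "E", "W", "W", "E", "W", "W", "E"],
        "Bool": ["Y", "N", "Y", "N", "Y", "N", "Y", "N"],
        "Data": [4, 5, 6, 7, 8, 9, 10, 11],
    },
    index=pd.to_datetime(
        ["2000-12-30", "2000-12-30", "2000-12-30", "2001-01-02",
         "2001-01-03", "2001-01-03", "2000-12-30", "2000-12-30"]
    ),
)

# The index is unnamed, so reset_index puts it in a column called "index".
data2 = data1.reset_index()
data3 = data2.set_index(["Bool", "Dir", "index"])

# Sum rows sharing (Bool, Dir, date), then take the running total
# within each (Bool, Dir) group, in date order.
running_sum = (
    data3.groupby(level=[0, 1, 2]).sum()
    .groupby(level=[0, 1])
    .cumsum()
)
print(running_sum)
```

With these placeholder values, the two ("N", "W") dates produce running totals of 7 and then 16.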
    

    The reason you can't simply call cumsum on data3 has to do with how your data is structured. Grouping by Bool and Dir and applying an aggregation function (sum, mean, etc.) produces a DataFrame smaller than the one you started with, because the function collapses all rows that share the same group keys into one. cumsum, however, is not an aggregation function: it returns a DataFrame the same size as the one it's called on, with one value per input row. So unless the input already has exactly one row per running total you want, cumsum alone won't give you the right output. That's why I called sum first: it collapses rows that share the same Bool, Dir, and index values into a single row, which is the correct input format for the cumsum that follows.
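To make the size argument concrete, here is a small sketch comparing an aggregation with cumsum on the same grouping. The toy frame and its column names (Dir, Date, Data) are hypothetical; two of its rows deliberately share the same (Dir, Date) key, which is the situation where summing first matters.

```python
import pandas as pd

# Hypothetical toy data: the first two rows share the same (Dir, Date) key.
df = pd.DataFrame(
    {
        "Dir": ["E", "E", "E", "W"],
        "Date": pd.to_datetime(["2001-01-01", "2001-01-01", "2001-01-02", "2001-01-01"]),
        "Data": [1, 2, 3, 4],
    }
)

# Aggregation: one row per (Dir, Date) group, so the result is smaller.
per_date = df.groupby(["Dir", "Date"])["Data"].sum()
print(len(df), len(per_date))  # 4 3

# cumsum on the grouped rows: one value per input row, in the original order.
per_row = df.groupby("Dir")["Data"].cumsum()
print(per_row.tolist())  # [1, 3, 6, 4]

# Aggregate first, then take the running total within each Dir.
running = per_date.groupby(level="Dir").cumsum()
print(running.tolist())  # [3, 6, 4]
```

The per-row version repeats a date for the "E" group (running totals 1 and 3 on the same day), while the sum-then-cumsum version gives a single running total per date.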

    Sorry if I haven't explained this well enough. Maybe someone else could help me out?
