Subtotals for Pandas pivot table index and column

Anonymous (unverified), submitted 2019-12-03 01:38:01

Question:

I'd like to add subtotal rows for the first index level (i.e., Fruits and Animal) and subtotal columns for the first column level (i.e., 2015 and 2016).

For the subtotal columns, I could do something like this, but it seems inefficient to run this type of code for each year (2015 & 2016). Is there a better way? I don't think 'margins' will work because there are multiple subtotals.

df[('2015','2015_Total')] = df[('2015','1st')]+df[('2015','2nd')] 

For the subtotal rows (e.g., fruits total and animal total), I'm not sure where to begin.

Answer 1:

This is fairly involved, because you need to build a MultiIndex for both the columns and the index.
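To run the steps below you need a frame shaped like the pivot table in the question. Here is a minimal sketch for building one. The values and the labels `2015_____` and `2016_______` are copied from the printed output further down; constructing the frame by hand (rather than through `pivot_table`) is an assumption, since the question does not show how `df` was created.

```python
import pandas as pd

# Row index: (group, item). Column index: (year label, half).
index = pd.MultiIndex.from_tuples([
    ("Fruits", "Apple"), ("Fruits", "Banana"),
    ("Animal", "Lion"), ("Animal", "Tiger"),
])
columns = pd.MultiIndex.from_product(
    [["2015_____", "2016_______"], ["1st", "2nd"]]
)

# Values taken from the sample output in the answer.
df = pd.DataFrame(
    [[10, 9, 11, 10],
     [20, 22, 21, 20],
     [5, 3, 2, 1],
     [2, 3, 5, 0]],
    index=index,
    columns=columns,
)
print(df)
```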

Computing the subtotals is easy: use groupby with sum. Then build a MultiIndex for the new columns and concat them to the original DataFrame. Finally, call sort_index (I prepend Total_ to each label so that it sorts correctly):

print(df)
              2015_____     2016_______
                    1st 2nd         1st 2nd
Fruits Apple         10   9          11  10
       Banana        20  22          21  20
Animal Lion           5   3           2   1
       Tiger          2   3           5   0

# Column subtotals: sum the columns under each first-level label.
# (groupby(axis=1) is deprecated in recent pandas;
#  df.T.groupby(level=0).sum().T is equivalent.)
df1 = df.groupby(level=0, axis=1).sum()
print(df1)
               2015_____  2016_______
Fruits Apple          19           21
       Banana         42           41
Animal Lion            8            3
       Tiger           5            5

print(df.columns.get_level_values(0).to_series().drop_duplicates().tolist())
['2015_____', '2016_______']

# Change the columns to a MultiIndex
new_columns = list(zip(df.columns.get_level_values(0).to_series().drop_duplicates().tolist(),
                       "Total_" + df1.columns.str[:4]))
print(new_columns)
[('2015_____', 'Total_2015'), ('2016_______', 'Total_2016')]

df1.columns = pd.MultiIndex.from_tuples(new_columns)
print(df1)
               2015_____ 2016_______
              Total_2015  Total_2016
Fruits Apple          19          21
       Banana         42          41
Animal Lion            8           3
       Tiger           5           5

# Append the subtotal columns to the original frame
df = pd.concat([df, df1], axis=1)

# Row subtotals: sum the rows under each first-level index label.
# (The group order printed here can vary with the pandas version;
#  the final sort_index() makes it irrelevant.)
df2 = df.groupby(level=0, sort=False).sum()
print(df2)
       2015_____     2016_______      2015_____ 2016_______
             1st 2nd         1st 2nd Total_2015  Total_2016
Animal         7   6           7   1         13           8
Fruits        30  31          32  30         61          62

print(df2.index.tolist())
['Animal', 'Fruits']

# Change the index to a MultiIndex, pairing each group with its own total label
new_idx = list(zip(df2.index, "Total_" + df2.index))
print(new_idx)
[('Animal', 'Total_Animal'), ('Fruits', 'Total_Fruits')]

df2.index = pd.MultiIndex.from_tuples(new_idx)
print(df2)
                    2015_____     2016_______      2015_____ 2016_______
                          1st 2nd         1st 2nd Total_2015  Total_2016
Animal Total_Animal         7   6           7   1         13           8
Fruits Total_Fruits        30  31          32  30         61          62

# Append the subtotal rows, then sort both axes
df = pd.concat([df, df2])
df = df.sort_index(axis=1).sort_index()
print(df)
                    2015_____                2016_______
                          1st 2nd Total_2015         1st 2nd Total_2016
Animal Lion                 5   3          8           2   1          3
       Tiger                2   3          5           5   0          5
       Total_Animal         7   6         13           7   1          8
Fruits Apple               10   9         19          11  10         21
       Banana              20  22         42          21  20         41
       Total_Fruits        30  31         61          32  30         62
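For reuse, the same steps can be folded into one function. This is only a sketch, not part of the original answer: `add_subtotals` is a hypothetical name, it swaps `groupby(axis=1)` for a transpose so it also runs on recent pandas, and it assumes that both axes are two-level MultiIndexes, that every first-level column label starts with the four-digit year, and that the `Total_` labels sort after the existing ones (true for this sample data).

```python
import pandas as pd

def add_subtotals(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add Total_<label> columns and rows to a frame
    whose index and columns are both two-level MultiIndexes."""
    # Column subtotals: sum the inner column level within each outer label.
    col_tot = df.T.groupby(level=0).sum().T
    col_tot.columns = pd.MultiIndex.from_arrays(
        [col_tot.columns, "Total_" + col_tot.columns.str[:4]]
    )
    out = pd.concat([df, col_tot], axis=1)

    # Row subtotals come second, so the new total columns are summed as well.
    row_tot = out.groupby(level=0).sum()
    row_tot.index = pd.MultiIndex.from_arrays(
        [row_tot.index, "Total_" + row_tot.index]
    )
    out = pd.concat([out, row_tot])

    # Sorting places each Total_ label after its group's other labels.
    return out.sort_index(axis=1).sort_index()

# Sample data copied from the answer's printed output.
df = pd.DataFrame(
    [[10, 9, 11, 10], [20, 22, 21, 20], [5, 3, 2, 1], [2, 3, 5, 0]],
    index=pd.MultiIndex.from_tuples([
        ("Fruits", "Apple"), ("Fruits", "Banana"),
        ("Animal", "Lion"), ("Animal", "Tiger"),
    ]),
    columns=pd.MultiIndex.from_product(
        [["2015_____", "2016_______"], ["1st", "2nd"]]
    ),
)

result = add_subtotals(df)
print(result)
```

If your real labels do not sort the way you want, one option is to build the desired order explicitly and call `reindex` instead of relying on `sort_index`.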

