Summing columns in Dataframe that have matching column headers

試著忘記壹切 submitted on 2021-02-08 06:22:08

Question


I have a dataframe that currently looks somewhat like this.

import numpy as np
import pandas as pd
In [161]: pd.DataFrame(np.c_[s,t],columns = ["M1","M2","M1","M2"])
Out[161]: 
            M1    M2    M1    M2
      6/7    1     2     3     5
      6/8    2     4     7     8
      6/9    3     6     9     9
      6/10   4     8     8    10
      6/11   5    10    20    40

Except, instead of just four columns, there are approximately 1000 columns, named M1 through ~M340 (several columns share the same header). I want to sum the values of all columns with matching headers, row by row. Ideally, the resulting dataframe would look like:

            M1_sum   M2_sum    
      6/7     4        7   
      6/8     9        12  
      6/9    12        15   
      6/10   12        18        
      6/11   25        50      

I wanted to apply "groupby" and "sum" somehow, but was unsure how to do that when the number of duplicates varies from header to header: one header may appear on four columns, another on two, and another on only one.


Answer 1:


You probably want to group by the first level of the column labels, along the column axis (axis=1), and then call .sum(), like:

>>> df.groupby(level=0,axis=1).sum().add_suffix('_sum')
   M1_sum  M2_sum
0       4       7
1       9      12
2      12      15
3      12      18
4      25      50

If we rename the last column to M1 instead, so that M1 appears three times and M2 only once, the grouping still works correctly:

>>> df
   M1  M2  M1  M1
0   1   2   3   5
1   2   4   7   8
2   3   6   9   9
3   4   8   8  10
4   5  10  20  40
>>> df.groupby(level=0,axis=1).sum().add_suffix('_sum')
   M1_sum  M2_sum
0       9       2
1      17       4
2      21       6
3      22       8
4      65      10
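
As far as I know, recent pandas releases (2.1 and later) warn that axis=1 in groupby is deprecated, so the call above may not work unchanged on newer installations. A minimal sketch of an equivalent approach, assuming the frame from the question, is to transpose, group on the row labels, and transpose back. The data below is rebuilt by hand from the question's table, and the variable name result is only illustrative:

```python
import pandas as pd

# Rebuild the question's example frame, including the duplicate column labels.
df = pd.DataFrame(
    [[1, 2, 3, 5],
     [2, 4, 7, 8],
     [3, 6, 9, 9],
     [4, 8, 8, 10],
     [5, 10, 20, 40]],
    index=["6/7", "6/8", "6/9", "6/10", "6/11"],
    columns=["M1", "M2", "M1", "M2"],
)

# Transpose so the duplicate labels become the row index, group on that
# index, sum each group, then transpose back to the original orientation.
result = df.T.groupby(level=0).sum().T.add_suffix("_sum")
print(result)
```

With mixed dtypes the double transpose can upcast columns to object, so this sketch is best suited to all-numeric frames like the one in the question.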


Source: https://stackoverflow.com/questions/56813459/summing-columns-in-dataframe-that-have-matching-column-headers
