Pandas: Mean of columns with the same names

孤独总比滥情好 2021-02-05 20:01

I have a dataframe with columns like:

['id', 'name', 'foo1', 'foo1', 'foo1', 'foo2', 'foo2', 'foo3']

I would like to get a new dat

1 answer
  •  再見小時候
    2021-02-05 20:30

    The basic idea is to group the columns by name and take the mean of each group.

    Based on the comments on your question, here are a few different ways to do this. (Solution (3) is the best I found.)

    (1) Quick solution. If only a few columns are non-numeric and they have unique names (for example, the columns id and name), you can do the following.

    First, move ['id', 'name'] into the index to preserve them:

    df = df.set_index(['id', 'name']) 
    

    Then call DataFrame.groupby on the column labels with axis=1 (so that groups are formed from columns rather than rows) and take the mean of each group:

    df = df.groupby(by=df.columns, axis=1).mean()
    

    Finally, reset the index to recover the ['id', 'name'] columns:

    df = df.reset_index()
    

    Here is a sample session:

    In [35]: df = pd.DataFrame([['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]], columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'], index=list('AB'))
    
    In [36]: df = df.set_index(['id', 'name'])
    
    In [37]: df = df.groupby(by=df.columns, axis=1).mean()
    
    In [38]: df = df.reset_index()
    
    In [39]: df
    Out[39]: 
        id name  c1   c2    c3
    0  001    a   1   55  1000
    1  002    b   2  110  2000
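
The three steps above rely on groupby(..., axis=1), which is deprecated as of pandas 2.1. A minimal sketch of the same idea that avoids that argument is shown below; it assumes, as in solution (1), that every column left after set_index is numeric. The names keyed, averaged and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]],
    columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'],
)

# Keep the non-numeric, uniquely named columns out of the way.
keyed = df.set_index(['id', 'name'])

# Transpose so duplicate column labels become duplicate index labels,
# group the rows by label, average each group, then transpose back.
averaged = keyed.T.groupby(level=0).mean().T

# Restore 'id' and 'name' as ordinary columns.
result = averaged.reset_index()
print(result)
```

Because mean() returns floats, the averaged columns come back as float values (for example 55.0 rather than 55).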
    

    (2) Complete solution. If you have many non-numeric columns with unique names, you can do the following.

    First, transpose your dataframe:

    df2 = df.transpose()
    

    Then group by its index (axis=0), handling each group according to its type: for a numeric group, return the mean of its rows; for a non-numeric group, return its first row:

    df2 = df2.groupby(by=df2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[0])
    

    Finally, transpose back:

    df = df2.transpose()
    

    Here is a sample session:

    In [98]: df = pd.DataFrame([['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]], columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'], index=list('AB'))
    
    In [99]: df2 = df.transpose()
    
    In [100]: df2 = df2.groupby(by=df2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[0])
    
    In [101]: df3 = df2.transpose()
    
    In [102]: df3
    Out[102]: 
      c1   c2    c3   id name
    A  1   55  1000  001    a
    B  2  110  2000  002    b
    
    In [103]: df
    Out[103]: 
        id name  c1  c2   c2    c3
    A  001    a   1  10  100  1000
    B  002    b   2  20  200  2000
    

    Note that the numbers module must be imported first (import numbers).
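
To illustrate what the isinstance(..., numbers.Number) test in the lambda decides, here is a small sketch. The helper name is_numeric_value is hypothetical and only wraps the same check; NumPy scalars pass it because NumPy registers its scalar types with the numbers abstract base classes.

```python
import numbers

import numpy as np


def is_numeric_value(value):
    """Return True if value is any kind of number (the check used in the lambda)."""
    return isinstance(value, numbers.Number)


print(is_numeric_value(10))               # True: plain Python int
print(is_numeric_value(np.int64(10)))     # True: NumPy integer scalar
print(is_numeric_value(np.float64(1.5)))  # True: NumPy float scalar
print(is_numeric_value('001'))            # False: a string such as an id
```

One caveat: bool is a subclass of int, so a boolean column would also be treated as numeric by this check.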

    More notes:

    (3) All in one! This solution is the best I found:

    df.groupby(by=df.columns, axis=1).apply(lambda g: g.mean(axis=1) if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[:,0])
    

    This applies the per-group handling directly to the un-transposed dataframe, that is,

    df.groupby(by=df.columns, axis=1).apply(gf)
    

    where

    gf = lambda g: g.mean(axis=1) if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[:,0]
    

    My earlier attempts failed because I did not handle the axis carefully. You must pass axis=1 to the mean function, and return a column (g.iloc[:,0]) for non-numeric groups.
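
The per-group rule described above (row-wise mean for numeric groups, first column otherwise) can also be sketched without groupby(axis=1), which is deprecated as of pandas 2.1. This is only one possible way to write it; the function name mean_duplicate_columns is hypothetical, and it checks column dtypes rather than the first cell's value.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def mean_duplicate_columns(df):
    """Collapse same-named columns: mean for numeric ones, first column otherwise."""
    out = {}
    for name in df.columns.unique():
        # All columns sharing this label, always as a DataFrame.
        block = df.loc[:, df.columns == name]
        if all(is_numeric_dtype(dtype) for dtype in block.dtypes):
            out[name] = block.mean(axis=1)   # row-wise mean across the duplicates
        else:
            out[name] = block.iloc[:, 0]     # keep the first column unchanged
    return pd.DataFrame(out, index=df.index)


df = pd.DataFrame(
    [['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]],
    columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'],
    index=list('AB'),
)
result = mean_duplicate_columns(df)
print(result)
```

Unlike the groupby-based versions, this keeps the columns in their original order (id, name, c1, c2, c3) instead of sorting them by label.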

    Thanks!
