Aggregate groups in Python Pandas and spit out percentage from a certain count

坚强是说给别人听的谎言 submitted on 2021-01-27 06:31:48

Question


I am trying to figure out how to aggregate groups in a pandas DataFrame, computing a percentage and a sum as the new columns.

For example, in the following data frame, I have columns A, B, C, and D. I would like to aggregate by the groups in A, where C should be a percentage (the frequency of '1' divided by the frequency of non-missing values) and D should be the sum of the non-missing values.

For example, for the 'foo' group, the resulting data frame should be:

A    B    C        D
foo       1.333    4

I can do some of the individual pieces, but I am not sure how to combine them into a single coherent script:

import numpy as np
import pandas as pd


df = pd.DataFrame({'A' : ['foo', 'foo', 'foo', 'foo',
                          'bar', 'bar', 'bar', 'bar'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : [1, np.nan, 1, 2, np.nan, 1, 1, 2],
                   'D' : [2, '', 1, 1, '', 2, 2, 1]})

print(df)

#df['C'] = df['C'].fillna(999)
# Turn the empty strings in D into NaN so the column is numeric
df['D'] = pd.to_numeric(df['D'], errors='coerce')

print(df)

grouped = df.groupby('A')

#print(grouped.last())
#print(grouped.sum())
#print(grouped.mean())
#print(grouped.count())

# Sum only the numeric columns C and D within each group
grouped_aggre = grouped[['C', 'D']].sum()

print(grouped_aggre)
print(df.D.mean())
print(df.C.mean())

print('//////////////////')
print(df.C.count())
print(df.C.value_counts(dropna=True))

Furthermore, how do I aggregate by both the A and B columns, with the same summary statistics for C and D?

Original data frame:

     A      B   C   D
0  foo    one   1   2
1  foo    one NaN NaN
2  foo    two   1   1
3  foo  three   2   1
4  bar    two NaN NaN
5  bar    two   1   2
6  bar    one   1   2
7  bar  three   2   1

Expected result:

A    B    C        D
foo       1.333    4
bar       1.333    5

Answer 1:


You could use groupby/agg to perform the summing and counting:

result = df.groupby(['A']).agg({'C': lambda x: x.sum()/x.count(), 'D':'sum'})

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A' : ['foo', 'foo', 'foo', 'foo',
            'bar', 'bar', 'bar', 'bar'],
     'B' : ['one', 'one', 'two', 'three',
            'two', 'two', 'one', 'three'],
     'C' : [1, np.nan, 1, 2, np.nan, 1, 1, 2],
     'D' : [2, '', 1, 1, '', 2, 2, 1]})
# Convert the empty strings in D to NaN and make the column numeric
df['D'] = pd.to_numeric(df['D'], errors='coerce')

result = df.groupby(['A']).agg({'C': lambda x: x.sum()/x.count(), 'D':'sum'})
print(result)

yields

            C    D
A
bar  1.333333  5.0
foo  1.333333  4.0
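
The follow-up about grouping by both A and B can probably be handled the same way. The following is a minimal sketch, assuming the same statistics are wanted (sum divided by non-missing count for C, sum for D). The name `by_ab` is only illustrative, and D is entered directly as numbers with NaN to keep the example short.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [1, np.nan, 1, 2, np.nan, 1, 1, 2],
    # Assumption: the empty strings in D have already been replaced by NaN
    'D': [2, np.nan, 1, 1, np.nan, 2, 2, 1],
})

# Pass both column names to groupby; the aggregation dict is unchanged.
# sum() and count() both skip NaN, so C is the mean of the non-missing values.
by_ab = df.groupby(['A', 'B']).agg(
    {'C': lambda x: x.sum() / x.count(), 'D': 'sum'})
print(by_ab)
```

The result is indexed by an (A, B) MultiIndex with one row per combination that occurs in the data. For this data, replacing the lambda with the string 'mean' should give the same values for C, since the mean also skips NaN by default.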

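Note that x.sum()/x.count() reproduces the 1.333 in the expected result, but it is the sum of C divided by the non-missing count, not the share of 1s that the question describes in words. If the literal ratio (number of values equal to 1 divided by number of non-missing values) is what is wanted, a sketch could look like the following. The helper name `share_of_ones` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [1, np.nan, 1, 2, np.nan, 1, 1, 2],
    # Assumption: the empty strings in D have already been replaced by NaN
    'D': [2, np.nan, 1, 1, np.nan, 2, 2, 1],
})

def share_of_ones(x):
    # Hypothetical helper: count of values equal to 1 divided by the
    # count of non-missing values (NaN == 1 is False, count() skips NaN).
    return (x == 1).sum() / x.count()

ratio = df.groupby('A').agg({'C': share_of_ones, 'D': 'sum'})
print(ratio)
```

With this data both groups contain two 1s among three non-missing values, so C comes out as 2/3 for 'foo' and 'bar' alike.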

Source: https://stackoverflow.com/questions/32566866/aggregate-groups-in-python-pandas-and-spit-out-percentage-from-a-certain-count
