Question
I am trying to figure out how to aggregate groups in a pandas data frame, computing a percentage for one new column and a sum for another.
For example, in the following data frame I have columns A, B, C, and D. I would like to aggregate by the groups in A, where C should be a percentage (the frequency of '1' divided by the frequency of non-missing values) and D should be the sum of the non-missing values.
For example, for the 'foo' group, the resulting data frame should be:
A    B  C      D
foo     1.333  4
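The wording and the expected figure can be read two ways, so a quick hand check on the 'foo' group may help. This is only a sketch, assuming the four 'foo' values of C from the example data frame; the names foo_c, share_of_ones and sum_over_count are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

# C values for the 'foo' group, copied from the example data frame
foo_c = pd.Series([1, np.nan, 1, 2])

# count() ignores NaN, so this is the number of non-missing values (3)
non_missing = foo_c.count()

# Literal reading: how many values equal 1, over the non-missing count (2 / 3)
share_of_ones = (foo_c == 1).sum() / non_missing

# Reading that matches the expected 1.333: sum over non-missing count (4 / 3)
sum_over_count = foo_c.sum() / non_missing

print(round(share_of_ones, 3), round(sum_over_count, 3))
```

The 1.333 in the expected result corresponds to the second quantity, which is simply the mean of the non-missing values; the literal "frequency of 1" reading gives 0.667 instead.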
I can do some of the individual pieces, but I am not sure how to combine them into a single coherent script:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'foo',
                         'bar', 'bar', 'bar', 'bar'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': [1, np.nan, 1, 2, np.nan, 1, 1, 2],
                   'D': [2, '', 1, 1, '', 2, 2, 1]})
print(df)
#df['C'] = df['C'].fillna(999)
# Turn the empty strings in D into NaN and make the column numeric
df['D'] = df['D'].replace('', np.nan).astype(float)
print(df)
grouped = df.groupby('A')
#print(grouped.last())
#print(grouped.sum())
#print(grouped.mean())
#print(grouped.count())
grouped_aggre = grouped.aggregate('sum')
print(grouped_aggre)
print(df.D.mean())
print(df.C.mean())
print('//////////////////')
print(df.C.count())
print(df.C.value_counts(dropna=True))
Furthermore, how do I aggregate by A and B columns with the aforementioned C and D column summary statistics?
Original data frame:
     A      B    C    D
0  foo    one    1    2
1  foo    one  NaN  NaN
2  foo    two    1    1
3  foo  three    2    1
4  bar    two  NaN  NaN
5  bar    two    1    2
6  bar    one    1    2
7  bar  three    2    1
Expected result:
A    B  C      D
foo     1.333  4
bar     1.333  5
Answer 1:
You could use groupby/agg to perform the summing and counting:
result = df.groupby(['A']).agg({'C': lambda x: x.sum()/x.count(), 'D':'sum'})
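Because x.sum()/x.count() skips missing values in both the numerator and the denominator, it should be the same as the mean of the non-missing values, so the built-in 'mean' aggregation is a possible shorthand. A minimal sketch, assuming the blanks in D have already been stored as NaN; the names via_lambda and via_mean are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
    'C': [1, np.nan, 1, 2, np.nan, 1, 1, 2],
    'D': [2, np.nan, 1, 1, np.nan, 2, 2, 1],  # assumption: blanks already NaN
})

# The aggregation from the answer: sum divided by the non-missing count
via_lambda = df.groupby('A').agg({'C': lambda x: x.sum() / x.count(), 'D': 'sum'})

# The same aggregation using the built-in mean, which also skips NaN
via_mean = df.groupby('A').agg({'C': 'mean', 'D': 'sum'})

# Compare the two results value by value
print(np.allclose(via_lambda.to_numpy(), via_mean.to_numpy()))
```

The string form also avoids calling a Python lambda once per group, which may matter on larger frames.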
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': ['foo', 'foo', 'foo', 'foo',
           'bar', 'bar', 'bar', 'bar'],
     'B': ['one', 'one', 'two', 'three',
           'two', 'two', 'one', 'three'],
     'C': [1, np.nan, 1, 2, np.nan, 1, 1, 2],
     'D': [2, '', 1, 1, '', 2, 2, 1]})
# Convert the empty strings in D to NaN and make the column numeric
df['D'] = df['D'].replace('', np.nan).astype(float)
# C: sum of non-missing values divided by their count; D: sum of non-missing values
result = df.groupby(['A']).agg({'C': lambda x: x.sum()/x.count(), 'D':'sum'})
print(result)
yields
            C  D
A
bar  1.333333  5
foo  1.333333  4
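The question also asks about aggregating by both A and B. One possible approach is to pass both column names to groupby and use named aggregation. The sketch below assumes the blanks in D are already NaN; share_of_ones is a hypothetical helper for the literal "frequency of 1 over non-missing values" reading, and the output column names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [1, np.nan, 1, 2, np.nan, 1, 1, 2],
    'D': [2, np.nan, 1, 1, np.nan, 2, 2, 1],  # assumption: blanks already NaN
})


def share_of_ones(x):
    """Hypothetical helper: fraction of non-missing values equal to 1."""
    return (x == 1).sum() / x.count()


# Group on both keys; each keyword names an output column
by_a_b = df.groupby(['A', 'B']).agg(
    C_mean=('C', 'mean'),                 # same as sum / non-missing count
    C_share_of_ones=('C', share_of_ones),
    D_sum=('D', 'sum'),                   # NaN values are skipped
)
print(by_a_b)
```

The result is indexed by the (A, B) pairs; calling reset_index() on it turns them back into ordinary columns. A group whose C values are all missing would divide zero by zero in share_of_ones and yield NaN, which does not occur in this data.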
Source: https://stackoverflow.com/questions/32566866/aggregate-groups-in-python-pandas-and-spit-out-percentage-from-a-certain-count