Question
I am trying to group a DataFrame by one column, keeping several columns from one row in each group and concatenating strings from the other rows into multiple columns based on the value of another column. Here is an example:
import pandas as pd

df = pd.DataFrame({'test' : ['a','a','a','a','a','a','b','b','b','b'],
                   'name' : ['aa','ab','ac','ad','ae','ba','bb','bc','bd','be'],
                   'amount' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
                   'role' : ['x','y','y','x','x','z','y','y','z','y']})
df
amount name role test
0 1.0 aa x a
1 2.0 ab y a
2 3.0 ac y a
3 4.0 ad x a
4 5.0 ae x a
5 6.0 ba z a
6 7.0 bb y b
7 8.0 bc y b
8 9.0 bd z b
9 9.5 be y b
I would like to group by test, retain name and amount from the row where role = 'z', create a column (let's call it X) that concatenates the values of name where role = 'x', and create another column (let's call it Y) that concatenates the values of name where role = 'y'. The concatenated values should be separated by '; '. For each value of test there can be zero to many rows with role = 'x', zero to many rows with role = 'y', and exactly one row with role = 'z'. X and Y can be null if a test has no rows for that role. The amount value is dropped for all rows with role = 'x' or 'y'. The desired output would be something like:
test name amount X Y
0 a ba 6.0 aa; ad; ae ab; ac
1 b bd 9.0 None bb; bc; be
For the concatenating part, I found x.ix[x.role == 'x', X] = "{%s}" % '; '.join(x['name']), which I might be able to repeat for y. I tried a few things along the lines of name = x[x.role == 'z'].name.first() for name and amount. I also tried going down both paths of a defined function and a lambda function without success. Appreciate any thoughts.
Answer 1:
You can create customized columns in the apply function after groupby, as follows. Here g can be considered a sub data frame with a single value in the test column. Since you want multiple columns returned, you need to create a Series object for each group, where the index labels are the corresponding column headers in the result:
df.groupby('test').apply(lambda g: pd.Series({'name': g['name'][g.role == 'z'].iloc[0],
'amount': g['amount'][g.role == 'z'].iloc[0],
'X': '; '.join(g['name'][g.role == 'x']),
'Y': '; '.join(g['name'][g.role == 'y'])
})).reset_index()
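One detail worth noting: '; '.join on an empty selection returns an empty string, so the lambda above gives '' rather than None for X in group b. A minimal sketch of the same apply approach with a named function is shown here. The helpers join_or_none and summarize are hypothetical names introduced only for illustration, and the sketch assumes every group has at least one row with role = 'z', as the question states.

```python
import pandas as pd

df = pd.DataFrame({
    'test': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'name': ['aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be'],
    'amount': [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
    'role': ['x', 'y', 'y', 'x', 'x', 'z', 'y', 'y', 'z', 'y'],
})


def join_or_none(names):
    # Hypothetical helper: join with '; ', or return None when nothing matched.
    return '; '.join(names) if len(names) else None


def summarize(g):
    # Hypothetical helper: g holds the rows of one 'test' group.
    # Assumes at least one row with role == 'z' exists in the group.
    z_row = g[g['role'] == 'z'].iloc[0]
    return pd.Series({
        'name': z_row['name'],
        'amount': z_row['amount'],
        'X': join_or_none(g.loc[g['role'] == 'x', 'name']),
        'Y': join_or_none(g.loc[g['role'] == 'y', 'name']),
    })


# Select the non-grouping columns explicitly before applying.
result = (df.groupby('test')[['name', 'amount', 'role']]
            .apply(summarize)
            .reset_index())
print(result)
```

Selecting the columns before apply keeps the grouping column out of each sub-frame, which is a design choice made here rather than something the answer requires.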
Answer 2:
# set index and get cross-section where role is 'z'
z = df.set_index(['test', 'role']).xs('z', level='role')
# get rid of 'z' rows and group by 'test' and 'role' to join names
xy = df.query('role != "z"').groupby(['test', 'role'])['name'].apply('; '.join).unstack()
# make columns of xy upper case
xy.columns = xy.columns.str.upper()
pd.concat([z, xy], axis=1).reset_index()
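If no row in the whole frame has role = 'x' (or 'y'), unstack produces no column for that role, so X or Y would be missing from the result altogether. The following is a sketch of the same split-and-combine idea with a reindex step that guarantees both columns. It uses boolean indexing and join in place of xs and concat, which is a substitution made here for illustration and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'test': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'name': ['aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be'],
    'amount': [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
    'role': ['x', 'y', 'y', 'x', 'x', 'z', 'y', 'y', 'z', 'y'],
})

# Rows with role 'z' supply name and amount, indexed by test.
z = df[df['role'] == 'z'].set_index('test')[['name', 'amount']]

# Join the names of the remaining rows per (test, role), then pivot role into columns.
xy = (df[df['role'] != 'z']
        .groupby(['test', 'role'])['name']
        .agg(lambda s: '; '.join(s))
        .unstack()
        .reindex(columns=['x', 'y'])   # make sure both columns exist
        .rename(columns=str.upper))

result = z.join(xy).reset_index()
print(result)
```

Groups without any row for a role end up with a missing value (NaN) in that column.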
Source: https://stackoverflow.com/questions/40519697/python-pandas-groupby-conditional-concatenate-strings-into-multiple-columns