Question
I am trying to sum multiple rows together based on a keyword that is part of the index - but it is not the entire index. For example, the index could look like
                    Count
1234_Banana_Green      43
4321_Banana_Yellow     34
2244_Banana_Brown      23
12345_Apple_Red        45
I would like to sum all of the rows that have the same "keyword" within them and create a total "banana" row. Is there a way to do this without searching for the keyword "banana"? For my purposes, this keyword changes every time and I would like to be able to automate this summing process. Any help is very much appreciated.
Answer 1:
Maybe this, which splits each index label on '_' and groups by the second piece:
df.groupby(df.index.to_series()
             .str.split('_', expand=True)[1]
          )['Count'].sum()
Output:
1
Apple 45
Banana 100
Name: Count, dtype: int64
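For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, using the four rows from the question. It assumes the keyword is always the second '_'-separated field of the index label; the variable names `keyword` and `totals` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the composite labels are the index.
df = pd.DataFrame(
    {"Count": [43, 34, 23, 45]},
    index=["1234_Banana_Green", "4321_Banana_Yellow",
           "2244_Banana_Brown", "12345_Apple_Red"],
)

# Assumption: the keyword is the second "_"-separated field of each label.
keyword = df.index.str.split("_").str[1]

# Group rows by the extracted keyword and sum the Count column.
totals = df.groupby(keyword)["Count"].sum()
print(totals)
```

Because the keyword is derived from the labels themselves, nothing has to be searched for by name, so the same code should keep working when the keywords change.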
Answer 2:
Given the following dataframe:
import pandas as pd

raw_data = {'id': ['1234_Banana_Green', '4321_Banana_Yellow',
                   '2244_Banana_Brown', '12345_Apple_Red',
                   '1267_Apple_Blue']}
df = pd.DataFrame(raw_data).set_index(['id'])
Try this code:
df = df.reset_index()
df['extracted_keyword'] = df['id'].apply(lambda x: x.split('_')[1])
df.groupby(["extracted_keyword"]).count()
This gives the number of rows per keyword:
id
extracted_keyword
Apple 2
Banana 3
If you want to restore the index, add this at the end:
df = df.set_index(['id'])
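Note that count() only tallies how many rows share a keyword. To get the totals the question asks for, the frame needs a Count column and sum() in place of count(). A rough sketch of that variant, assuming the four rows and Count values from the question (the sample frame in this answer has no Count column, so the data here is taken from the question instead):

```python
import pandas as pd

# Rows and Count values taken from the question, not from this answer's sample.
raw_data = {
    "id": ["1234_Banana_Green", "4321_Banana_Yellow",
           "2244_Banana_Brown", "12345_Apple_Red"],
    "Count": [43, 34, 23, 45],
}
df = pd.DataFrame(raw_data).set_index("id")

# Move the index into a column so the keyword can be extracted from it.
df = df.reset_index()
df["extracted_keyword"] = df["id"].apply(lambda x: x.split("_")[1])

# Sum Count per keyword instead of counting rows.
sums = df.groupby("extracted_keyword")["Count"].sum()
print(sums)

# Restore the original index.
df = df.set_index("id")
```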
Source: https://stackoverflow.com/questions/58082549/summing-rows-based-on-keyword-within-index