问题
I have the next problem.
I have a list with string values:
a = ['word1', 'word2', 'word3', 'word4', ..., 'wordN']
And I have the dataframe with values:
+--------------+----------+-----------+
| keywords | impressions | clicks |
+--------------+----------+-----------+
| word1 | 1245523 | 12321231 |
+--------------+----------+-----------+
| word2 | 4212321 | 12312312 |
+--------------+----------+-----------+
........................................
Please advice me on how to create a specific, aggregated dataframe with column values from list and with sum of the impressions and clicks columns if the word from list is met in keyword column.
I've tried to iterate through dataframe with iterrows() method but it does not work for this situation.
回答1:
You would want to filter your df to make sure you are only using items in the list.
df = df[df['keywords'].isin(a)]
Then you would use groupby to aggregate your results
df.groupby('keywords', as_index=False).sum()
回答2:
specify the df, then subtract the columns not to sum ("keywords"
), finally loop over the list of words:
import pandas as pd
a = ['word1', 'word2']
df = pd.DataFrame([
["word1", 1245523, 12321231],
["word2", 4212321, 12312312]
],
columns=["keywords", "impressions", "clicks"]
)
col_list = list(df)
col_list.remove('keywords')
for word in a:
df[word] = df[col_list].sum(axis=1)
print(df)
Returns:
keywords impressions clicks word1 word2
0 word1 1245523 12321231 13566754 13566754
1 word2 4212321 12312312 16524633 16524633
回答3:
Found the way:
b = []
for i in a:
b.append((a, checking_data[checking_data['keywords'].str.contains(a)][['impressions', 'clicks']].sum().values[0],
checking_data[checking_data['keywords'].str.contains(a)][['impressions', 'clicks']].sum().values[1]))
groupedOne_df = pd.DataFrame.from_records(b, columns = ['keywords', 'impressions', 'clicks'])
Now you can create a pandas df from those values.
来源:https://stackoverflow.com/questions/62835143/aggregate-dataframe-base-on-list-values