Aggregate DataFrame base on list values

问题

I have the next problem.

I have a list with string values:

a = ['word1', 'word2', 'word3', 'word4', ..., 'wordN']

And I have the dataframe with values:

+--------------+----------+-----------+
| keywords | impressions  | clicks     | 
+--------------+----------+-----------+
| word1    | 1245523      |   12321231 |              
+--------------+----------+-----------+
| word2    | 4212321      |  12312312  |      
+--------------+----------+-----------+
........................................

Please advice me on how to create a specific, aggregated dataframe with column values from list and with sum of the impressions and clicks columns if the word from list is met in keyword column.

I've tried to iterate through dataframe with iterrows() method but it does not work for this situation.

回答1:

You would want to filter your df to make sure you are only using items in the list.

df = df[df['keywords'].isin(a)]

Then you would use groupby to aggregate your results

df.groupby('keywords', as_index=False).sum()

回答2:

specify the df, then subtract the columns not to sum ("keywords"), finally loop over the list of words:

import pandas as pd

a = ['word1', 'word2']

df = pd.DataFrame([
    ["word1", 1245523, 12321231],
    ["word2", 4212321, 12312312]
],
columns=["keywords", "impressions", "clicks"]
)

col_list = list(df)
col_list.remove('keywords')

for word in a:
    df[word] = df[col_list].sum(axis=1)

print(df)

Returns:

  keywords  impressions    clicks     word1     word2
0    word1      1245523  12321231  13566754  13566754
1    word2      4212321  12312312  16524633  16524633

回答3:

Found the way:

b = []
for i in a:
  b.append((a, checking_data[checking_data['keywords'].str.contains(a)][['impressions', 'clicks']].sum().values[0], 
               checking_data[checking_data['keywords'].str.contains(a)][['impressions', 'clicks']].sum().values[1]))


groupedOne_df = pd.DataFrame.from_records(b, columns = ['keywords', 'impressions', 'clicks'])

Now you can create a pandas df from those values.

来源：https://stackoverflow.com/questions/62835143/aggregate-dataframe-base-on-list-values

标签

python

pandas

dataframe

pandas-groupby