Aggregate DataFrame based on list values

最后都变了 - submitted on 2021-02-11 13:55:08

Question


I have the following problem.

I have a list with string values:

a = ['word1', 'word2', 'word3', 'word4', ..., 'wordN']

And I have a DataFrame with these values:

+----------+-------------+----------+
| keywords | impressions | clicks   |
+----------+-------------+----------+
| word1    | 1245523     | 12321231 |
+----------+-------------+----------+
| word2    | 4212321     | 12312312 |
+----------+-------------+----------+
| ...      | ...         | ...      |

Please advise me on how to create an aggregated DataFrame whose keywords column holds the values from the list, with the impressions and clicks columns summed for every row where the word from the list appears in the keywords column.

I've tried iterating through the DataFrame with the iterrows() method, but it does not work in this situation.


Answer 1:


You would want to filter your df to make sure you are only using items in the list.

df = df[df['keywords'].isin(a)]

Then you would use groupby to aggregate your results:

df.groupby('keywords', as_index=False).sum()
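A self-contained run of Answer 1's two steps, sketched with the two sample rows from the question plus two hypothetical rows: a second `word1` row (so the groupby has something to add up) and a `word9` row that is not in the list (so the filter has something to drop). Note that `isin` matches whole values exactly, so a keyword such as "cheap word1" would not be counted under `word1`.

```python
import pandas as pd

a = ['word1', 'word2']

# Sample rows from the question, plus hypothetical rows: a repeated 'word1'
# and a 'word9' that is not in the list
df = pd.DataFrame(
    {
        "keywords": ["word1", "word2", "word1", "word9"],
        "impressions": [1245523, 4212321, 100, 5],
        "clicks": [12321231, 12312312, 10, 1],
    }
)

# Step 1: keep only rows whose keyword exactly equals a word in the list
df = df[df["keywords"].isin(a)]

# Step 2: one row per keyword, with impressions and clicks summed
result = df.groupby("keywords", as_index=False).sum()
print(result)
```

The `word9` row is dropped by the filter, and the two `word1` rows collapse into a single row with their totals added.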



Answer 2:


Specify the df, then remove the column that should not be summed ("keywords") from the column list, and finally loop over the list of words:

import pandas as pd

a = ['word1', 'word2']

df = pd.DataFrame([
    ["word1", 1245523, 12321231],
    ["word2", 4212321, 12312312]
],
columns=["keywords", "impressions", "clicks"]
)

col_list = list(df)
col_list.remove('keywords')

for word in a:
    df[word] = df[col_list].sum(axis=1)

print(df)

Returns:

  keywords  impressions    clicks     word1     word2
0    word1      1245523  12321231  13566754  13566754
1    word2      4212321  12312312  16524633  16524633



Answer 3:


Found the way:

import pandas as pd

# checking_data is the question's DataFrame; a is the list of words
b = []
for i in a:
    # sum impressions and clicks over rows whose keywords contain the word i
    totals = checking_data[checking_data['keywords'].str.contains(i, regex=False)][['impressions', 'clicks']].sum()
    b.append((i, totals.values[0], totals.values[1]))

groupedOne_df = pd.DataFrame.from_records(b, columns=['keywords', 'impressions', 'clicks'])

The last line then creates a pandas DataFrame from those (keyword, impressions, clicks) tuples.
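A runnable sketch of Answer 3's substring-matching loop, assuming hypothetical sample data in which some keyword phrases only contain a list word (for example "cheap word1"). Unlike the exact `isin` filter in Answer 1, these rows are counted toward the word they contain.

```python
import pandas as pd

a = ['word1', 'word2']

# Hypothetical sample data; keyword phrases may contain a list word as a substring
checking_data = pd.DataFrame(
    {
        "keywords": ["word1", "cheap word1", "word2 online", "word3"],
        "impressions": [1245523, 100, 4212321, 7],
        "clicks": [12321231, 10, 12312312, 3],
    }
)

rows = []
for word in a:
    # Rows whose keywords value contains the word (literal substring, not a regex)
    mask = checking_data["keywords"].str.contains(word, regex=False)
    totals = checking_data.loc[mask, ["impressions", "clicks"]].sum()
    rows.append((word, totals["impressions"], totals["clicks"]))

grouped = pd.DataFrame.from_records(rows, columns=["keywords", "impressions", "clicks"])
print(grouped)
```

Because matching is by substring, a phrase containing two list words is counted under both, and a word such as "word1" would also match "word10". Whether that is desirable depends on how the keywords are written.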



Source: https://stackoverflow.com/questions/62835143/aggregate-dataframe-base-on-list-values
