Use keywords from dataframe to detect if any present in another dataframe or string

Posted by 佐手、 on 2021-02-10 18:22:46

Question


I have two problems: First is...

I have one dataframe with category and keywords like this:

  Category                   Keywords
0    Fruit            ['apple', 'pear', 'plum', 'grape']
1    Color            ['red', 'purple', 'green']

Another dataframe like this:

              Summary
0        This is a basket of red apples. They are sour.
1        We found a bushel of fruit. They are red.
2        There is a peck of pears that taste sweet.
3        We have a box of plums.

I want the end result like this:

      Category                                            Summary
0    Fruit, Color     This is a basket of red apples. They are sour.
1           Color     We found a bushel of fruit. They are red.
2           Fruit     There is a peck of pears that taste sweet.
3           Fruit     We have a box of plums.

Second is...

I should be able to check if a string contains any of the keywords, and if true then output a list of appropriate categories.

Example: sample_sentence = "This line contains a red plum?"

output:

result_list = ['Color', 'Fruit']

EDIT: It's kind of similar but not the same. Use this for reference: How do I assign categories in a dataframe if they contain any element from another dataframe?

EDIT2:

I also have another version of first dataframe like this:

  Category                   Filters
0    Fruit  apple, pear, plum, grape
1    Color        red, purple, green

Answer 1:


You can use list comprehension to achieve this:

Dataframe set-up:

import pandas as pd

df1 = pd.DataFrame({'Category': {0: 'Fruit', 1: 'Color'},
                    'Keywords': {0: 'apple,pear,plum,grape', 1: 'red,purple,green'}})
df2 = pd.DataFrame({'Summary': {0: 'This is a basket of red apples. They are sour.',
                                1: 'We found a bushel of fruit. They are red.',
                                2: 'There is a peck of pears that taste sweet.',
                                3: 'We have a box of plums.'}})
# Turn each comma-separated keyword string into a list of keywords.
df1['Keywords'] = df1['Keywords'].str.split(',')
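
The second version of the first dataframe in the question (EDIT2) separates keywords with a comma followed by a space, so a plain split on ',' would leave a leading space on every keyword after the first, and ' pear' would no longer match 'pears'. A minimal sketch of normalizing that layout into the same list form, assuming the column is named Filters as in the question (df1_alt is a hypothetical name used only for this illustration):

```python
import pandas as pd

# Hypothetical frame mirroring the EDIT2 layout from the question.
df1_alt = pd.DataFrame({'Category': ['Fruit', 'Color'],
                        'Filters': ['apple, pear, plum, grape',
                                    'red, purple, green']})

# Split on commas, then strip the whitespace around each keyword.
df1_alt['Keywords'] = df1_alt['Filters'].apply(
    lambda s: [kw.strip() for kw in s.split(',')])

print(df1_alt['Keywords'].tolist())
```

After this step the Keywords column holds lists of clean keywords, so it can be used in place of df1 in the matching code.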

Code:

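# For each summary, x is its list of words. Keep category a when any keyword c
# from its keyword list b is a substring of some word y. For exact matches,
# use "c == y" or "c.lower() == y.lower()" in place of "c in y".
# Building the list from df1 keeps the categories in a stable order
# (a set would not guarantee one).
df2['Category'] = (df2['Summary'].str.split(' ').apply(
    lambda x: [a for a, b in zip(df1['Category'], df1['Keywords'])
               if any(c in y for y in x for c in b)])
    .str.join(', '))
df2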

Output:

Out[1]: 
                                          Summary      Category
0  This is a basket of red apples. They are sour.  Fruit, Color
1       We found a bushel of fruit. They are red.         Color
2      There is a peck of pears that taste sweet.         Fruit
3                         We have a box of plums.         Fruit

x is the list of words from one row of the second dataframe, and a and b are the category and keyword list from one row of the first dataframe, so these three step through rows (vertically). y and c step through the items inside those lists (horizontally): y over the words in x, and c over the keywords in b. You have to reach a row before you can walk through the list inside it, which is why all of these variables are needed. zip lets you iterate over two or more columns of the first dataframe in parallel.
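
The same matching can be applied to a single string, which is the second part of the question. The following is a sketch rather than part of the original answer: categories_for is a hypothetical helper, and it assumes a case-insensitive substring match (so 'plum' also matches 'plums') and returns categories in the row order of the keyword dataframe.

```python
import pandas as pd

df1 = pd.DataFrame({'Category': ['Fruit', 'Color'],
                    'Keywords': [['apple', 'pear', 'plum', 'grape'],
                                 ['red', 'purple', 'green']]})

def categories_for(sentence, keywords_df):
    """Return the categories whose keywords occur in `sentence`.

    Hypothetical helper: case-insensitive substring matching, with
    categories returned in the row order of `keywords_df`.
    """
    text = sentence.lower()
    return [category
            for category, keywords in zip(keywords_df['Category'],
                                          keywords_df['Keywords'])
            if any(keyword.lower() in text for keyword in keywords)]

sample_sentence = "This line contains a red plum?"
result_list = categories_for(sample_sentence, df1)
print(result_list)  # ['Fruit', 'Color']
```

Substring matching is permissive: 'red' also matches inside 'covered'. If that is a problem, compare against whole words with punctuation stripped instead.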



Source: https://stackoverflow.com/questions/65710889/use-keywords-from-dataframe-to-detect-if-any-present-in-another-dataframe-or-str
