Question
I have two problems. The first:
I have one dataframe with category and keywords like this:
   Category  Keywords
0  Fruit     ['apple', 'pear', 'plum', 'grape']
1  Color     ['red', 'purple', 'green']
Another dataframe like this:
Summary
0 This is a basket of red apples. They are sour.
1 We found a bushel of fruit. They are red.
2 There is a peck of pears that taste sweet.
3 We have a box of plums.
I want the end result like this:
   Category      Summary
0  Fruit, Color  This is a basket of red apples. They are sour.
1  Color         We found a bushel of fruit. They are red.
2  Fruit         There is a peck of pears that taste sweet.
3  Fruit         We have a box of plums.
The second:
I should be able to check whether a string contains any of the keywords and, if so, output a list of the matching categories.
Example: sample_sentence = "This line contains a red plum?"
Output:
result_list = ['Color', 'Fruit']
EDIT: This is similar to, but not the same as, the following question, which may be useful for reference: How do I assign categories in a dataframe if they contain any element from another dataframe?
EDIT2:
I also have another version of the first dataframe, like this:
   Category  Filters
0  Fruit     apple, pear, plum, grape
1  Color     red, purple, green
Answer 1:
You can use a list comprehension to achieve this:
Dataframe set-up:
import pandas as pd

df1 = pd.DataFrame({'Category': {0: 'Fruit', 1: 'Color'},
                    'Keywords': {0: 'apple,pear,plum,grape', 1: 'red,purple,green'}})
df2 = pd.DataFrame({'Summary': {0: 'This is a basket of red apples. They are sour.',
                                1: 'We found a bushel of fruit. They are red.',
                                2: 'There is a peck of pears that taste sweet.',
                                3: 'We have a box of plums.'}})
# Turn each comma-separated keyword string into a list of keywords
df1['Keywords'] = df1['Keywords'].str.split(',')
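If the keywords arrive as in the question's EDIT2 (a Filters column with a space after each comma), splitting on ',' alone leaves a leading space on every keyword after the first, so ' pear' would never match 'pears'. A minimal sketch of one way to normalize such a string; split_keywords is a hypothetical helper name, not part of pandas:

```python
def split_keywords(raw):
    """Split a comma-separated keyword string and strip surrounding spaces."""
    # Skip empty pieces so a trailing comma does not produce '' as a keyword
    return [part.strip() for part in raw.split(',') if part.strip()]


print(split_keywords('apple, pear, plum, grape'))
# ['apple', 'pear', 'plum', 'grape']
```

Assuming the column is named Filters as in EDIT2, it could be applied with df1['Keywords'] = df1['Filters'].apply(split_keywords).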
Code:
df2['Category'] = (df2['Summary'].str.split(' ').apply(
    lambda x: list(set([str(a)
                        for y in x
                        for a, b in zip(df1['Category'], df1['Keywords'])
                        for c in b
                        # Or you can use: "if str(c) == str(y)" or "if str(c).lower() == str(y).lower()"
                        if str(c) in str(y)]))).str.join(', '))
df2
Output:
Out[1]:
   Summary                                         Category
0  This is a basket of red apples. They are sour.  Fruit, Color
1  We found a bushel of fruit. They are red.       Color
2  There is a peck of pears that taste sweet.      Fruit
3  We have a box of plums.                         Fruit
a and b (the Category and Keywords of each row of df1) and x (the list of words in each row of df2) iterate through rows (vertically). c and y iterate through the lists within those rows (horizontally): c over the keywords in b, and y over the words in x. In order to start iterating through lists horizontally, you first need to iterate through rows vertically, which is why all of these variables are needed. You can use zip to simultaneously iterate through two or more columns of the first dataframe.
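The same iteration can be spelled out as explicit loops, which also covers the question's second case (a single string). This is a minimal sketch rather than the answerer's code: find_categories and its pairs argument are hypothetical names. It keeps the substring test from the comprehension above, but returns categories in first-match order instead of going through a set.

```python
def find_categories(text, pairs):
    """Return the categories whose keywords occur inside any word of text.

    pairs is an iterable of (category, keywords) tuples, for example
    list(zip(df1['Category'], df1['Keywords'])) with the dataframes above.
    """
    pairs = list(pairs)                      # allow re-iteration for every word
    found = []
    for word in text.split(' '):             # y in the comprehension
        for category, keywords in pairs:     # a, b
            for keyword in keywords:         # c
                # Substring test, as in the comprehension: 'plum' matches 'plums.'
                if keyword in word and category not in found:
                    found.append(category)
    return found


pairs = [('Fruit', ['apple', 'pear', 'plum', 'grape']),
         ('Color', ['red', 'purple', 'green'])]

print(find_categories("This line contains a red plum?", pairs))
# ['Color', 'Fruit']
```

For the dataframe case, df2['Summary'].apply(find_categories, pairs=list(zip(df1['Category'], df1['Keywords']))).str.join(', ') should give the same categories per row, in first-match order. Because the test is a substring match, a word such as "covered" would also count as containing "red"; comparing whole lowercased words avoids that if it matters.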
Source: https://stackoverflow.com/questions/65710889/use-keywords-from-dataframe-to-detect-if-any-present-in-another-dataframe-or-str