Question
I'm having trouble using the FuzzyWuzzy library to store all my match results in a data frame column (I'm guessing it might require a loop?). I've been scratching my head over this all day, so I'd like to see if any of you can help me with a solution. It would be super helpful!
As an example of what I'm trying to do, here are two data frame tables:
Master Table
+----+-----------------+
| ID | ITEM            |
+----+-----------------+
| 1  | Pepperoni Pizza |
| 2  | Cheese Pizza    |
| 3  | Chicken Salad   |
| 4  | Plain Salad     |
+----+-----------------+
Lookup Table
+--------------+---+
| LOOKUP VALUE | - |
+--------------+---+
| Cheese       | - |
| Salad        | - |
+--------------+---+
Essentially, I'm trying to match each of the lookup table's values against the entire list of values in the Master table, and store the results in a third table.
Here's how I want the final output to look:
+--------------+----------------------------+-------------------+
| LOOKUP VALUE | MATCHED VALUES             | MATCHED VALUE IDS |
+--------------+----------------------------+-------------------+
| Cheese       | Cheese Pizza               | 2                 |
| Salad        | Chicken Salad, Plain Salad | 3,4               |
+--------------+----------------------------+-------------------+
I know the very basics of FuzzyWuzzy. Here's how I started:
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
choices = ["Pepperoni Pizza","Cheese Pizza","Chicken Salad", "Plain Salad"]
process.extract("salad",choices,limit=2)
Output = [('Chicken Salad', 90), ('Plain Salad', 90)]
Great, but how do I do that in a systematic way, running all my lookup values against all the values in the master table?
Thanks a ton for hearing me out!
Answer 1:
It's not a good idea to store lists in a DataFrame; I suggest storing every match as its own row instead. Here is the code:
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import pandas as pd
import io
master = pd.read_csv(io.StringIO("""ID,ITEM
1,Pepperoni Pizza
2,Cheese Pizza
3,Chicken Salad
4,Plain Salad"""))
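lookups = ["Cheese", "Salad"]
# Map ID -> ITEM; with a dict of choices, process.extract returns (value, score, key) tuples.
choices = master.set_index("ID").ITEM.to_dict()
# One (lookup, matched, score, id) tuple per match, keeping the top 2 matches per lookup.
res = [(lookup,) + item for lookup in lookups for item in process.extract(lookup, choices, limit=2)]
df = pd.DataFrame(res, columns=["lookup", "matched", "score", "id"])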
df
Output:
   lookup        matched  score  id
0  Cheese   Cheese Pizza     90   2
1  Cheese  Chicken Salad     45   3
2   Salad  Chicken Salad     90   3
3   Salad    Plain Salad     90   4
Basically, I build a choices dict from master for matching, loop over the lookups, and collect the results in a list. Finally, I convert the list to a DataFrame.
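If you still want the one-row-per-lookup layout from the question, the row-per-match frame can be collapsed afterwards. The following is only a sketch: the rows are hard-coded from the output above so it runs without fuzzywuzzy, and MIN_SCORE and the summary column names are my own assumptions, not part of the original answer.

```python
import pandas as pd

# Rows copied from the answer's output: (lookup, matched, score, id).
rows = [
    ("Cheese", "Cheese Pizza", 90, 2),
    ("Cheese", "Chicken Salad", 45, 3),
    ("Salad", "Chicken Salad", 90, 3),
    ("Salad", "Plain Salad", 90, 4),
]
df = pd.DataFrame(rows, columns=["lookup", "matched", "score", "id"])

# Assumed cutoff (not from the original answer); tune it for your own data.
MIN_SCORE = 80

# Keep only the strong matches and turn ids into strings so they can be joined.
good = df[df["score"] >= MIN_SCORE].assign(id=lambda d: d["id"].astype(str))

# Collapse to one row per lookup value, joining the matches and their ids.
summary = (
    good.groupby("lookup", sort=False)
    .agg(
        matched_values=("matched", ", ".join),
        matched_value_ids=("id", ",".join),
    )
    .reset_index()
)
print(summary)
```

With a cutoff of 80, the weak "Cheese" vs. "Chicken Salad" match (score 45) is dropped, leaving one row for Cheese and one for Salad. Keeping the row-per-match frame around is still useful, since the scores are lost once the matches are joined into strings.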
Source: https://stackoverflow.com/questions/37891131/using-fuzzywuzzy-to-create-a-column-of-matched-results-in-the-data-frame