Question
I have a pandas DataFrame, and I am using the fuzzywuzzy package in Python to match the first column against the second column.
I have defined a function that should produce an output with the first column, the second column, and the partial ratio score, but it is not working.
Could you please help?
import csv
import sys
import os
import numpy as np
import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
def match(driver):
    driver["score"]=driver.apply(lambda row: fuzz.partial_ratio(row driver[driver.columns[0]], driver[driver.columns[1]]), axis=1)
    print(driver)
    return(driver)
Regards
-Abacus
Answer 1:
Inside apply with axis=1, the function you pass receives a Series representing the current row. In your code, you're effectively ignoring this Series and calling partial_ratio with the two whole columns of the DataFrame each time (driver[col]).
A minor change to your code should give you what you want: index into the row rather than into the DataFrame.
from pandas import DataFrame
from fuzzywuzzy import fuzz

d = DataFrame({'one': ['fuzz', 'wuzz'], 'two': ['fizz', 'woo']})
d.apply(lambda s: fuzz.partial_ratio(s['one'], s['two']), axis=1)
0    75
1    33
dtype: int64
(Interestingly, the partial_ratio function will accept a Series as input, but only because it converts it internally into a string. :)
Source: https://stackoverflow.com/questions/36138886/create-new-column-in-dataframe-using-fuzzywuzzy