Compare each row with all rows in data frame and save results in list for each row

无人及你 2020-12-29 14:15

I am trying to compare each row with every row in a pandas dataframe using fuzzywuzzy.fuzz.partial_ratio(), keep the rows that score >= 85, and write the results to a list for each row.

1 Answer
  • 2020-12-29 14:41

    The first step is to find the rows that match the condition for a given name. Since partial_ratio compares two strings at a time, we apply it to the dataframe row by row, which gives a boolean result for each row:

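    from fuzzywuzzy.fuzz import partial_ratio
    name = 'dog'
    # True for every row whose 'name' scores at least 85 against name
    df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)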
    

    We can then use enumerate and a list comprehension to collect the positions of the True values in that boolean result:

    matches = df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)
    [i for i, x in enumerate(matches) if x]
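
One detail worth noting: enumerate counts positions starting at 0, which only coincides with the index labels when the dataframe has the default index. The sketch below illustrates the difference on a hypothetical boolean Series with made-up labels (10, 20, 30); the data is an assumption for illustration and is not taken from the question.

```python
import pandas as pd

# Hypothetical boolean mask with a non-default index (labels 10, 20, 30).
matches = pd.Series([True, False, True], index=[10, 20, 30])

# enumerate() counts positions from 0, regardless of the index labels.
positions = [i for i, x in enumerate(matches) if x]

# Boolean indexing keeps the True entries; .index then yields their labels.
labels = matches[matches].index.tolist()

print(positions)  # [0, 2]
print(labels)     # [10, 30]
```

If the lists are meant to be used with df.loc later, the label-based variant is probably the safer choice; with df.iloc, the positions are what you want.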
    

    Let's put all this inside a function:

    def func(name):
        matches = df.apply(lambda row: (partial_ratio(row['name'], name) >= 85), axis=1)
        return [i for i, x in enumerate(matches) if x]
    

    We can now apply the function to every row of the dataframe, which returns one list of matching positions per row:

    df.apply(lambda row: func(row['name']), axis=1)
    