MySQL: Using Levenshtein distance to find duplicates in 20,000 rows
Question: I have a two-column table, containing a primary key and company names, with about 20,000 rows. My task is to find all duplicate entries. I originally tried SOUNDEX, but it matched companies that were completely different just because their first words were similar. That led me to the Levenshtein distance algorithm. The problem is that the query never seems to finish: I have left it running for about 10 hours and it still hasn't returned. Here is the query: SELECT