edit-distance | 易学教程

Optimizing Levenshtein distance algorithm

Question: I have a stored procedure that uses Levenshtein distance to find the record closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein distance for every record before the record with the lowest distance is selected (I verified this by replacing the call to the Levenshtein function with a constant 0). The table has 1.5 million records, so even the slightest adjustment may shave off a few seconds. Right now the entire thing …
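
The following is a minimal Python sketch of one common speed-up for this kind of nearest-match search. It is not the poster's stored procedure. Only the record with the lowest distance is kept, so the best distance found so far can serve as a cutoff. The computation for a candidate can then be abandoned as soon as every cell in a row of the dynamic-programming table exceeds that cutoff. The names `levenshtein_bounded` and `closest_match` are hypothetical. The approach assumes the distance function can be changed to accept a bound, which a T-SQL port would also require.

```python
def levenshtein_bounded(a: str, b: str, max_dist: int) -> int:
    """Levenshtein distance between a and b, or max_dist + 1 as soon as
    the distance is known to exceed max_dist."""
    # The distance is at least the difference in length.
    if abs(len(a) - len(b)) > max_dist:
        return max_dist + 1
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        # Row minima never decrease, so the final distance will exceed the bound too.
        if min(curr) > max_dist:
            return max_dist + 1
        prev = curr
    return prev[-1] if prev[-1] <= max_dist else max_dist + 1


def closest_match(query, candidates):
    """Return (best_candidate, distance), or (None, None) if candidates is empty."""
    best, best_dist = None, None
    for cand in candidates:
        # Any distance is <= the longer length; after a hit, only a strictly
        # better candidate matters, so the cutoff shrinks to best_dist - 1.
        limit = max(len(query), len(cand)) if best_dist is None else best_dist - 1
        d = levenshtein_bounded(query, cand, limit)
        if d <= limit:
            best, best_dist = cand, d
            if d == 0:
                break  # exact match, nothing can beat it
    return best, best_dist
```

For example, `closest_match("kitten", ["sitting", "kitchen", "mitten"])` returns `("mitten", 1)`. The later candidates are checked against progressively tighter cutoffs.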

Quickly compare a string against a Collection in Java

I am trying to calculate the edit distance between a string and each item in a collection to find the closest match. The problem is that the collection is very large (about 25,000 items). I narrowed the set down to strings of similar length, but that still leaves a few thousand strings, and the search is still very slow. Is there a data structure that allows quick lookup of similar strings, or is there another way to address this problem? Answer: It sounds like a BK-tree might be what you want. Here's an article discussing them: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms
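
The following is a minimal BK-tree sketch in Python. It assumes Levenshtein distance as the metric; BK-trees rely on the triangle inequality, which Levenshtein distance satisfies. The class and method names are illustrative and are not taken from the linked article. With a small `max_dist`, a search can skip whole subtrees instead of comparing the query against every string in the collection.

```python
def levenshtein(a, b):
    """Plain two-row Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


class BKTree:
    """Each node is (word, {edge_distance: child_node})."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return (distance, word) pairs within max_dist of query, closest first."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only children whose edge label lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

Usage: after adding `"book", "books", "cake", "boo", "cape", "cart"`, the call `tree.search("bool", 1)` returns `[(1, "boo"), (1, "book")]`, and the `cake` subtree is never visited.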

How do you implement Levenshtein distance in Delphi?

I'm posting this in the spirit of answering your own questions. The question I had was: how can I implement the Levenshtein algorithm for calculating the edit distance between two strings, as described here, in Delphi? A note on performance: this is very fast. On my desktop (2.33 GHz dual-core, 2 GB RAM, WinXP), I can run through an array of 100K strings in less than one second. function EditDistance(s, t: string): integer; var d : array of array of integer; i,j,cost : integer; begin { Compute the edit-distance between two strings. Algorithm and description may be found at either of …
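
The Delphi excerpt is cut off before the algorithm body. The following is a Python sketch of the standard full-matrix dynamic-programming approach (often called Wagner-Fischer) that such a function typically implements. It is not the poster's code. The names `d`, `cost`, `s` and `t` only mirror the truncated Delphi signature.

```python
def edit_distance(s: str, t: str) -> int:
    m, n = len(s), len(t)
    # d[i][j] = distance between the first i chars of s and the first j chars of t.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of s
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of t
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[m][n]
```

The full matrix costs O(m·n) memory. When only the distance is needed, keeping just the previous and current rows gives the same result.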

Change distance between x-axis ticks in ggplot2

Question: Right now I am producing a line graph with three observations, so there are three x-axis ticks. I want to manually reduce the distance between the x-axis ticks, essentially forcing the observations to sit closer to each other. My data: structure(list(Period = c("January 1997 - August 2003", "September 2003 - Jun 2005", "Jul 2005 - Dec 2009", "January 1997 - August 2003", "September 2003 - Jun 2005", "Jul 2005 - Dec 2009 …

Shortest path to transform one word into another

For a Data Structures project, I must find the shortest path between two words (like "cat" and "dog" ), changing only one letter at a time. We are given a Scrabble word list to use in finding our path. For example: cat -> bat -> bet -> bot -> bog -> dog I've solved the problem using a breadth first search, but am seeking something better (I represented the dictionary with a trie). Please give me some ideas for a more efficient method (in terms of speed and memory). Something ridiculous and/or challenging is preferred. I asked one of my friends (he's a junior) and he said that there is no
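
The following is a minimal Python sketch of the breadth-first search the poster describes. It uses a different neighbour lookup from the poster's trie. Each word is filed under wildcard patterns (for example, `"cat"` goes under `"_at"`, `"c_t"` and `"ca_"`), so all one-letter neighbours of a word come from a few dictionary lookups. The function name `word_ladder` and the bucketing scheme are illustrative assumptions, not the poster's solution. The sketch also assumes the start and goal words may be added to the word list.

```python
from collections import defaultdict, deque


def word_ladder(start, goal, words):
    """Shortest one-letter-at-a-time path from start to goal, or None."""
    if len(start) != len(goal):
        return None
    words = {w for w in words if len(w) == len(start)} | {start, goal}
    # Group words by wildcard pattern: "cat" -> "_at", "c_t", "ca_".
    buckets = defaultdict(list)
    for w in words:
        for i in range(len(w)):
            buckets[w[:i] + "_" + w[i + 1:]].append(w)
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            path = []
            while word is not None:  # walk parents back to the start
                path.append(word)
                word = parent[word]
            return path[::-1]
        for i in range(len(word)):
            for nxt in buckets[word[:i] + "_" + word[i + 1:]]:
                if nxt not in parent:
                    parent[nxt] = word
                    queue.append(nxt)
    return None
```

With the words from the example (`cat, bat, bet, bot, bog, dog`), this returns `["cat", "bat", "bot", "bog", "dog"]`. That path is one step shorter than the example path because `bat` and `bot` differ in only one letter.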

Similarity scores based on string comparison in R (edit distance)

Question: I am trying to assign a similarity score based on a comparison between two strings. Is there a function for this in R? I know of such a function in SAS called SPEDIS. Please let me know if R has an equivalent. Answer 1: The function adist computes the Levenshtein edit distance between two strings. This can be turned into a similarity metric as 1 - (Levenshtein edit distance / length of the longer string). The levenshteinSim function in the RecordLinkage package also does this …
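
The following is a Python sketch of the normalization described in the answer, 1 - distance / length of the longer string. It is not a reimplementation of R's `adist` or `RecordLinkage::levenshteinSim`. The treatment of two empty strings as identical (score 1.0) is an assumption made to avoid division by zero. The function name `similarity` is hypothetical.

```python
def levenshtein(a, b):
    """Two-row Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """1.0 for identical strings, 0.0 when every position must change."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein(a, b) / longest
```

For example, `similarity("kitten", "sitting")` is 1 - 3/7, about 0.571.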

Levenshtein distance in T-SQL

Question: I am interested in an algorithm for calculating Levenshtein distance in T-SQL. Answer 1: I implemented the standard Levenshtein edit distance function in T-SQL with several optimizations that improve its speed over the other versions I'm aware of. The speed-up is significant when the two strings share characters at their start (a shared prefix) or at their end (a shared suffix), and when the strings are long and a maximum edit distance is provided. For example, …
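
The following is a Python sketch of the shared-prefix and shared-suffix optimization the answer mentions. It is not the answerer's T-SQL code, and it omits the maximum-distance cutoff. Characters common to both ends never need an edit, so they can be stripped before the dynamic-programming loop runs on the remaining middle parts. Swapping the strings so the shorter one drives the inner loop is an extra assumption that keeps each row small. The function name `levenshtein_trimmed` is hypothetical.

```python
def levenshtein_trimmed(a: str, b: str) -> int:
    # Strip the shared prefix: those characters never need an edit.
    start = 0
    while start < len(a) and start < len(b) and a[start] == b[start]:
        start += 1
    a, b = a[start:], b[start:]
    # Strip the shared suffix for the same reason.
    end = 0
    while end < len(a) and end < len(b) and a[-1 - end] == b[-1 - end]:
        end += 1
    if end:
        a, b = a[:-end], b[:-end]
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    if not b:
        return len(a)  # only insertions or deletions remain
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]
```

For `"abcXdef"` and `"abcYdef"`, trimming leaves only `"X"` and `"Y"`, so the table is 2 by 2 instead of 8 by 8.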