edit-distance

Optimizing Levenshtein distance algorithm

↘锁芯ラ submitted on 2019-11-29 10:28:19
Question: I have a stored procedure that uses Levenshtein distance to determine the result closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein distance for all the records before selecting the record with the lowest distance (I've verified this by putting a 0 in place of the call to the Levenshtein function). The table has 1.5 million records, so even the slightest adjustment may shave off a few seconds. Right now the entire thing

Quickly compare a string against a Collection in Java

▼魔方 西西 submitted on 2019-11-29 07:27:38
I am trying to calculate edit distances of a string against a collection to find the closest match. My current problem is that the collection is very large (about 25,000 items), so I had to narrow the set down to strings of similar length. That still leaves a few thousand strings, and the comparison is still very slow. Is there a data structure that allows for a quick lookup of similar strings, or is there another way I could address this problem? Sounds like a BK-tree might be what you want. Here's an article discussing them: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms
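To make the BK-tree suggestion above concrete, here is a minimal Python sketch. It is not taken from the linked article: the names `levenshtein`, `BKTree`, `add`, and `search` are illustrative assumptions, and the edit distance is a plain two-row implementation. The idea is that each node stores its children keyed by their distance to the node's word, so the triangle inequality lets a query skip every subtree whose key lies outside `[d - max_dist, d + max_dist]`.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


class BKTree:
    """Hypothetical minimal BK-tree; a node is (word, {distance: child})."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query: str, max_dist: int):
        """Return sorted (distance, word) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only these child keys can hold matches.
            for key, child in children.items():
                if d - max_dist <= key <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

A possible usage: build the tree once with `tree.add(w)` for every string in the collection, then call `tree.search(user_input, 2)` per lookup. How much work the pruning saves depends on the data and on the chosen `max_dist`.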

How do you implement Levenshtein distance in Delphi?

孤人 submitted on 2019-11-28 05:02:19
I'm posting this in the spirit of answering your own questions. The question I had was: How can I implement the Levenshtein algorithm for calculating edit-distance between two strings, as described here, in Delphi? Just a note on performance: This thing is very fast. On my desktop (2.33 GHz dual-core, 2 GB RAM, WinXP), I can run through an array of 100K strings in less than one second.
function EditDistance(s, t: string): integer;
var
  d : array of array of integer;
  i, j, cost : integer;
begin
  { Compute the edit-distance between two strings. Algorithm and description may be found at either of

Change distance between x-axis ticks in ggplot2

柔情痞子 submitted on 2019-11-28 04:10:10
Question: Right now I am producing a line graph with three observations. Hence, there are three x-axis ticks. I want to manually reduce the distance between the x-axis ticks so that the observations sit closer to each other. My data: structure(list(Period = c("January 1997 - August 2003", "September 2003 - Jun 2005", "Jul 2005 - Dec 2009", "January 1997 - August 2003", "September 2003 - Jun 2005", "Jul 2005 - Dec 2009

Shortest path to transform one word into another

大城市里の小女人 submitted on 2019-11-27 06:38:33
For a Data Structures project, I must find the shortest path between two words (like "cat" and "dog"), changing only one letter at a time. We are given a Scrabble word list to use in finding our path. For example: cat -> bat -> bet -> bot -> bog -> dog. I've solved the problem using a breadth-first search, but am seeking something better (I represented the dictionary with a trie). Please give me some ideas for a more efficient method (in terms of speed and memory). Something ridiculous and/or challenging is preferred. I asked one of my friends (he's a junior) and he said that there is no
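For reference, the breadth-first baseline described in the question can be sketched as follows in Python. This is an assumption-laden illustration, not the asker's code: it uses wildcard buckets (for example `c_t`) rather than a trie to find one-letter neighbours, and the function name `shortest_ladder` is hypothetical.

```python
from collections import defaultdict, deque


def shortest_ladder(start, goal, words):
    """Return one shortest start -> goal path as a list, or None if none exists."""
    # Only words of the same length can be reached by single-letter changes.
    pool = {w for w in words if len(w) == len(start)} | {start}
    if goal not in pool:
        return None

    # Group words by wildcard pattern: "cat" goes into "_at", "c_t", "ca_".
    buckets = defaultdict(list)
    for w in pool:
        for i in range(len(w)):
            buckets[w[:i] + "_" + w[i + 1:]].append(w)

    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for i in range(len(word)):
            for nxt in buckets[word[:i] + "_" + word[i + 1:]]:
                if nxt not in parent:
                    parent[nxt] = word
                    queue.append(nxt)
    return None
```

Because every edge has the same cost, the first time the search dequeues `goal` the reconstructed path is a shortest one. A common refinement, not shown here, is to search from both ends at once.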

Shortest path to transform one word into another

匆匆过客 submitted on 2019-11-26 12:05:56
Question: For a Data Structures project, I must find the shortest path between two words (like "cat" and "dog"), changing only one letter at a time. We are given a Scrabble word list to use in finding our path. For example: cat -> bat -> bet -> bot -> bog -> dog. I've solved the problem using a breadth-first search, but am seeking something better (I represented the dictionary with a trie). Please give me some ideas for a more efficient method (in terms of speed and memory). Something ridiculous

Similarity scores based on string comparison in R (edit distance)

对着背影说爱祢 submitted on 2019-11-26 09:23:50
Question: I am trying to assign a similarity score based on a comparison between two strings. Is there a function for this in R? I am aware of such a function in SAS, named SPEDIS. Please let me know if there is such a function in R. Answer 1: The function adist computes the Levenshtein edit distance between two strings. This can be transformed into a similarity metric as 1 - (Levenshtein edit distance / longer string length). The levenshteinSim function in the RecordLinkage package also does this
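The formula in the answer, 1 - (Levenshtein edit distance / longer string length), can be checked with a small Python sketch. This is only an illustration of the arithmetic, not the R functions named above; `levenshtein` and `similarity` are hypothetical helper names, and treating two empty strings as fully similar is an assumption made here to avoid dividing by zero.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string, in the range [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1.0 - levenshtein(a, b) / longest
```

For "kitten" and "sitting" the distance is 3 and the longer string has 7 characters, so the score is 1 - 3/7, roughly 0.571.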

Levenshtein distance in T-SQL

不打扰是莪最后的温柔 submitted on 2019-11-25 23:14:49
Question: I am interested in a T-SQL algorithm for calculating Levenshtein distance. Answer 1: I implemented the standard Levenshtein edit distance function in T-SQL with several optimizations that improve the speed over the other versions I'm aware of. In cases where the two strings have characters in common at their start (shared prefix), characters in common at their end (shared suffix), and when the strings are large and a max edit distance is provided, the improvement in speed is significant. For example,
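The optimizations the answer lists (skipping a shared prefix, skipping a shared suffix, and stopping early once a maximum distance is exceeded) can be illustrated with a short Python sketch. This is not the answer's T-SQL: the function name `bounded_levenshtein` and the convention of returning `None` when the distance exceeds `max_dist` are assumptions made for the example.

```python
def bounded_levenshtein(s, t, max_dist=None):
    """Return the edit distance, or None if it is larger than max_dist."""
    # Skip the shared prefix; it contributes nothing to the distance.
    start = 0
    limit = min(len(s), len(t))
    while start < limit and s[start] == t[start]:
        start += 1

    # Skip the shared suffix, without running back into the prefix.
    end_s, end_t = len(s), len(t)
    while end_s > start and end_t > start and s[end_s - 1] == t[end_t - 1]:
        end_s -= 1
        end_t -= 1
    s, t = s[start:end_s], t[start:end_t]

    # Keep the shorter string on the row axis to use less memory.
    if len(s) < len(t):
        s, t = t, s

    # The length difference is a lower bound on the distance.
    if max_dist is not None and len(s) - len(t) > max_dist:
        return None

    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        # Row minima never decrease, so the final value cannot drop below this.
        if max_dist is not None and min(cur) > max_dist:
            return None
        prev = cur

    distance = prev[-1]
    if max_dist is not None and distance > max_dist:
        return None
    return distance
```

When the inputs differ only in a short middle section, the trimming reduces the dynamic-programming table to that section, which is where the savings come from in this sketch.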