edit-distance

Is there an edit distance algorithm that takes “chunk transposition” into account?

柔情痞子 posted on 2019-12-03 09:49:59
Question: I put "chunk transposition" in quotes because I don't know what the technical term is, or whether there is one. Just knowing whether there is a technical term for the process would be very helpful. The Wikipedia article on edit distance gives some good background on the concept. By taking "chunk transposition" into account, I mean that "Turing, Alan." should match "Alan Turing" more closely than it matches "Turing Machine", i.e. the distance calculation should detect when substrings of the text have simply been

Edit distance between two graphs

与世无争的帅哥 posted on 2019-12-03 07:01:35
Question: I'm just wondering whether, as with strings, where we have the Levenshtein distance (or edit distance) between two strings, there is something similar for graphs. I mean a scalar measure that counts the number of atomic operations (node and edge insertion/deletion) needed to transform a graph G1 into a graph G2. Answer 1: I think graph edit distance is the measure you are looking for. Graph edit distance measures the minimum number of graph edit operations needed to transform one graph into another, and the

Levenshtein distance: how to better handle words swapping positions?

非 Y 不嫁゛ posted on 2019-12-03 01:04:50
Question: I've had some success comparing strings using the PHP levenshtein function. However, when two strings contain substrings that have swapped positions, the algorithm counts those as entirely new substrings. For example, levenshtein("The quick brown fox", "brown quick The fox"); // 10 differences is treated as having less in common than levenshtein("The quick brown fox", "The quiet swine flu"); // 9 differences. I'd prefer an algorithm that sees the first two as more similar. How could
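One common workaround for swapped words is to normalise word order before comparing. The sketch below is in Python rather than PHP and is only an illustration: `token_sort_distance` is a hypothetical helper name, and sorting the words discards order entirely, which may or may not be acceptable for a given data set.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution
        prev = curr
    return prev[-1]


def token_sort_distance(a: str, b: str) -> int:
    """Hypothetical helper: sort the words, then compare character by character."""
    norm_a = " ".join(sorted(a.split()))
    norm_b = " ".join(sorted(b.split()))
    return levenshtein(norm_a, norm_b)


same_words = token_sort_distance("The quick brown fox", "brown quick The fox")
other_words = token_sort_distance("The quick brown fox", "The quiet swine flu")
print(same_words, other_words)
```

With this normalisation the reordered sentence scores as identical, while the sentence with different words still has a positive distance.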

Is there an edit distance algorithm that takes “chunk transposition” into account?

主宰稳场 posted on 2019-12-03 00:17:49
I put "chunk transposition" in quotes because I don't know what the technical term is, or whether there is one. Just knowing whether there is a technical term for the process would be very helpful. The Wikipedia article on edit distance gives some good background on the concept. By taking "chunk transposition" into account, I mean that "Turing, Alan." should match "Alan Turing" more closely than it matches "Turing Machine", i.e. the distance calculation should detect when substrings of the text have simply been moved within the text. This is not the case with the common Levenshtein distance formula. The strings
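Measures that allow whole substrings to move are often discussed under names such as edit distance with block moves. As a much simpler stand-in (not a block-move algorithm), the sketch below compares the strings as bags of word tokens, so the order of the chunks no longer matters. The names `tokens` and `token_bag_distance` are hypothetical, and lower-casing plus stripping punctuation is an assumption about what should count as equal.

```python
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lower-case the text and count its word tokens, ignoring punctuation."""
    return Counter(re.findall(r"\w+", text.lower()))


def token_bag_distance(a: str, b: str) -> int:
    """Number of tokens that appear in one string but not the other."""
    ta, tb = tokens(a), tokens(b)
    # Counter subtraction keeps only positive counts, so this is a
    # multiset symmetric difference.
    return sum(((ta - tb) + (tb - ta)).values())


print(token_bag_distance("Turing, Alan.", "Alan Turing"))
print(token_bag_distance("Turing, Alan.", "Turing Machine"))
```

Under this measure "Turing, Alan." is at distance 0 from "Alan Turing" and at distance 2 from "Turing Machine", which matches the ordering the question asks for.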

Edit distance between two graphs

旧街凉风 posted on 2019-12-02 20:35:37
I'm just wondering whether, as with strings, where we have the Levenshtein distance (or edit distance) between two strings, there is something similar for graphs. I mean a scalar measure that counts the number of atomic operations (node and edge insertion/deletion) needed to transform a graph G1 into a graph G2. I think graph edit distance is the measure you are looking for. Graph edit distance measures the minimum number of graph edit operations needed to transform one graph into another, and the allowed graph edit operations include: insert/delete an isolated vertex; insert/delete an edge; change the
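Computing graph edit distance in general requires searching over node correspondences, which is expensive. The sketch below covers only the special case where node identities are fixed, so that node "a" in G1 is by assumption the same node as "a" in G2. In that case the number of vertex and edge insertions and deletions is just the size of the symmetric differences. `fixed_identity_edit_cost` is a hypothetical name, and the result is only an upper bound on the true graph edit distance when nodes are allowed to be re-matched.

```python
from typing import Hashable, Iterable, Tuple


def fixed_identity_edit_cost(
    nodes1: Iterable[Hashable],
    edges1: Iterable[Tuple[Hashable, Hashable]],
    nodes2: Iterable[Hashable],
    edges2: Iterable[Tuple[Hashable, Hashable]],
) -> int:
    """Count vertex/edge insertions and deletions between two undirected
    graphs whose nodes are matched by name (an assumption of this sketch)."""
    # frozenset makes (u, v) and (v, u) the same undirected edge.
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    node_ops = len(set(nodes1) ^ set(nodes2))   # vertices to delete or insert
    edge_ops = len(e1 ^ e2)                     # edges to delete or insert
    return node_ops + edge_ops


cost = fixed_identity_edit_cost(
    {"a", "b", "c"}, [("a", "b"), ("b", "c")],
    {"a", "b", "d"}, [("a", "b"), ("b", "d")],
)
print(cost)
```

In this example the two graphs are both three-node paths, so with free node matching no edits would be needed. The fixed-identity cost of 4 shows how loose the bound can be.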

Compute Edit distance for a dataframe which has only column and multiple rows in python

旧城冷巷雨未停 posted on 2019-12-01 07:07:34
Question: I have a dataframe which has one column and more than 2000 rows. How can I compute the edit distance between each pair of rows in that column? My dataframe looks like this: Name: John, Mrinmayee, rituja, ritz, divya, priyanka, chetna, chetan, mansi, mansvi, mani, aliya, shelia, Dilip, Dilipa. I need to calculate the distance between each and every row. How can this be done? I have written some code, but it does not work: it gives an endless list of distances. I guess I am going wrong in the for loop.
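A minimal sketch of the pairwise computation, assuming the column has first been pulled out of the dataframe as a plain list of strings (for example with something like `df["Name"].tolist()`, where `df` and the helper name `pairwise_distances` are placeholders). `itertools.combinations` visits each unordered pair exactly once, so the loop is guaranteed to terminate.

```python
from itertools import combinations
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def pairwise_distances(values: List[str]) -> List[Tuple[str, str, int]]:
    """Return (first, second, distance) for every unordered pair of values."""
    return [(a, b, levenshtein(a, b)) for a, b in combinations(values, 2)]


names = ["chetna", "chetan", "mansi", "mansvi"]  # a few names from the question
result = pairwise_distances(names)
for first, second, dist in result:
    print(first, second, dist)
```

For n rows this produces n(n-1)/2 pairs, so 2000 rows give 1,999,000 comparisons. That is feasible but slow in pure Python.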

How to correct bugs in this Damerau-Levenshtein implementation?

末鹿安然 posted on 2019-11-30 15:58:03
I'm back with another longish question. Having experimented with a number of Python-based Damerau-Levenshtein edit distance implementations, I finally found the one listed below as editdistance_reference(). It seems to deliver correct results and appears to have an efficient implementation. So I set out to convert the code to Cython. On my test data, the reference method manages to deliver results for 11,000 comparisons (for pairs of words around 12 letters long), while the Cythonized method does over 200,000 comparisons per second. Sadly, the results are incorrect: when you look at the
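When debugging a fast port, it can help to have a slow but easy-to-check baseline. The sketch below is not the `editdistance_reference()` from the question (which is not shown here). It is the restricted variant usually called optimal string alignment (OSA), which allows adjacent transpositions but never edits a substring twice, so it can differ from unrestricted Damerau-Levenshtein on some inputs.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Adjacent transposition: "xy" versus "yx".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("ab", "ba"))    # one transposition
print(osa_distance("ca", "abc"))   # OSA gives 3 here; unrestricted Damerau-Levenshtein gives 2
```

Comparing a ported implementation against a baseline like this on random short strings is one way to narrow down which cases disagree.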

Optimizing Levenshtein distance algorithm

*爱你&永不变心* posted on 2019-11-30 07:43:06
I have a stored procedure that uses Levenshtein distance to determine the result closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein distance for all the records before selecting the record with the lowest distance (I've verified this by putting a 0 in place of the call to the Levenshtein function). The table has 1.5 million records, so even the slightest adjustment may shave off a few seconds. Right now the entire thing takes over 10 minutes to run. Here's the method I'm using: ALTER function dbo.Levenshtein ( @Source nvarchar(200)
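Two generic ways to cut the work when only the closest record matters are to skip candidates whose length difference already exceeds the best distance found so far, and to abandon a computation as soon as a whole row of the table exceeds that bound. The sketch below illustrates both in Python rather than T-SQL. `bounded_levenshtein` and `closest` are hypothetical names, and this is not a translation of the stored procedure in the question.

```python
from typing import Iterable, Optional, Tuple


def bounded_levenshtein(a: str, b: str, max_dist: int) -> Optional[int]:
    """Return the Levenshtein distance if it is <= max_dist, otherwise None."""
    # The distance is at least the difference in lengths.
    if abs(len(a) - len(b)) > max_dist:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        # Row minima never decrease, so the final value cannot recover.
        if min(curr) > max_dist:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_dist else None


def closest(query: str, candidates: Iterable[str]) -> Tuple[Optional[str], float]:
    """Return the first candidate with the smallest distance to query."""
    best, best_dist = None, float("inf")
    for cand in candidates:
        # Only strictly better candidates are interesting once we have one.
        limit = best_dist - 1 if best is not None else max(len(query), len(cand))
        dist = bounded_levenshtein(query, cand, limit)
        if dist is not None:
            best, best_dist = cand, dist
    return best, best_dist


print(closest("levenstein", ["levenshtein", "frankenstein", "einstein"]))
```

Once a close match has been found, most remaining candidates are rejected by the length check alone, without filling in any table.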

Normalizing the edit distance

大兔子大兔子 posted on 2019-11-30 05:00:47
Question: Can we normalize the Levenshtein edit distance by dividing the edit distance value by the length of the two strings? I am asking because, if we compare two strings of unequal length, the difference between the two lengths is counted as well. For example: ed('has a', 'has a ball') = 4 and ed('has a', 'has a ball the is round') = 15. If we increase the length of the string, the edit distance increases even though the strings are similar. Therefore, I can not set a value,
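One normalisation that is easy to reason about divides by the length of the longer string. The Levenshtein distance can never exceed that length, so the result always falls between 0 (identical) and 1 (nothing in common). This is a sketch of that one choice, with `normalized_distance` as a hypothetical name. Dividing by the sum of the lengths or by the alignment length are other conventions with different properties.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer length, in the range [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0          # two empty strings are identical
    return levenshtein(a, b) / longest


print(normalized_distance("kitten", "sitting"))   # 3 edits over 7 characters
print(normalized_distance("abc", "xyz"))          # every character differs
```

A fixed threshold on this ratio behaves more consistently across string lengths than a threshold on the raw distance, although very short strings still produce coarse values.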

Change distance between x-axis ticks in ggplot2

心已入冬 posted on 2019-11-29 10:57:23
Right now I am producing a line graph with three observations. Hence, there are three x-axis ticks. I want to manually reduce the distance between the x-axis ticks and basically force the observations to be closer to each other. In other words, I want to reduce the distance between the x-axis ticks. My data: structure(list(Period = c("January 1997 - August 2003", "September 2003 - Jun 2005", "Jul 2005 - Dec 2009", "January 1997 - August 2003", "September 2003 - Jun 2005", "Jul 2005 - Dec 2009"), Time.Period = structure(c(1L, 3L, 2L, 1L, 3L, 2L), .Label = c("Jan 1997 - Aug 2003", "Jul 2005 -