lcs

Myers diff algorithm vs Hunt–McIlroy algorithm

◇◆丶佛笑我妖孽 submitted on 2019-11-26 20:43:58
Question: The longest common subsequence problem is a classic computer science problem; algorithms that solve it are at the root of version control systems and wiki engines. Two basic algorithms are the Hunt–McIlroy algorithm, which was used to create the original version of diff, and the Myers diff algorithm, which is currently used by the GNU diff utility. Both seem to work more or less by finding the shortest path through a graph that represents the edit space between the two strings or text files. The
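
To make the "shortest path through the edit graph" idea concrete, here is a minimal sketch of the greedy forward search from Myers' O(ND) formulation. It returns only the length of the shortest edit script (insertions plus deletions), not the script itself, and it is not the code used by GNU diff; the function name `myers_distance` is a hypothetical one chosen for illustration.

```cpp
#include <string>
#include <vector>

// Length of the shortest edit script (insertions + deletions) that turns a into b.
// Sketch of the greedy forward search over diagonals k = x - y.
int myers_distance(const std::string& a, const std::string& b) {
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    const int max_d = n + m;
    const int off = max_d + 1;             // shifts diagonal k into a valid index
    std::vector<int> v(2 * max_d + 3, 0);  // v[off + k] = furthest x reached on diagonal k

    for (int d = 0; d <= max_d; ++d) {
        for (int k = -d; k <= d; k += 2) {
            int x;
            if (k == -d || (k != d && v[off + k - 1] < v[off + k + 1])) {
                x = v[off + k + 1];        // step down: insert a character of b
            } else {
                x = v[off + k - 1] + 1;    // step right: delete a character of a
            }
            int y = x - k;
            // Follow the free diagonal (a "snake") while characters match.
            while (x < n && y < m && a[x] == b[y]) {
                ++x;
                ++y;
            }
            v[off + k] = x;
            if (x >= n && y >= m) return d;  // reached the bottom-right corner
        }
    }
    return max_d;  // only reached in the degenerate case already covered above
}
```

With D the returned distance, the LCS length of the two inputs is (n + m - D) / 2, which is the link between the shortest-path view and the longest common subsequence.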

longest common substring in R finding non-contiguous matches between the two strings

蹲街弑〆低调 submitted on 2019-11-26 18:21:53
Question: I have a question about finding the longest common substring in R. While searching through a few posts on StackOverflow, I learned about the qualV package. However, the LCS function in this package actually finds all characters from string1 that are present in string2, even if they are not contiguous. To explain, if the strings are string1 : " hel lo" string2 : " hel 12345lo" I expect the output to be hel, but I get hello instead. I must be doing something wrong
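
The behaviour described here is the difference between a longest common subsequence (gaps allowed) and a longest common substring (contiguous only). As an illustration in C++ rather than R, the following sketch computes the contiguous variant with the usual common-suffix DP. The function name `longest_common_substring` is hypothetical, and the usage note assumes the stray spaces in the quoted strings are formatting residue.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Longest common *substring* (a contiguous run) of a and b.
// If several runs tie for the maximum length, the first one found in a is returned.
std::string longest_common_substring(const std::string& a, const std::string& b) {
    // prev[j] / cur[j] = length of the longest common suffix of a[0..i) and b[0..j)
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    int best_len = 0;
    std::size_t best_end = 0;  // exclusive end index of the best run inside a

    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1]) {
                cur[j] = prev[j - 1] + 1;  // extend the run that ended one step earlier
                if (cur[j] > best_len) {
                    best_len = cur[j];
                    best_end = i;
                }
            } else {
                cur[j] = 0;                // a mismatch breaks contiguity
            }
        }
        std::swap(prev, cur);
    }
    return a.substr(best_end - best_len, best_len);
}
```

Called on `"hello"` and `"hel12345lo"`, this returns `"hel"`, whereas a subsequence-based LCS of the same inputs is `"hello"`.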

A study of the longest common subsequence (LCS) and a family of related DP problems

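北战南征 submitted on 2019-11-26 17:59:49
LIS problem: let \(f[i]\) be the length of the longest increasing subsequence ending at \(a[i]\); then \[f[i]=\max_{j<i,\ a[j]<a[i]} f[j]+1\] (with \(f[i]=1\) when no such \(j\) exists). This can be optimized to \(O(n\log n)\) with a binary indexed tree. LCS on permutations (\(a,b\) are both permutations, i.e. no element appears more than once): let \(pos_i\) be the position at which \(a_i\) appears in \(b\), i.e. \(a_i=b_{pos_i}\). A subsequence \(a_{p_1},a_{p_2},\dots,a_{p_m}\) of \(a\) is a common subsequence of \(a,b\) if and only if \(pos_{p_1}<pos_{p_2}<\dots<pos_{p_m}\), so it suffices to compute an LIS. General LCS problem: classic solution: let \(f[i][j]\) be the length of the longest common subsequence when only the first \(i\) elements of \(a\) and the first \(j\) elements of \(b\) are considered; then \[f[i][j]=\begin{cases} f[i-1][j-1]+1 & a[i]=b[j]\\ \max(f[i-1][j],f[i][j-1]) & a[i]\neq b[j] \end{cases}\] This is very simple, but there is also a slightly more complicated approach that extends better: let \(f[i][j]\) denote the case where only the first \(i\) elements of \(a\) and the first \(j\) elements of \(b\) are considered and \(b_j\) has already, with $a_1,...,a_i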
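
The reduction from permutation LCS to LIS described above can be sketched as follows. Two assumptions are ours, not the post's: both inputs are permutations of the values 1..n, and the \(O(n\log n)\) LIS uses binary search over a "tails" array instead of the binary indexed tree mentioned in the text. The function name `lcs_of_permutations` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// LCS length of two permutations a and b of the values 1..n (assumed, not checked).
// Maps each a[i] to its position in b, then takes the LIS of those positions.
int lcs_of_permutations(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t n = a.size();
    std::vector<int> where(n + 1, 0);  // where[v] = index of value v in b
    for (std::size_t j = 0; j < n; ++j) {
        where[b[j]] = static_cast<int>(j);
    }

    // tails[k] = smallest possible last position of a strictly increasing
    // subsequence of length k + 1 seen so far.
    std::vector<int> tails;
    for (int value : a) {
        const int p = where[value];
        auto it = std::lower_bound(tails.begin(), tails.end(), p);
        if (it == tails.end()) {
            tails.push_back(p);  // p extends the longest subsequence found so far
        } else {
            *it = p;             // p gives a smaller tail for that length
        }
    }
    return static_cast<int>(tails.size());
}
```

For a = (3, 2, 1, 4, 5) and b = (1, 2, 3, 4, 5) the positions are 2, 1, 0, 3, 4, whose longest increasing subsequence has length 3.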

Find common substrings between two character variables

倾然丶 夕夏残阳落幕 submitted on 2019-11-26 17:53:08
I have two character variables (names of objects) and I want to extract the largest common substring. a <- c('blahABCfoo', 'blahDEFfoo') b <- c('XXABC-123', 'XXDEF-123') I want the following as a result: [1] "ABC" "DEF" These vectors as input should give the same result: a <- c('textABCxx', 'textDEFxx') b <- c('zzABCblah', 'zzDEFblah') These examples are representative. The strings contain identifying elements, and the remainder of the text in each vector element is common, but unknown. Is there a solution, in one of the following places (in order of preference): Base R Recommended Packages

Longest common subsequence (LCS)

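半城伤御伤魂 submitted on 2019-11-26 04:17:39
Longest common subsequence (LCS): LCS is short for Longest Common Subsequence. A sequence that is a subsequence of two or more given sequences, and is the longest among all such common subsequences, is a longest common subsequence. For example, given char x[]="aabcd"; the run aabc, whose elements are in order and adjacent, is a subsequence of x, and abc, whose elements are in order but not all adjacent, is a subsequence as well. In other words, any selection of elements from the given sequence that keeps their relative order is a subsequence. Adding char y[]="12abcabcd"; and comparing the two gives the longest common subsequence aabcd, because all of x appears in order inside y. Code: #include <bits/stdc++.h> using namespace std; const int maxm = 1e3 + 5; int dp[maxm][maxm]; int main() { string s1, s2; cin >> s1 >> s2; for (int i = 1; i <= s1.length(); i++) { for (int j = 1; j <= s2.length(); j++) { if (s1[i - 1] == s2[j - 1]) dp[i][j] = dp[i - 1][j - 1] + 1; dp[i][j] = max(dp[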
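
Since the listing above is cut off, here is a separate, self-contained sketch of the standard LCS table with backtracking. It is not a reconstruction of the author's program: the function name `lcs` is hypothetical, it takes its inputs as arguments instead of reading from `cin`, and it returns one longest common subsequence rather than only its length.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns one longest common subsequence of a and b (ties broken arbitrarily).
std::string lcs(const std::string& a, const std::string& b) {
    const std::size_t n = a.size(), m = b.size();
    // dp[i][j] = LCS length of the first i characters of a and the first j of b
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(m + 1, 0));

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (a[i - 1] == b[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1;  // matching characters extend the LCS
            } else {
                dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }

    // Walk back from dp[n][m] to recover the characters that were matched.
    std::string out;
    std::size_t i = n, j = m;
    while (i > 0 && j > 0) {
        if (a[i - 1] == b[j - 1]) {
            out.push_back(a[i - 1]);
            --i;
            --j;
        } else if (dp[i - 1][j] >= dp[i][j - 1]) {
            --i;
        } else {
            --j;
        }
    }
    std::reverse(out.begin(), out.end());
    return out;
}
```

Note the `else` branch in the fill loop: without it, the `max` assignment would overwrite the value just computed for a match.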