I can do this the proper way using dynamic programming but I can't figure out how to do it in exponential time.
I'm looking to find the largest common sub-sequence between two strings. Note: I mean subsequences and not sub-strings the symbols that make up a sequence need not be consecutive.
Just replace the lookups in the table in your dynamic programming code with recursive calls. In other words, just implement the recursive formulation of the LCS problem:
EDIT
In pseudocode (almost-python, actually):
def lcs(s1, s2):
if len(s1)==0 or len(s2)==0: return 0
if s1[0] == s2[0]: return 1 + lcs(s1[1:], s2[1:])
return max(lcs(s1, s2[1:]), lcs(s1[1:], s2))
Let's say you have two strings a
and b
of length n
. The longest common subsequence is going to be the longest subsequence in string a
that is also present in string b
.
Thus we can iterate through all possible subsequences in a
and see it is in b
.
A high-level pseudocode for this would be:
for i=n to 0
for all length i subsequences s of a
if s is a subsequence of b
return s
String A and String B. A recursive algorithm, maybe it's naive but it is simple:
Look at the first letter of A. This will either be in a common sequence or not. When considering the 'not' option, we trim off the first letter and call recursively. When considering the 'is in a common sequence' option we also trim it off and we also trim off from the start of B up to, and including, the same letter in B. Some pseudocode:
def common_subsequences(A,B, len_subsequence_so_far = 0):
if len(A) == 0 or len(B) == 0:
return
first_of_A = A[0] // the first letter in A.
A1 = A[1:] // A, but with the first letter removed
common_subsequences(A1,B,len_subsequence_so_far) // the first recursive call
if(the_first_letter_of_A_is_also_in_B):
Bn = ... delete from the start of B up to, and including,
... the first letter which equals first_of_A
common_subsequences(A1,Bn, 1+len_subsequence_so_far )
You could start with that and then optimize by remembering the longest subsequence found so far, and then returning early when the current function cannot beat that (i.e. when min(len(A), len(B))+len_subsequence_so_far
is smaller than the longest length found so far.
Essentially if you don't use dynamic programming paradigm - you reach exponential time. This is because, by not storing your partial values - you are recomputing the partial values multiple times.
To achieve exponential time it's enough to generate all subsequences of both arrays and compare each one with each other. If you match two that are identical check if their length is larger then current maximum. The pseudocode would be:
Generate all subsequences of `array1` and `array2`.
for each subsequence of `array1` as s1
for each subsequece of `array2` as s2
if s1 == s2 //check char by char
if len(s1) > currentMax
currentMax = len(s1)
for i = 0; i < 2^2; i++;
It's absolutely not optimal. It doesn't even try. However the question is about the very inefficient algorithm so I've provided one.
int lcs(char[] x, int i, char[] y, int j) {
if (i == 0 || j == 0) return 0;
if (x[i - 1] == y[j - 1]) return lcs(x, i - 1, y, j - 1) + 1;
return Math.max(lcs(x, i, y, j - 1), lcs(x, i - 1, y, j));
}
print(lcs(x, x.length, y, y.length);
Following is a partial recursion tree:
lcs("ABCD", "AFDX")
/ \
lcs("ABC", "AFDX") lcs("ABCD", "AFD")
/ \ / \
lcs("AB", "AFDX") lcs("AXY", "AFD") lcs("ABC", "AFD") lcs("ABCD", "AF")
Worst case is when the length of LCS is 0 which means there's no common subsequence. At that case all of the possible subsequences are examined and there are O(2^n)
subsequences.
来源:https://stackoverflow.com/questions/9010479/how-to-find-the-longest-common-subsequence-in-exponential-time