Longest Common SubSequence Not Working Correctly

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Longest Common SubSequence Not Working Correctly by moritz (Cardinal) on Nov 12, 2007 at 23:23 UTC
It runs forever because you have a loop `while (1){ ... }` in your testing code. The runtime of the subroutine isn't that big. Update: Change the testing part at the bottom to read: `my $a = "abbacbacccabbacbacccccccccccgcccc"; my $b = "aabaabcbavcvcbvbvnbmhgfdgf"; print "LCS: ", lcs($a, $b), "\n\n";` [download] Run time: `real 0m0.043s user 0m0.039s sys 0m0.002s` [download]	[reply] [d/l] [select]
Re: Longest Common SubSequence Not Working Correctly by Anonymous Monk on Nov 13, 2007 at 01:32 UTC
Thanks. I forgot to include the memoize('lcs'); that's why my code took so long to run. I also found this brute force method in Perl Monk and ran a comparison on both method and found that the brute force method actually ran faster. This does not make sense at all. LCSS method: # lcs.pl use strict; use Memoize; sub longerOf { my ($x, $y) = @_; return (length $x > length $y) ? $x : $y; } memoize('lcs'); sub lcs { my ($a, $b) = @_; if ($a eq "" \|\| $b eq ""){ return ""; } my ($az, $bz) = (chop $a, chop $b); if ($az eq $bz){ return lcs($a, $b) . $az; } else { return longerOf( lcs($a . $az, $b), lcs($b . $bz, $a)); } } while (1){ print "1: "; my $a = <>; chomp $a; print "2: "; my $b = <>; chomp $b; $start = time(); print "LCS: ", lcs($a, $b), "\n\n"; $end = time(); print "<br>Time taken was ", ($end - $start), " seconds"; $start = time(); print "Brute Force: ", lcsbruteforce($a, $b), "\n\n"; $end = time(); print "<br>Time taken was ", ($end - $start), " seconds"; } sub lcsbruteforce { my($x, $y) = @_; my(@v, $cx, $cy, $left, $above); for my $xi (0 .. length($x) - 1) { $cx = substr $x, $xi, 1; for my $yi (0 .. length($y) - 1) { $cy = substr $y, $yi, 1; if ($cx eq $cy) { $v[$xi][$yi] = 1 + (($xi && $yi) ? $v[$xi - 1][$yi - 1] : 0); } else { $left = ($xi && $v[$xi - 1][$yi]) \|\| 0; $above = ($xi && $v[$xi][$yi - 1]) \|\| 0; $v[$xi][$yi] = ($left > $above) ? $left : $above; } } } return $v[length($x) - 1][length($y) - 1]; } [download]	[reply] [d/l]
Re^2: Longest Common SubSequence Not Working Correctly by blokhead (Monsignor) on Nov 13, 2007 at 20:01 UTC
Don't let the name of the subroutine fool you. The "brute force" algorithm is not really "brute forcing" the problem. A brute force approach would be to consider every possible subsequence of the strings, taking O(2^min(x,y)) time. In fact, the "brute force" algorithm is doing the same thing as the recursive algorithm (i.e., doing the dynamic programming solution), but iteratively. It uses a standard trick for making a recursive memoizing dynamic programming algorithm iterative. Since the two algorithms solve the problem in essentially the same way, but the iterative one doesn't have the overhead of subroutine calls (which are slow in Perl), it is no surprise that the iterative one is faster. Usually it's easier and more intuitive to write a dynamic programming problem in terms of recursive calls. However, it's necessary to memoize the result of each recursive call, because several other subproblems might use that result of this subproblem in their computation. Now imagine a table that holds all of these memoized results. What happens to this table while the recursive algorithm is running? The table is gradually filling up. How does it fill up? Well, in this case, to compute the value of the subproblem ($a,$b), I need to get the solutiosn for at most these three subproblems: ($a,substr($b,0,-1)), (substr($a,0,-1),b), (substr($a,0,-1),substr($b,0,-1)) In other words, I need to have those 3 cells in the table filled in before I can fill in this cell. So suppose I now do things iteratively instead of recursively, and just concentrate on filling up the table. I'll visit the table's cells in such a way so that I visit the cell ($a,$b) after I visit the three above cells. That way, to fill up the cell ($a,$b), I just check those 3 other cells, do some local comparisons, and I'm done. Finally, the last cell in the table is generally the answer to the "main" subproblem, and I can return that. That's exactly what this "brute force" algorithm is doing. blokhead	[reply]
Re^3: Longest Common SubSequence Not Working Correctly by Anonymous Monk on Nov 14, 2007 at 05:16 UTC
Thanks for the explanation. How would I modify the above brute force code to be truely brute force? Also, the code only print out the length of the sequence but does not print out the characters of the sequence. I tried putting in some print statement in between but it does not seem to work correctly. Any helps?	[reply]
Re^4: Longest Common SubSequence Not Working Correctly by blokhead (Monsignor) on Nov 14, 2007 at 14:58 UTC
Re^3: Longest Common SubSequence Not Working Correctly by Anonymous Monk on Nov 15, 2007 at 00:49 UTC
Thank you Blokhead. I still have lots to learn about Perl. I have another question for you. The sub lcs takes 2 parameters ($a, $b) but inside the loop where the recursive call is made, the program did a call with return lcs($a, $b) . $az; what does the . $az do? I tried printing out the $a, $b but did not see any differences. The new string is one character shorter. When I remove $az from the return, the program produce the wrong result. Thanks again for your help.	[reply]
Re^2: Longest Common SubSequence Not Working Correctly by moritz (Cardinal) on Nov 13, 2007 at 07:54 UTC
I haven't look too close at your code, but it seems the "brute force" approach takes `O(length($x) * length($y))` time. The recursive method is a bit harder to estimate (at least for me), I'll try it anyway. With Memoize it will run in quadratic time as well in the worst case, since `lcs` will be called with all possible pairs of position shifts in the arguments. Memoize will only reduce this runtime if at least one of the strings is of the from `$s x $n` with `$n >= 2`. But it uses many more method calls, which tend to be slow in Perl. To test if this explanation really works you could benchmark both subs with increasingly long strings, and test if their runtime actually evolves similarly.	[reply] [d/l] [select]
Re: Longest Common SubSequence Not Working Correctly by Limbic~Region (Chancellor) on Nov 16, 2007 at 02:27 UTC
Anonymous Monk, If you are looking for code and don't know about CPAN, you should check it out. There is Algorithm::LCS and Algorithm::Diff which both implement a ready-made LCS algorithm. If you need to find the LCS of more than 2 strings, I wrote Longest Common Subsequence. Cheers - L~R	[reply]


No such thing as a small change
	PerlMonks