Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Longest Common SubSequence Not Working Correctly

by Anonymous Monk
on Nov 12, 2007 at 23:08 UTC ( [id://650407]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,

I found the following code at
http://www.everything2.com/?node_id=1220896
# lcs.pl use strict; use Memoize; sub longerOf { my ($x, $y) = @_; return (length $x > length $y) ? $x : $y; } memoize('lcs'); sub lcs { my ($a, $b) = @_; if ($a eq "" || $b eq ""){ return ""; } my ($az, $bz) = (chop $a, chop $b); if ($az eq $bz){ return lcs($a, $b) . $az; } else { return longerOf( lcs($a . $az, $b), lcs($b . $bz, $a)); } } while (1){ print "1: "; my $a = <>; chomp $a; print "2: "; my $b = <>; chomp $b; print "LCS: ", lcs($a, $b), "\n\n"; }
It seemed to work when the string and pattern were short (4 or 5 characters) but when I increased the length of the string and pattern, it ran forever. The string I used was:
String: abbacbacccabbacbacccccccccccgcccc
pattern: aabaabcbavcvcbvbvnbmhgfdgf
I looked over the code and it looked good. Any thoughts on why it is not working?
Thanks.

Replies are listed 'Best First'.
Re: Longest Common SubSequence Not Working Correctly
by moritz (Cardinal) on Nov 12, 2007 at 23:23 UTC
    It runs forever because you have a loop while (1){ ... } in your testing code.

    The runtime of the subroutine isn't that big.

    Update:

    Change the testing part at the bottom to read:

    my $a = "abbacbacccabbacbacccccccccccgcccc"; my $b = "aabaabcbavcvcbvbvnbmhgfdgf"; print "LCS: ", lcs($a, $b), "\n\n";
    Run time:
    real 0m0.043s user 0m0.039s sys 0m0.002s
Re: Longest Common SubSequence Not Working Correctly
by Anonymous Monk on Nov 13, 2007 at 01:32 UTC
    Thanks. I forgot to include the

    memoize('lcs');

    that's why my code took so long to run. I also found this brute force method in Perl Monk and ran a comparison on both method and found that the brute force method actually ran faster. This does not make sense at all.
    LCSS method:
    # lcs.pl use strict; use Memoize; sub longerOf { my ($x, $y) = @_; return (length $x > length $y) ? $x : $y; } memoize('lcs'); sub lcs { my ($a, $b) = @_; if ($a eq "" || $b eq ""){ return ""; } my ($az, $bz) = (chop $a, chop $b); if ($az eq $bz){ return lcs($a, $b) . $az; } else { return longerOf( lcs($a . $az, $b), lcs($b . $bz, $a)); } } while (1){ print "1: "; my $a = <>; chomp $a; print "2: "; my $b = <>; chomp $b; $start = time(); print "LCS: ", lcs($a, $b), "\n\n"; $end = time(); print "<br>Time taken was ", ($end - $start), " seconds"; $start = time(); print "Brute Force: ", lcsbruteforce($a, $b), "\n\n"; $end = time(); print "<br>Time taken was ", ($end - $start), " seconds"; } sub lcsbruteforce { my($x, $y) = @_; my(@v, $cx, $cy, $left, $above); for my $xi (0 .. length($x) - 1) { $cx = substr $x, $xi, 1; for my $yi (0 .. length($y) - 1) { $cy = substr $y, $yi, 1; if ($cx eq $cy) { $v[$xi][$yi] = 1 + (($xi && $yi) ? $v[$xi - 1][$yi - 1] : 0); } else { $left = ($xi && $v[$xi - 1][$yi]) || 0; $above = ($xi && $v[$xi][$yi - 1]) || 0; $v[$xi][$yi] = ($left > $above) ? $left : $above; } } } return $v[length($x) - 1][length($y) - 1]; }
      Don't let the name of the subroutine fool you. The "brute force" algorithm is not really "brute forcing" the problem. A brute force approach would be to consider every possible subsequence of the strings, taking O(2min(x,y)) time.

      In fact, the "brute force" algorithm is doing the same thing as the recursive algorithm (i.e., doing the dynamic programming solution), but iteratively. It uses a standard trick for making a recursive memoizing dynamic programming algorithm iterative. Since the two algorithms solve the problem in essentially the same way, but the iterative one doesn't have the overhead of subroutine calls (which are slow in Perl), it is no surprise that the iterative one is faster.

      Usually it's easier and more intuitive to write a dynamic programming problem in terms of recursive calls. However, it's necessary to memoize the result of each recursive call, because several other subproblems might use that result of this subproblem in their computation.

      Now imagine a table that holds all of these memoized results. What happens to this table while the recursive algorithm is running? The table is gradually filling up. How does it fill up? Well, in this case, to compute the value of the subproblem ($a,$b), I need to get the solutiosn for at most these three subproblems:

      ($a,substr($b,0,-1)), (substr($a,0,-1),b), (substr($a,0,-1),substr($b,0,-1))
      In other words, I need to have those 3 cells in the table filled in before I can fill in this cell.

      So suppose I now do things iteratively instead of recursively, and just concentrate on filling up the table. I'll visit the table's cells in such a way so that I visit the cell ($a,$b) after I visit the three above cells. That way, to fill up the cell ($a,$b), I just check those 3 other cells, do some local comparisons, and I'm done. Finally, the last cell in the table is generally the answer to the "main" subproblem, and I can return that. That's exactly what this "brute force" algorithm is doing.

      blokhead

        Thanks for the explanation. How would I modify the above brute force code to be truely brute force? Also, the code only print out the length of the sequence but does not print out the characters of the sequence. I tried putting in some print statement in between but it does not seem to work correctly. Any helps?
        Thank you Blokhead. I still have lots to learn about Perl. I have another question for you. The sub lcs takes 2 parameters ($a, $b) but inside the loop where the recursive call is made, the program did a call with

        return lcs($a, $b) . $az;

        what does the . $az do? I tried printing out the $a, $b but did not see any differences. The new string is one character shorter. When I remove $az from the return, the program produce the wrong result. Thanks again for your help.
      I haven't look too close at your code, but it seems the "brute force" approach takes O(length($x) * length($y)) time.

      The recursive method is a bit harder to estimate (at least for me), I'll try it anyway. With Memoize it will run in quadratic time as well in the worst case, since lcs will be called with all possible pairs of position shifts in the arguments. Memoize will only reduce this runtime if at least one of the strings is of the from $s x $n with $n >= 2.

      But it uses many more method calls, which tend to be slow in Perl.

      To test if this explanation really works you could benchmark both subs with increasingly long strings, and test if their runtime actually evolves similarly.

Re: Longest Common SubSequence Not Working Correctly
by Limbic~Region (Chancellor) on Nov 16, 2007 at 02:27 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://650407]
Approved by ikegami
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (3)
As of 2024-04-18 19:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found