Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Can KMP be used to find the longest common subsequence?

by Anonymous Monk
on Dec 11, 2007 at 08:11 UTC ( [id://656343]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl monks,

Since the KMP algorithm can determine if string S2 is a substring of S1 in O(n) time, can this algorithm be modified to test whether S2 is a subsequence of S1 in linear time?
Thanks.
  • Comment on Can KMP be used to find the longest common subsequence?

Replies are listed 'Best First'.
Re: Can KMP be used to find the longest common subsequence? (no?)
by tye (Sage) on Dec 11, 2007 at 08:36 UTC

    Well, no, not really. Not that it matters, since the naive "obvious" approach is already O(n).

    - tye        

Re: Can KMP be used to find the longest common subsequence?
by Thilosophy (Curate) on Dec 12, 2007 at 07:34 UTC
    Some background context:

    Wikipedia has an explanation of the Knuth-Morris-Pratt algorithm for substring searches.

    The difference between a substring and a subsequence is that a subsequence is not necessarily contiguous (all the characters must be present, and in the correct order, but there can be unrelated characters in between).

      The difference between a substring and a subsequence is that a subsequence is not necessarily contiguous (all the characters must be present, and in the correct order, but there can be unrelated characters in between).
      In that case, I think the easiest way to search for subsequences in Perl is to use regular expressions. For example, if the subsequence is "abcd", then the regular expression to look for a subsequence is /a.*?b.*?c.*?d/.

      And since you can easily contruct a regex out of a string, that is easy toconstruct:

      $string = 'abcd'; $regex = join '.*?', split //, $string; $sequence = 'xxxxxaxxxxxxbxxxxxcxxxxdxxxxx'; if($sequence =~ /$regex/s) { printf "I found a match, at offset %d and with total length %d\n", + $-[0], $+[0]-$-[0]; }
      Result:
      I found a match, at offset 5 and with total length 19

        If that were to fail to match, then it would fail to match very badly, as in O(N**K) badly. Better:

        /^[^a]*?a[^b]*?b[^c]*?c[^d]*?d/

        or use features that I don't use and so don't remember to prevent backtracking. The above version should proceed to match the string linearly and in O(n) and, if that fails, backtrack linearly in O(n) and quickly fail.

        - tye        

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://656343]
Approved by ikegami
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (2)
As of 2024-04-26 00:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found