Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^3: Can KMP be used to find the longest common subsequence? (ugh)

by tye (Sage)
on Dec 12, 2007 at 11:04 UTC ( [id://656587]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Can KMP be used to find the longest common subsequence?
in thread Can KMP be used to find the longest common subsequence?

If that were to fail to match, then it would fail to match very badly, as in O(N**K) badly. Better:

/^[^a]*?a[^b]*?b[^c]*?c[^d]*?d/

or use features that I don't use and so don't remember to prevent backtracking. The above version should proceed to match the string linearly and in O(n) and, if that fails, backtrack linearly in O(n) and quickly fail.

- tye        

  • Comment on Re^3: Can KMP be used to find the longest common subsequence? (ugh)
  • Download Code

Replies are listed 'Best First'.
Re^4: Can KMP be used to find the longest common subsequence? (ugh)
by bart (Canon) on Dec 12, 2007 at 11:46 UTC
    Good point.
    or use features that I don't use and so don't remember to prevent backtracking.
    I do remember, you must be talking about the "cut" assertion: (?>PATTERN). See japhy's drafts for his (unfortunately) abandoned book on regexes, at http://japhy.perlmonk.org/book/book/ , chapter 8 (MS Word format), section 8.1.4.

    Only, it won't work here, because If I did /a(?>.*?)b/, it would never match, because once the .*? ate a "b", it would never give it back.

    print 'xxxxaxxxxbxxxx' =~ /a(?>.*?)b/ ? 'match' : 'fail';
    produces fail.

    So yours is the better solution... you don't need the "?", because /^[^a]*?a/ and /^[^a]*a/ will match the exact same substring. I have no idea which is faster, or whether it even makes a difference, so I'll just leave it in.

    So, my code adapted to produce your pattern:

    $string = 'abcd'; $regex = '^' . join '', map "[^$_]*$_", map quotemeta, split //, $stri +ng; # embed capture to get offset: $regex =~ s/(?<=\*)/(/; $regex =~ s/$/)/; $sequence = 'xxxxxaxxxxxxbxxxxxcxxxxdxxxxx'; if($sequence =~ /$regex/s) { printf "I found a match, at offset %d and with total length %d\n", + $-[1], $+[1]-$-[1]; }
    As a precaution I've included a quotemeta so you can safely search for subsequences containing \W characters.
      Instead of m/a(?>.*?)b/ you can use m/a(?>.*?b)/ (b inside the cut assertion)

      Afaict regex inside the cut assertions can backtrack, but once the engine left the assertion it will not try to match different text with it.

      But the fact remains that you have to know quite some details regular expressions to get that working, so the approach with negated char classes is preferable.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://656587]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (5)
As of 2024-04-25 19:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found