Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Re^2: Match and Extract String with Regex

by erroneousBollock (Curate)
on Nov 17, 2007 at 09:25 UTC ( [id://651391]=note: print w/replies, xml ) Need Help??


in reply to Re: Match and Extract String with Regex
in thread Match and Extract String with Regex

Searching for (Longest Common Subsequence) should reveal nodes on the topic.
... only if they understand how/why Longest Common Subsequence is a generalisation of Longest Common Substring.

While it's an interesting area in which to educate oneself, String::LCSS does exactly what you need.

-David

  • Comment on Re^2: Match and Extract String with Regex

Replies are listed 'Best First'.
Re^3: Match and Extract String with Regex
by lima1 (Curate) on Nov 17, 2007 at 16:58 UTC
    String::LCSS does exactly what you need.

    I don't want to bash String::LCSS, but the implementation seems to be the naive O(n^3) algorithm instead of the O(mn) dynamic programming solution (http://en.wikipedia.org/wiki/Longest_common_substring_problem). A quick and dirty (and not thoroughly tested) implementation is much faster (although probably buggy).

    sub lcss2 { my ($s, $t) = @_; my $z = 0; my $m = length $s; my $n = length $t; my @S = (undef, split(//, $s)); my @T = (undef, split(//, $t)); my @L; my @ret; for my $i ( 1 .. $m ) { for my $j ( 1 .. $n ) { if ($S[$i] eq $T[$j]) { $L[$i-1][$j-1] ||= 0; $L[$i][$j] = $L[$i-1][$j-1] + 1; if ($L[$i][$j] > $z) { $z = $L[$i][$j]; @ret = (); } if ($L[$i][$j] == $z) { push @ret,substr($s, ($i-$z), $z); } } } } # warn Dumper \@L; return join '*', @ret; }
    my $s1 = '6'x 200 . 'zyzxx'; my $s2 = '5'x 200 . 'abczyzefg'; my $count = 1; timethese($count, { 'String::LCSS' => sub { String::LCSS::lcss( $s1, $s2 ) }, 'dynprog' => sub { lcss2( $s1, $s2 )}, });
    Update: Took the opportunity to learn XS and wrote String::LCSS_XS.
      Ah, well spotted.

      I don't suppose you'd like to file a bug? ;)

      -David

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://651391]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2024-04-18 02:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found