String::LCSS does exactly what you need.
I don't want to bash String::LCSS, but the implementation seems to be the naive O(n^3) algorithm instead of the O(mn) dynamic programming solution (http://en.wikipedia.org/wiki/Longest_common_substring_problem).
A quick and dirty (and not thoroughly tested) implementation is much faster (although probably buggy).
sub lcss2 {
my ($s, $t) = @_;
my $z = 0;
my $m = length $s;
my $n = length $t;
my @S = (undef, split(//, $s));
my @T = (undef, split(//, $t));
my @L;
my @ret;
for my $i ( 1 .. $m ) {
for my $j ( 1 .. $n ) {
if ($S[$i] eq $T[$j]) {
$L[$i-1][$j-1] ||= 0;
$L[$i][$j] = $L[$i-1][$j-1] + 1;
if ($L[$i][$j] > $z) {
$z = $L[$i][$j];
@ret = ();
}
if ($L[$i][$j] == $z) {
push @ret,substr($s, ($i-$z), $z);
}
}
}
}
# warn Dumper \@L;
return join '*', @ret;
}
my $s1 = '6'x 200 . 'zyzxx';
my $s2 = '5'x 200 . 'abczyzefg';
my $count = 1;
timethese($count, {
'String::LCSS' => sub { String::LCSS::lcss( $s1, $s2 ) },
'dynprog' => sub { lcss2( $s1, $s2 )},
});
Update: Took the opportunity to learn XS and wrote
String::LCSS_XS.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.