Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re: Challenge: Fast Common Substrings

by blokhead (Monsignor)
on Apr 04, 2007 at 02:04 UTC ( [id://608180]=note: print w/replies, xml ) Need Help??


in reply to Challenge: Fast Common Substrings

Here's just a naive solution using regexes. I have no idea if it's fast. On one hand, the regex engine is doing all the work; on the other hand, in the second code snippet, it's just brute-forcing the problem. The match_all_ways sub is taken from Re^3: Delimited Backtracking with Regex.
{ my @matches; my $push = qr/(?{ push @matches, $1 })/; sub match_all_ways { my ($string, $regex) = @_; @matches = (); $string =~ m/($regex)$push(?!)/; return @matches; } } sub common_substr { my ($str1, $str2, $len) = @_; my %substr = map { $_ => 1 } match_all_ways($str1 => qr/.{$len}/); $substr{$_} |= 2 for match_all_ways($str2 => qr/.{$len}/); return grep { $substr{$_} == 3 } keys %substr; } print "$_\n" for common_substr("ABCDEF", "ABDEFCBDEAB", 2); __END__ DE EF AB
Returns all the common substrings, since it might as well.. In scalar context returns the number.

Update: Or, if you want the regex engine to do all of the work, instead of computing the intersection of the two lists in perl:

{ my @matches; my $push = qr/(?{ push @matches, $1 })/; sub match_all_ways { my ($string, $regex) = @_; @matches = (); $string =~ m/$regex$push(?!)/; return @matches; } } sub common_substr { my ($str1, $str2, $len) = @_; my %subs; @subs{ match_all_ways("$str1\0$str2" => qr/(.{$len}).*\0.*\1/) } + = (); return keys %subs; }
Note that the match_all_ways sub was changed slightly (to account for the different capturing). Disclaimer: If input strings contain a newline or null character, or if $len > 65536, it doesn't work.. but I think you get the idea.

These solutions are both trivial to extend to a range of lengths.. just pass "n,m" as the $len argument.

blokhead

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://608180]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2024-03-28 21:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found