Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re: Longest Common Subsequence

by Limbic~Region (Chancellor)
on May 12, 2006 at 13:17 UTC ( [id://548978]=note: print w/replies, xml ) Need Help??


in reply to Longest Common Subsequence

All,
I am happily suprised with how efficient my attempt is. I whipped up a brute-force implementation and I am not sure when it will ever finish.
#!/usr/bin/perl use strict; use warnings; my @str = map {$_->[0]} sort {$a->[1] <=> $b->[1]} map {chomp; [$_, length($_)]} <DATA>; print LCS(@str), "\n"; sub LCS { my @str = @_; my $next = combo(split //, $str[0]); SUBSEQ: while (my @subseq = $next->()) { my $regex = join '.*', @subseq; $regex = qr/$regex/; for (1 .. $#str) { next SUBSEQ if $str[$_] !~ /$regex/; } return join '', @subseq; } } sub combo { # modified [id://394168] to generate long to short # - my ($by, $next) = (0, 1); # + my ($by, $next) = ($#list + 2, 1); # - $by++; # - return () if $by > @list; # + $by--; # +return () if ! $by; }
The pure regex brute-force solution by diotalevi is much better than this but still takes several minutes. Wow.

Cheers - L~R

Replies are listed 'Best First'.
Re^2: Longest Common Subsequence
by diotalevi (Canon) on May 12, 2006 at 14:28 UTC

    Takes several minutes on what data? I've been improving myne too. Here's the latest. It has three improvements over the completely dumb version. If there are some characters that are not common to all strings, it makes a character class to try only the common things. When skipping forward to find a backreference it uses a hack that works around not being able to use back references in character classes. After matching a backreference it makes an assertion about how many characters remain in the current line. I'm not sure if they have much effect on the success part of the match but I'm hoping they turn out to make the failure part faster. I also commented it. ;-)

    [Updated: Here's the final version I ended up with. It has two optional parts each of which contribute a bit of overhead but might allow some branches to be ignored. It also has timing data.]

    $_ = <<"..."; CPD6Z98SB2KQNWV0F7Y1IX4GLRA5MTOJHE3U CXZOL6SUI2WTJ30HF519YPGBRNAK48MQVD7E T8COSQU6I2FJN40DKL157WVGPYXARZ3MBHE9 KNCWVZDSR5420LP91FIQGB7Y3A6J8MOUXTEH XF9C4PSDY62TWJ0QBN17IKG3OH8ALVRM5UEZ D9QCHUSN7TW2YZL0O831FGXIR6JA4P5MVBKE ZC7ISQUPK6N20OLV4T31G9FRXBAWM5YJHED8 Z3C7SJVODL25TRQ01HPWGNKXB4UA68YMI9EF BC9OXDHS2FI5Z6U0TYL1VPGQK7ANR38MEWJ4 K4TCQBHS2ZV7FXU0P8R1YGDON3A6JILM9EW5 ... # $_ = <<"..."; # HOUSEBOAT # COMPUTER # DOUBT # ... use Benchmark; use constant { BACKREF_HACK => 1, LENGTH_HACK => 1 }; $| = 1; my $charclass = common_charclass(/([^\n]+)/); # use re 'debug'; # rx(8,$charclass); # exit; for ( my $len = 1;; ++$len ) { my $rx = rx( $len, $charclass ); my $t0 = Benchmark->new; my @found = /$rx/; my $t1 = Benchmark->new; print timestr( timediff( $t1, $t0 ) ) . " "; if (@found) { print join( '', @found ) . "\n"; } else { print "\n"; last; } } sub uniq { my %seen; return grep !$seen{$_}++, @_; } sub max { my $max = shift @_; $max < $_ and $max = $_ for @_; return $max; } sub common_charclass { my %seen; ++$seen{$_} for map @$_, map [ uniq( split // ) ], @_; my $max = max( values %seen ); my @no = delete @seen{ grep $max != $seen{$_}, keys %seen } or return undef; return do { my ($solo) = keys %seen; $solo } if 1 == scalar keys % +seen; return '[' . join( '', sort keys %seen ) . ']'; } sub rx { # 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) C # 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) CP # 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) CPM # 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) CPME # 1 wallclock secs ( 0.42 usr + 0.00 sys = 0.42 CPU) CS201 # 3 wallclock secs ( 2.93 usr + 0.00 sys = 2.93 CPU) CS201G # 18 wallclock secs (16.28 usr + 0.00 sys = 16.28 CPU) CS201GA # 78 wallclock secs (72.17 usr + 0.02 sys = 72.19 CPU) CS201GAM # 277 wallclock secs (263.39 usr + 0.08 sys = 263.47 CPU) CS201GA +ME # 3466 wallclock secs (3260.91 usr + 0.98 sys = 3261.89 CPU) # # real 64m3.333s # user 60m16.157s # sys 0m1.088s my ( $len, $char ) = @_; $char = '\S' if not defined $char; my $pat = "\\A.*?" . ( "($char).*?" x $len ) . "\n"; # Start a new line $pat .= "(?>(?:"; # Make all my assertions about the content of the line $pat .= join '', map { my $capture_num = $_; # Skip past stuff that doesn't match $$_ my $seek = BACKREF_HACK ? "(?>(?:(?!\\$_).)*)" : ".*?"; # Find $$_ my $found_it = "\\$_"; # If I'm too close to the end of the line and don't have # enough characters to match, I'll assert that I need that # many and bail if there aren't enough. my $enough_left_at_end; if ( LENGTH_HACK ) { if ( $_ < $len ) { my $must_be_at_least_this_long = $len - $_; $enough_left_at_end = "(?=.{$must_be_at_least_this_lon +g})"; } else { $enough_left_at_end = ""; } } else { $enough_left_at_end = ''; } # Match *this* "$seek$found_it$enough_left_at_end"; } 1 .. $len; # Skip to the end of the line. $pat .= "(?>.*)\n"; # Repeat lines til I get to the end $pat .= ")+)\\z"; return qr/(?-s)$pat/; }

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      diotalevi,
      Takes several minutes on what data?

      I was referring to your CB comments on how long it took to do the data presented in the puzzle. Your regex brute-force solution is extremely fast in comparison to my brute-force approach. Mine involves generating all possible subsequence of the shortest string in descending order according to length and testing each one on the remaining strings.

      I've been improving myne too.

      I am not sure if you meant that I was improving mine and you were improving yours too or if you meant that my characterization of yours was based off an old version. If the former, I haven't been improving my own. I am completely amazed at how fast it runs. I was only building a brute-force approach to verify that it was actually correct.

      I am still hoping someone more knowledgeable than myself can answer some or all of the questions I posed at the end of the meditation.

      Cheers - L~R

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://548978]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (7)
As of 2024-04-25 08:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found