Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Takes several minutes on what data? I've been improving myne too. Here's the latest. It has three improvements over the completely dumb version. If there are some characters that are not common to all strings, it makes a character class to try only the common things. When skipping forward to find a backreference it uses a hack that works around not being able to use back references in character classes. After matching a backreference it makes an assertion about how many characters remain in the current line. I'm not sure if they have much effect on the success part of the match but I'm hoping they turn out to make the failure part faster. I also commented it. ;-)

[Updated: Here's the final version I ended up with. It has two optional parts each of which contribute a bit of overhead but might allow some branches to be ignored. It also has timing data.]

$_ = <<"..."; CPD6Z98SB2KQNWV0F7Y1IX4GLRA5MTOJHE3U CXZOL6SUI2WTJ30HF519YPGBRNAK48MQVD7E T8COSQU6I2FJN40DKL157WVGPYXARZ3MBHE9 KNCWVZDSR5420LP91FIQGB7Y3A6J8MOUXTEH XF9C4PSDY62TWJ0QBN17IKG3OH8ALVRM5UEZ D9QCHUSN7TW2YZL0O831FGXIR6JA4P5MVBKE ZC7ISQUPK6N20OLV4T31G9FRXBAWM5YJHED8 Z3C7SJVODL25TRQ01HPWGNKXB4UA68YMI9EF BC9OXDHS2FI5Z6U0TYL1VPGQK7ANR38MEWJ4 K4TCQBHS2ZV7FXU0P8R1YGDON3A6JILM9EW5 ... # $_ = <<"..."; # HOUSEBOAT # COMPUTER # DOUBT # ... use Benchmark; use constant { BACKREF_HACK => 1, LENGTH_HACK => 1 }; $| = 1; my $charclass = common_charclass(/([^\n]+)/); # use re 'debug'; # rx(8,$charclass); # exit; for ( my $len = 1;; ++$len ) { my $rx = rx( $len, $charclass ); my $t0 = Benchmark->new; my @found = /$rx/; my $t1 = Benchmark->new; print timestr( timediff( $t1, $t0 ) ) . " "; if (@found) { print join( '', @found ) . "\n"; } else { print "\n"; last; } } sub uniq { my %seen; return grep !$seen{$_}++, @_; } sub max { my $max = shift @_; $max < $_ and $max = $_ for @_; return $max; } sub common_charclass { my %seen; ++$seen{$_} for map @$_, map [ uniq( split // ) ], @_; my $max = max( values %seen ); my @no = delete @seen{ grep $max != $seen{$_}, keys %seen } or return undef; return do { my ($solo) = keys %seen; $solo } if 1 == scalar keys % +seen; return '[' . join( '', sort keys %seen ) . ']'; } sub rx { # 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) C # 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) CP # 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) CPM # 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) CPME # 1 wallclock secs ( 0.42 usr + 0.00 sys = 0.42 CPU) CS201 # 3 wallclock secs ( 2.93 usr + 0.00 sys = 2.93 CPU) CS201G # 18 wallclock secs (16.28 usr + 0.00 sys = 16.28 CPU) CS201GA # 78 wallclock secs (72.17 usr + 0.02 sys = 72.19 CPU) CS201GAM # 277 wallclock secs (263.39 usr + 0.08 sys = 263.47 CPU) CS201GA +ME # 3466 wallclock secs (3260.91 usr + 0.98 sys = 3261.89 CPU) # # real 64m3.333s # user 60m16.157s # sys 0m1.088s my ( $len, $char ) = @_; $char = '\S' if not defined $char; my $pat = "\\A.*?" . ( "($char).*?" x $len ) . "\n"; # Start a new line $pat .= "(?>(?:"; # Make all my assertions about the content of the line $pat .= join '', map { my $capture_num = $_; # Skip past stuff that doesn't match $$_ my $seek = BACKREF_HACK ? "(?>(?:(?!\\$_).)*)" : ".*?"; # Find $$_ my $found_it = "\\$_"; # If I'm too close to the end of the line and don't have # enough characters to match, I'll assert that I need that # many and bail if there aren't enough. my $enough_left_at_end; if ( LENGTH_HACK ) { if ( $_ < $len ) { my $must_be_at_least_this_long = $len - $_; $enough_left_at_end = "(?=.{$must_be_at_least_this_lon +g})"; } else { $enough_left_at_end = ""; } } else { $enough_left_at_end = ''; } # Match *this* "$seek$found_it$enough_left_at_end"; } 1 .. $len; # Skip to the end of the line. $pat .= "(?>.*)\n"; # Repeat lines til I get to the end $pat .= ")+)\\z"; return qr/(?-s)$pat/; }

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊


In reply to Re^2: Longest Common Subsequence by diotalevi
in thread Longest Common Subsequence by Limbic~Region

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (6)
As of 2024-04-25 15:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found