PerlMonks  

Re: some efficiency, please

by tybalt89 (Monsignor)
on Apr 12, 2019 at 17:05 UTC ( [id://1232503]=note )


in reply to some efficiency, please

Try this. It passes your test case :)

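#!/usr/bin/perl

# https://perlmonks.org/?node_id=1232492

use strict;
use warnings;

# Slurp the whole input into $_.
$_ = do { local $/; <DATA> };

# Collect the number of every begin...end block that contains a "bar" line.
my @del = map /^(?=(\d+)\n)(?=.*^bar\n)/ms, /^begin\n(.*?)^end\n/gms;

# Build one alternation from all collected numbers (the /n flag needs Perl 5.22+).
my $pattern = do { local $" = '|'; qr/^\s+(foo )?ref (@del)\n/mn };

# Remove every matching "ref N" line in a single pass.
s/$pattern//gm;
print;

__DATA__
  ref 1
  ref 2
  ref 3
  ref 4
begin
1
end
begin
2
bar
end
begin
3
end
begin
4
bar
end
  ref 5
  foo ref 6
  ref 7
begin
5
end
begin
6
bar
end
begin
7
bar
end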

Replies are listed 'Best First'.
Re^2: some efficiency, please
by haukex (Archbishop) on Apr 13, 2019 at 07:58 UTC

    Note that this also removes "ref N" when they appear inside one of the begin...end blocks; the spec is unclear whether that is desired or not. Update: The same goes for AnomalousMonk's solution. Some more test cases from the OP would be helpful here :-)
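To make that concrete, here is a minimal sketch. The sample text, the hard-coded @del list and the split-based variant are illustrative assumptions, not code from anyone in this thread. It compares the global substitution with one that only touches text outside the begin...end blocks:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input: block 2 contains "bar" and also has a "ref 2" line inside it.
my $text = join '', map "$_\n",
    '  ref 1', '  ref 2',
    'begin', '1', 'end',
    'begin', '2', 'bar', '  ref 2', 'end';

my @del     = (2);    # assumed: numbers of the blocks that contain "bar"
my $pattern = do { local $" = '|'; qr/^\s+(?:foo )?ref (?:@del)\n/m };

# Global substitution: the "ref 2" inside the block is removed as well.
(my $everywhere = $text) =~ s/$pattern//g;

# Alternative: split on whole blocks. Because the split pattern has a
# capturing group, the blocks are returned too, at the odd indices.
my @parts = split /(^begin\n.*?^end\n)/ms, $text;
for my $i (grep { $_ % 2 == 0 } 0 .. $#parts) {
    $parts[$i] =~ s/$pattern//g;    # touch only the text between blocks
}
my $outside_only = join '', @parts;

print "--- everywhere ---\n$everywhere";
print "--- outside blocks only ---\n$outside_only";
```

Which of the two behaviours is right depends on the spec, so treat the second variant as an option rather than a fix.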

      <simplistic_answer> It passes *all* the test cases, therefore it's correct. </simplistic_answer>

      hehehe

Re^2: some efficiency, please
by Anonymous Monk on Apr 13, 2019 at 02:34 UTC
    Thanks. There are just so many tricks I never thought of (like building the entire array into the qr statement). :)
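For anyone else reading, the trick in isolation looks roughly like this. It is a minimal sketch with a made-up @del list and sample text; the non-capturing groups are a substitution for the /n flag used above, so it also runs on Perls older than 5.22:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of block numbers whose "ref" lines should be deleted.
my @del = (2, 4, 6, 7);

# $" is the separator Perl puts between array elements when an array is
# interpolated into a double-quoted string or a regex. Localizing it to
# '|' turns "@del" into the alternation 2|4|6|7.
my $pattern = do { local $" = '|'; qr/^\s+(?:foo )?ref (?:@del)\n/m };

my $text = "  ref 1\n  ref 2\n  foo ref 6\n  ref 7\n  ref 27\n";

# The /m flag is stored inside the qr// object, so only /g is needed here.
(my $kept = $text) =~ s/$pattern//g;
print $kept;
```

One compiled alternation replaces a loop of separate substitutions, so the text is scanned once instead of once per number. If the list items could contain regex metacharacters, they would need quotemeta first.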

    Would have liked to have used map, but the actual data is more like:

      ref 1
      ref 2
    1
    begin
    end
    2
    begin
    bar
    end
    ...

    with the numbers outside (right before) the beginning of each paragraph. I am sure that can be done with map, but it is a little too tricky for me at my level. :)
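A rough sketch of one way to adapt the approach to that layout follows. It assumes the number sits alone on the line directly before "begin", which is a guess from the sample above. It uses a hash and grep instead of map, and the sample data is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout: the number is on its own line directly before "begin".
my $text = join '', map "$_\n",
    '  ref 1', '  ref 2', '  foo ref 3',
    '1', 'begin', 'end',
    '2', 'begin', 'bar', 'end',
    '3', 'begin', 'bar', 'end';

# In list context with /g, each match contributes a (number, body) pair,
# so the flat list can be assigned straight to a hash.
my %body = $text =~ /^(\d+)\nbegin\n(.*?)^end\n/gms;

# Keep only the numbers whose block body contains a "bar" line.
my @del = grep { $body{$_} =~ /^bar\n/m } sort { $a <=> $b } keys %body;

# Skip the substitution when no block qualifies, so an empty
# alternation is never compiled.
if (@del) {
    my $pattern = do { local $" = '|'; qr/^\s+(?:foo )?ref (?:@del)\n/m };
    $text =~ s/$pattern//g;
}
print $text;
```

The only real change from the posted solution is that the number is captured by the outer block regex rather than looked up inside the block body.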

    I killed what I had before after thirteen hours of CPU time, but I guess it (eventually) would have finished. I started it again with a print statement right before it removes each line from the file, and it starts out very quickly (just a few seconds per line removed), and then just keeps slowing down (after about a half-hour, it was well over a minute per line removed).

    Still don't understand why just changing the one line:

    for (@objects) { $allfile =~ s/^ +$_\n//m; }

    to:

    for (@objects) { $allfile =~ s/^ *(foo)? +$_\n//mn; }

    caused it to slow down SO much.

    Anyway, I was satisfied with the performance I had before (without the test for foo), but your method is an order of magnitude faster than that "without the test" method, and it (of course) catches the rare case when foo is there, so I am extremely grateful for that. Thanks, again.

      Still don't understand why just changing the one line... caused it to slow down SO much.
      Just off the top of my head, I suspect a bad interaction between the " *" and the " +". They're separated only by an optional element, so a run of spaces can be split between them in multiple ways (no spaces/5 spaces, 1 space/4 spaces, etc.). The regex engine has to backtrack through each of those splits until it either finds a match or is satisfied that none exists, which multiplies the work at every candidate line. The parens on foo may also be contributing, although with /n they should not be capturing.

      If you want to test this theory, you could try changing the regex to $allfile =~ s/^( *foo)? +$_\n//mn; (keeping the plain parens) or $allfile =~ s/^(?: *foo)? +$_\n//mn; (explicitly non-capturing parens, since you're really only using them for grouping) and see if that restores the original performance.
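A small harness along these lines could be used to compare the variants. It is only a sketch: the file size, the indentation and the object list are invented stand-ins for the real data, the names strip_with and %variant are hypothetical, and the timings will depend on the actual input and Perl version. It uses the core Benchmark module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic stand-in for the real file: 400 indented "ref N" lines.
my $allfile = join '', map { (' ' x 8) . "ref $_\n" } 1 .. 400;

# Remove the odd-numbered refs, one substitution per object as in the thread.
my @objects = map "ref $_", grep { $_ % 2 } 1 .. 400;

# Each variant builds the regex for one object. /n needs Perl 5.22+ and
# already stops plain parentheses from capturing.
my %variant = (
    original => sub { my $o = shift; qr/^ +$o\n/m },
    with_foo => sub { my $o = shift; qr/^ *(foo)? +$o\n/mn },
    grouped  => sub { my $o = shift; qr/^(?: *foo)? +$o\n/mn },
);

# Run the per-object loop on a fresh copy of the text.
sub strip_with {
    my ($make_re) = @_;
    my $copy = $allfile;
    for my $obj (@objects) {
        my $re = $make_re->($obj);
        $copy =~ s/$re//;
    }
    return $copy;
}

# Sanity check: all variants must produce the same text before timing them.
my %result = map { $_ => strip_with($variant{$_}) } keys %variant;
my ($first, @rest) = values %result;
die "variants disagree\n" if grep { $_ ne $first } @rest;

# Time 50 full runs of each variant and print a comparison table.
my %bench = map {
    my $name = $_;
    ($name => sub { strip_with($variant{$name}) })
} sort keys %variant;
cmpthese(50, \%bench);
```

If "grouped" lands close to "original" while "with_foo" lags behind, that would support the backtracking theory; if all three are similar on the synthetic data, the slowdown probably depends on something specific to the real file.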
