PerlMonks  

Re^3: some efficiency, please

by QM (Parson)
on Apr 12, 2019 at 16:19 UTC ( [id://1232501]=note )


in reply to Re^2: some efficiency, please
in thread some efficiency, please

If the files are very large, you'll spend more time disk swapping than actually reading/writing.

Make 2 passes: Record all of the "ref" numbers you want to delete in the first pass (use a hash), then reread the file, printing it out according to whether a ref value is in the hash.

But to do this well, with multiline data, you'll have to tell us what a "paragraph" is, because it's not clear to me from your description.

It might look something like this:

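    my %ignore;

    # First pass
    while (<FH>) {
        # some_condition() returns the ref number to ignore, or undef
        # (a $1 set inside the sub would not survive the return)
        my $ref = some_condition($_);
        $ignore{$ref} = 1 if defined $ref;
    }

    # Second pass
    # reset the file to the beginning
    seek FH, 0, 0;
    while (<FH>) {
        if (m/matches interesting string with (capture)/) {
            if (exists($ignore{$1})) {
                next; # don't print this line
            }
        }
        print;
    }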

The trick, of course, is some_condition.

If it's hard to put a single paragraph into a regex, just note the signposts with flags. Something like this for the first pass:

    my $in_paragraph;
    my $bar;
    my %ignore;

    while (<FH>) {
        if (m/start of paragraph/) {
            $in_paragraph = 1;
            $bar = 0;    # reset the flag for each new paragraph
            next;
        }
        if (m/end of paragraph/) {
            $in_paragraph = 0;
            next;
        }
        if (m/line with bar/) {
            $bar = 1;
            next;
        }
        if (m/line with ref (\d+)/) {
            # only record refs seen inside a paragraph, after a "bar" line
            if ($in_paragraph and $bar) {
                $ignore{$1} = 1;
            }
            next;
        }
    }

And then something very similar to that in the 2nd pass, except printing or not printing based on your logic. (If you were very clever, you could reuse that code, with a tweak, passing a parameter for the pass number. But don't get clever until it works.)
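For what it's worth, here is a minimal runnable sketch of that "one routine, two passes" idea. Everything about the data format is an assumption on my part, since the real paragraph layout hasn't been described: I've invented BEGIN/END marker lines, a "ref N" line, and a "bar" line, and I've assumed that a "bar" anywhere in a paragraph condemns every paragraph sharing its ref. The scan() sub and its parameters are hypothetical names, not anything from the original code.

```perl
use strict;
use warnings;

# ASSUMED format (illustration only): a paragraph runs from a "BEGIN" line
# to an "END" line and may contain a "ref <number>" line and a "bar" line.

# Pass 1: record in %$ignore the ref of every paragraph that contains "bar".
# Pass 2: return the input text minus paragraphs whose ref is in %$ignore.
sub scan {
    my ($fh, $pass, $ignore) = @_;
    my ($in_paragraph, $bar, $ref, @buffer) = (0, 0);
    my $out = '';
    while (my $line = <$fh>) {
        if ($line =~ /^BEGIN/) {
            # new paragraph: reset all per-paragraph state
            ($in_paragraph, $bar, $ref, @buffer) = (1, 0, undef);
        }
        if (!$in_paragraph) {
            $out .= $line if $pass == 2;    # lines outside paragraphs pass through
            next;
        }
        push @buffer, $line;
        $bar = 1  if $line =~ /\bbar\b/;
        $ref = $1 if $line =~ /^ref (\d+)/;
        if ($line =~ /^END/) {
            $in_paragraph = 0;
            if ($pass == 1) {
                $ignore->{$ref} = 1 if $bar and defined $ref;
            }
            elsif (!(defined $ref and $ignore->{$ref})) {
                $out .= join '', @buffer;
            }
        }
    }
    # keep an unterminated trailing paragraph rather than silently dropping it
    $out .= join '', @buffer if $pass == 2 and $in_paragraph;
    return $out;
}

# Made-up sample data, read through an in-memory filehandle
my $data = <<'END_DATA';
BEGIN
ref 1
foo
END
BEGIN
ref 2
bar
END
BEGIN
ref 2
baz
END
END_DATA

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my %ignore;
scan($fh, 1, \%ignore);             # first pass: collect refs to ignore
seek $fh, 0, 0;                     # rewind for the second pass
my $kept = scan($fh, 2, \%ignore);  # second pass: filter
print $kept;                        # prints only the "ref 1" paragraph
```

Buffering one paragraph at a time keeps memory use small even for very large files; only the %ignore hash grows with the input. With a real file you would open it by name instead of using the in-memory handle.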

-QM
--
Quantum Mechanics: The dreams stuff is made of
