Re^3: some efficiency, please

by QM (Parson)
on Apr 12, 2019 at 16:19 UTC ( #1232501=note )

in reply to Re^2: some efficiency, please
in thread some efficiency, please

If the files are very large, you'll spend more time swapping to disk than actually reading and writing.

Make 2 passes: Record all of the "ref" numbers you want to delete in the first pass (use a hash), then reread the file, printing it out according to whether a ref value is in the hash.

But to do this well, with multiline data, you'll have to tell us what a "paragraph" is, because it's not clear to me from your description.

It might look something like this:

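    my %ignore;

    # First pass: record every ref number that should be dropped.
    # some_condition() is a placeholder for a match that captures the ref in $1.
    while (<FH>) {
        $ignore{$1} = 1 if some_condition($_);
    }

    # Second pass: reset the file to the beginning, then print what survives
    seek FH, 0, 0;
    while (<FH>) {
        if (m/matches interesting string with (capture)/) {
            next if exists $ignore{$1};    # don't print this line
        }
        print;
    }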

The trick, of course, is some_condition.

If it's hard to put a single paragraph into a regex, just note the signposts with flags. Something like this for the first pass:

    my $in_paragraph = 0;
    my $bar = 0;
    my %ignore;

    while (<FH>) {
        if (m/start of paragraph/) {
            $in_paragraph = 1;
            $bar = 0;
            next;
        }
        if (m/end of paragraph/) {
            $in_paragraph = 0;
            next;
        }
        if (m/line with bar/) {
            $bar = 1;
            next;
        }
        if (m/line with ref (\d+)/) {
            # only record refs seen inside a paragraph, after its bar line
            $ignore{$1} = 1 if $in_paragraph and $bar;
            next;
        }
    }

And then something very similar to that in the 2nd pass, except printing or not printing based on your logic. (If you were very clever, you could reuse that code, with a tweak, passing a parameter for the pass number. But don't get clever until it works.)
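For what it's worth, here is a rough sketch of that "one routine, two passes" idea. The record format is entirely made up, since I still don't know what your paragraphs look like: I'm assuming BEGIN and END lines delimit a paragraph, a line starting with "bar" flags it, and "ref N" carries the number. The names scan and filter_file are mine too. Each paragraph is buffered until its END line, because the ref may not show up until partway through.

```perl
use strict;
use warnings;

# Hypothetical record format, invented for this sketch:
#   "BEGIN" and "END" delimit a paragraph, a "bar" line marks it for
#   deletion, and "ref N" carries its reference number.

# scan($fh, $pass, \%ignore, $out)
#   pass 1: add the ref of every paragraph containing "bar" to %ignore
#   pass 2: copy the input to $out, skipping paragraphs whose ref is in %ignore
sub scan {
    my ($fh, $pass, $ignore, $out) = @_;
    my ($in_paragraph, $bar, $ref, @buffer) = (0, 0);

    while (my $line = <$fh>) {
        if (!$in_paragraph) {
            if ($line =~ /^BEGIN\b/) {
                ($in_paragraph, $bar, $ref, @buffer) = (1, 0, undef, $line);
            }
            elsif ($pass == 2) {
                print {$out} $line;    # text outside paragraphs passes through
            }
            next;
        }

        push @buffer, $line;           # hold the paragraph until END
        $bar = 1  if $line =~ /^bar\b/;
        $ref = $1 if $line =~ /^ref (\d+)/;
        next unless $line =~ /^END\b/;

        if ($pass == 1) {
            $ignore->{$ref} = 1 if $bar && defined $ref;
        }
        elsif (!(defined $ref && $ignore->{$ref})) {
            print {$out} @buffer;
        }
        $in_paragraph = 0;
    }

    # An unterminated final paragraph is written out unchanged on pass 2.
    print {$out} @buffer if $pass == 2 && $in_paragraph;
    return;
}

# Run both passes over one seekable filehandle.
sub filter_file {
    my ($fh, $out) = @_;
    my %ignore;
    scan($fh, 1, \%ignore, $out);
    seek $fh, 0, 0 or die "seek failed: $!";    # rewind for the second pass
    scan($fh, 2, \%ignore, $out);
    return;
}
```

You would call it as filter_file($in, \*STDOUT) with $in opened for reading on your data file. Note that a paragraph with a given ref is dropped even if the flagged paragraph with the same ref appears later in the file, which is the whole reason for reading it twice.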

Quantum Mechanics: The dreams stuff is made of
