Re^3: some efficiency, please

If the files are very large, you'll spend more time disk swapping than actually reading/writing.

Make 2 passes: Record all of the "ref" numbers you want to delete in the first pass (use a hash), then reread the file, printing it out according to whether a ref value is in the hash.

But to do this well, with multiline data, you'll have to tell us what a "paragraph" is, because it's not clear to me from your description.

It might look something like this:

my %ignore;

# First pass
while (<FH>) {
   # some_condition is a placeholder for an inline match that
   # captures the ref number into $1
   $ignore{$1} = 1 if some_condition($_);
}

# Second pass
# reset the file to the beginning
seek FH, 0, 0;
while (<FH>) {
   if (m/matches interesting string with (capture)/) {
       if (exists($ignore{$1})) {
           next; # don't print this line
       }
   }
   print;
}

The trick, of course, is some_condition.
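To make the two-pass idea concrete, here is a minimal runnable sketch. The data format is invented for illustration (one record per line, "ref <number> <text>"), and so is the rule standing in for some_condition (a ref is marked for deletion if any of its lines contains "bar"). Your real format and condition will differ.

```perl
use strict;
use warnings;

# Hypothetical sample data: one record per line, "ref <number> <text>".
my $data = <<'END';
ref 1 apple
ref 2 bar
ref 3 cherry
ref 2 plum
END

# Read from an in-memory filehandle so the sketch runs without a real file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %ignore;

# First pass: remember every ref number that appears on a line containing "bar".
while (my $line = <$fh>) {
    if ($line =~ /^ref (\d+) .*bar/) {
        $ignore{$1} = 1;
    }
}

# Second pass: rewind, then keep only lines whose ref is not in the hash.
seek $fh, 0, 0;
my @kept;
while (my $line = <$fh>) {
    if ($line =~ /^ref (\d+)/ and exists $ignore{$1}) {
        next;    # drop every line belonging to an ignored ref
    }
    push @kept, $line;
}
close $fh;

print @kept;
```

Only the hash of ref numbers is held in memory, never the file itself, which is the point of reading it twice.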

If it's hard to put a single paragraph into a regex, just note the signposts with flags. Something like this for the first pass:

my $in_paragraph = 0;
my $bar = 0;
my %ignore;
while (<FH>) {
   if (m/start of paragraph/) {
       $in_paragraph = 1;
       $bar = 0;   # each paragraph starts with no bar seen
       next;
   }
   if (m/end of paragraph/) {
       $in_paragraph = 0;
       next;
   }
   if (m/line with bar/) {
       $bar = 1;
       next;
   }
   if (m/line with ref (\d+)/) {
       # only record refs inside a paragraph that has already shown a bar
       if ($in_paragraph and $bar) {
           $ignore{$1} = 1;
       }
       next;
   }
}

And then something very similar to that in the 2nd pass, except printing or not printing based on your logic. (If you were very clever, you could reuse that code, with a tweak, passing a parameter for the pass number. But don't get clever until it works.)
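For what the reuse might look like once the simple version works, here is a rough sketch with one routine taking the pass number. Everything about the format is an assumption made up for the example: paragraphs run from a BEGIN line to an END line, a "bar" line flags the paragraph, and "ref <number>" lines carry the ids. The sub name scan and the choice to drop only the matching ref lines in pass 2 are also just illustrative.

```perl
use strict;
use warnings;

# Hypothetical format: paragraphs run from a BEGIN line to an END line;
# a "bar" line flags the paragraph, and "ref <number>" lines carry the ids.
my $data = <<'END_DATA';
BEGIN
bar
ref 7
END
BEGIN
ref 8
END
BEGIN
ref 7
END
END_DATA

# One routine for both passes. Pass 1 fills %$ignore; pass 2 returns the
# lines to keep, dropping "ref" lines whose number was recorded in pass 1.
sub scan {
    my ($fh, $pass, $ignore) = @_;
    my ($in_paragraph, $bar) = (0, 0);
    my @kept;
    seek $fh, 0, 0;    # every pass starts at the top of the file
    while (my $line = <$fh>) {
        if ($line =~ /^BEGIN$/) {
            ($in_paragraph, $bar) = (1, 0);
        }
        elsif ($line =~ /^END$/) {
            $in_paragraph = 0;
        }
        elsif ($line =~ /^bar$/) {
            $bar = 1;
        }
        elsif ($line =~ /^ref (\d+)$/) {
            $ignore->{$1} = 1 if $pass == 1 and $in_paragraph and $bar;
            next if $pass == 2 and exists $ignore->{$1};
        }
        push @kept, $line if $pass == 2;
    }
    return @kept;
}

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my %ignore;
scan($fh, 1, \%ignore);
my @kept = scan($fh, 2, \%ignore);
close $fh;

print @kept;
```

Note that the "ref 7" line in the third paragraph is dropped too, even though that paragraph has no bar: the hash is keyed on the ref number, not on the paragraph.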

-QM
--
Quantum Mechanics: The dreams stuff is made of
