So I do (think I) need to pass through the file twice - once to find the references I want to remove, and once to actually remove them.
I took that as a challenge ;-) This needs only a single pass over the file: it reverses both the input and the output by piping them through tac, and it produces your desired output:
use warnings;
use strict;

die "Usage: $0 INFILE\n" unless @ARGV==1;
my $INFILE = shift @ARGV;

# Read the input bottom-up via tac, and pipe our output through tac
# again so that it comes out in the original order.
open my $ofh, '|-', 'tac' or die "tac (out): $!";
open my $ifh, '-|', 'tac', $INFILE or die "tac $INFILE: $!";

my ($aminblock,$prevnum,$foundstr);
my %found;  # block number => true if that block contains "bar"
while (<$ifh>) {
    chomp;
    my $out=1;
    if (!$aminblock) {
        # reading backwards, "end" is where a block starts
        if (/^end$/) { undef $foundstr; $aminblock=1 }
        elsif (/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) {
            die "ref $1 without block?" unless exists $found{$1};
            $out = !$found{$1};  # drop refs to blocks that contain "bar"
        }
        else { die "unexpected outside of a block: $_" }
    }
    else {
        # the block's number is the last number-only line before "begin"
        if (/^\s*(\d+)\s*$/) { $prevnum=$1 }
        elsif (/^begin$/) {
            die "block ended without number?" unless defined $prevnum;
            $found{$prevnum} = $foundstr;
            undef $prevnum;
            $aminblock=0;
        }
        else {
            undef $prevnum;
            if (/bar/) { $foundstr=1 }
        }
    }
    print {$ofh} $_, "\n" if $out;
}
close $ifh or die "tac $INFILE: ".($!||"\$?=$?");
close $ofh or die "tac (out): ".($!||"\$?=$?");
That said, running both the input and the output through tac might actually make this less efficient for large files. Here's a two-pass version:
use warnings;
use strict;

die "Usage: $0 INFILE\n" unless @ARGV==1;
my $INFILE = shift @ARGV;

use constant { STATE_IDLE=>0, STATE_BEGIN=>1, STATE_INBLOCK=>2 };

open my $fh, '<', $INFILE or die "$INFILE: $!";
my %found;  # block number => true if that block contains "bar"
my $state = STATE_IDLE;
my $curnum;
# Pass 1 only fills %found; pass 2 runs the same state machine and prints.
for my $pass (1..2) {
    while (<$fh>) {
        chomp;
        my $out = 1;
        if ($state==STATE_IDLE) {
            if (/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) { $out=!$found{$1} }
            elsif (/^begin$/) { $state=STATE_BEGIN }
            else { die "unexpected in state $state: $_" }
        }
        elsif ($state==STATE_BEGIN) {
            # the line right after "begin" must be the block number
            if (/^\s*(\d+)\s*$/) { $curnum=$1; $state=STATE_INBLOCK }
            else { die "unexpected in state $state: $_" }
        }
        elsif ($state==STATE_INBLOCK) {
            if (/^end$/) { $state=STATE_IDLE }
            elsif (/bar/) { $found{$curnum}=1 }
        }
        else { die "bad state $state" }
        print $_, "\n" if $pass==2 && $out;
    }
    die "unexpected state at eof: $state" unless $state==STATE_IDLE;
    # rewind for the next pass
    seek $fh, 0, 0 or die "seek $INFILE: $!";
}
close $fh;
Update: Note that these solutions don't remove ref N lines if they appear inside begin...end blocks. That was an assumption I made, but it's actually unclear what the desired behavior is in that case.
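If the desired behavior turns out to be that ref N lines inside blocks should go as well, here is a minimal sketch of how the two-pass idea could be adapted. It is not a drop-in replacement for the scripts above: the helper name filter_refs and the sample lines are made up for illustration, the input format is only what I inferred from the regexes, and it holds all lines in memory instead of reading the file twice.

```perl
use warnings;
use strict;

# Hypothetical helper (not part of the scripts above): takes the input
# lines and returns the lines to keep. Unlike the scripts above, it also
# drops matching "ref N" lines that appear inside begin...end blocks.
sub filter_refs {
    my @lines = @_;
    my $ref_re = qr/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/;

    # Pass 1: record which block numbers contain "bar".
    my (%found, $curnum);
    my $state = 'idle';
    for (@lines) {
        if    ($state eq 'idle')  { $state = 'begin' if /^begin$/ }
        elsif ($state eq 'begin') {
            /^\s*(\d+)\s*$/ or die "expected block number, got: $_\n";
            $curnum = $1;
            $state  = 'inblock';
        }
        elsif (/^end$/) { $state = 'idle' }        # only reached in a block
        elsif (/bar/)   { $found{$curnum} = 1 }    # only reached in a block
    }
    die "unterminated block\n" unless $state eq 'idle';

    # Pass 2: drop every ref to a "bar" block, wherever the ref appears.
    return grep { !( $_ =~ $ref_re && $found{$1} ) } @lines;
}

# Made-up sample input: block 1 contains "bar", block 2 does not.
my @sample = (
    "ref 1", "foo ref 2",
    "begin", "1", "bar", "ref 2", "end",
    "begin", "2", "baz", "ref 1", "end",
);
print "$_\n" for filter_refs(@sample);
```

With that sample, both ref 1 lines are dropped (including the one inside block 2), while foo ref 2 and the ref 2 inside block 1 are kept, because block 2 has no bar. Note that this sketch skips the strict "unexpected line" checks outside of blocks that the scripts above do.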