So I do (think I) need to pass through the file twice - once to find the references I want to remove, and once to actually remove them.
I took that as a challenge ;-) The following needs only a single pass over the data: it reverses both the input and the output by piping them through tac, and it produces your desired output:
use warnings;
use strict;

die "Usage: $0 INFILE\n" unless @ARGV==1;
my $INFILE = shift @ARGV;

# Output is piped into tac, input is read from tac, so the file
# is processed last line first and printed in original order.
open my $ofh, '|-', 'tac' or die "tac (out): $!";
open my $ifh, '-|', 'tac', $INFILE or die "tac $INFILE: $!";

my ($aminblock,$prevnum,$foundstr);
my %found;  # block number => true if the block contained "bar"
while (<$ifh>) {
    chomp;
    my $out=1;
    if (!$aminblock) {
        # reading backwards, "end" is what opens a block
        if (/^end$/) { undef $foundstr; $aminblock=1 }
        elsif (/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) {
            die "ref $1 without block?" unless exists $found{$1};
            $out = !$found{$1};
        }
        else { die "unexpected outside of a block: $_" }
    }
    else {
        # remember the most recent number-only line; it is the
        # block's number if "begin" is the very next line read
        if (/^\s*(\d+)\s*$/) { $prevnum=$1 }
        elsif (/^begin$/) {
            die "block ended without number?" unless defined $prevnum;
            $found{$prevnum} = $foundstr;
            undef $prevnum;
            $aminblock=0;
        }
        else {
            undef $prevnum;
            if (/bar/) { $foundstr=1 }
        }
    }
    print {$ofh} $_, "\n" if $out;
}
close $ifh or die "tac $INFILE: ".($!||"\$?=$?");
close $ofh or die "tac (out): ".($!||"\$?=$?");
However, the two extra trips through tac might actually make that less efficient for large files. Here's a two-pass version:
use warnings;
use strict;

die "Usage: $0 INFILE\n" unless @ARGV==1;
my $INFILE = shift @ARGV;

use constant { STATE_IDLE=>0, STATE_BEGIN=>1, STATE_INBLOCK=>2 };

open my $fh, '<', $INFILE or die "$INFILE: $!";
my %found;  # block number => true if the block contained "bar"
my $state = STATE_IDLE;
my $curnum;
# pass 1 only fills %found, pass 2 prints the filtered lines
for my $pass (1..2) {
    while (<$fh>) {
        chomp;
        my $out = 1;
        if ($state==STATE_IDLE) {
            if (/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) { $out=!$found{$1} }
            elsif (/^begin$/) { $state=STATE_BEGIN }
            else { die "unexpected in state $state: $_" }
        }
        elsif ($state==STATE_BEGIN) {
            # the line right after "begin" must be the block number
            if (/^\s*(\d+)\s*$/) { $curnum=$1; $state=STATE_INBLOCK }
            else { die "unexpected in state $state: $_" }
        }
        elsif ($state==STATE_INBLOCK) {
            if (/^end$/) { $state=STATE_IDLE }
            elsif (/bar/) { $found{$curnum}=1 }
        }
        else { die "bad state $state" }
        print $_, "\n" if $pass==2 && $out;
    }
    die "unexpected state at eof: $state" unless $state==STATE_IDLE;
    # rewind for the next pass
    seek $fh, 0, 0 or die "seek $INFILE: $!";
}
close $fh;
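To play with the reverse-order idea without tac or any files, here is a minimal in-memory sketch. The sub name filter_reversed and the sample data are made up for illustration, and the input layout (ref lines before their begin...end block, block number on the line right after begin) is only what I infer from the regexes above, so treat it as an assumption rather than the real input format.

```perl
use warnings;
use strict;

# Hypothetical helper: the same state machine as the tac version,
# but using reverse() on an in-memory list instead of two pipes.
sub filter_reversed {
    my @lines = @_;
    my ($inblock, $prevnum, $foundstr, %found, @out);
    for (reverse @lines) {
        my $keep = 1;
        if (!$inblock) {
            # going backwards, "end" opens a block
            if (/^end$/) { undef $foundstr; $inblock = 1 }
            elsif (/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) {
                die "ref $1 without block?" unless exists $found{$1};
                $keep = !$found{$1};
            }
            else { die "unexpected outside of a block: $_" }
        }
        else {
            if (/^\s*(\d+)\s*$/) { $prevnum = $1 }
            elsif (/^begin$/) {
                die "block ended without number?" unless defined $prevnum;
                $found{$prevnum} = $foundstr;
                undef $prevnum;
                $inblock = 0;
            }
            else {
                undef $prevnum;
                $foundstr = 1 if /bar/;
            }
        }
        push @out, $_ if $keep;
    }
    # lines were collected last-to-first; restore the original order
    # (call this sub in list context)
    return reverse @out;
}

# Made-up sample: block 1 contains "bar", block 2 does not.
my @sample = (
    'ref 1', 'foo ref 2',
    'begin', '1', 'some bar here', 'end',
    'begin', '2', 'nothing special', 'end',
);
print "$_\n" for filter_reversed(@sample);
```

With this sample, only the "ref 1" line should disappear, because block 1 is the only block containing "bar". The trade-off compared to tac is that the whole input sits in memory.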
Update: Note that these solutions don't remove ref N lines if they appear inside begin...end blocks. That was an assumption I made; it's actually unclear what the desired behavior is in that case.
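If ref N lines inside blocks should be removed as well, one possible variation of the two-pass logic is sketched below. This is purely my assumption about what might be wanted, not something confirmed by the question. The sub name filter_two_pass and the sample data are hypothetical, and it works on an in-memory list so it can be run standalone.

```perl
use warnings;
use strict;
use constant { STATE_IDLE=>0, STATE_BEGIN=>1, STATE_INBLOCK=>2 };

# Hypothetical variant: like the two-pass script, except that ref
# lines inside a begin...end block are filtered too (an assumption).
sub filter_two_pass {
    my @lines = @_;
    my $ref_re = qr/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/;
    my (%found, @out);
    for my $pass (1..2) {
        my ($state, $curnum) = (STATE_IDLE);
        for (@lines) {
            my $keep = 1;
            if ($state==STATE_IDLE) {
                if    ($_ =~ $ref_re) { $keep = !$found{$1} }
                elsif (/^begin$/)     { $state = STATE_BEGIN }
                else { die "unexpected in state $state: $_" }
            }
            elsif ($state==STATE_BEGIN) {
                if (/^\s*(\d+)\s*$/) { $curnum = $1; $state = STATE_INBLOCK }
                else { die "unexpected in state $state: $_" }
            }
            else { # STATE_INBLOCK
                if    (/^end$/)       { $state = STATE_IDLE }
                elsif ($_ =~ $ref_re) { $keep = !$found{$1} } # the assumed change
                elsif (/bar/)         { $found{$curnum} = 1 }
            }
            # pass 1 only fills %found, pass 2 collects the output
            push @out, $_ if $pass==2 && $keep;
        }
        die "unexpected state at eof: $state" unless $state==STATE_IDLE;
    }
    return @out;
}

# Made-up sample: block 1 contains "bar" and two nested ref lines.
my @sample = (
    'ref 1',
    'begin', '1', 'bar', 'ref 2', 'ref 1', 'end',
    'begin', '2', 'plain', 'end',
    'ref 2',
);
print "$_\n" for filter_two_pass(@sample);
```

Here both "ref 1" lines should be dropped, the one outside and the one inside block 1, while both "ref 2" lines stay because block 2 has no "bar". The same change would not be a one-liner in the tac version, since a ref inside a block may be read before the block it refers to has been seen.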