PerlMonks  

Re^2: some efficiency, please

by Anonymous Monk on Apr 12, 2019 at 15:37 UTC


in reply to Re: some efficiency, please
in thread some efficiency, please

Sorry, I am trying to simplify things, and likely making them more complicated. :(

I think I can provide some sample data easier than I can change the code to make it work on the sample.

This is what the input would look like:

ref 1
ref 2
ref 3
ref 4
begin
1
end
begin
2
bar
end
begin
3
end
begin
4
bar
end
ref 5
foo ref 6
ref 7
begin
5
end
begin
6
bar
begin
7
bar
end

So I am trying to remove only the "ref n" lines (not the n lines themselves), and only for those n whose begin...end paragraph contains "bar". The output should look like this:

ref 1
ref 3
begin
1
end
begin
2
bar
end
begin
3
end
begin
4
bar
end
ref 5
begin
5
end
begin
6
bar
begin
7
bar
end

So I do (think I) need to pass through the file twice - once to find the references I want to remove, and once to actually remove them.

Replies are listed 'Best First'.
Re^3: some efficiency, please (updated)
by haukex (Archbishop) on Apr 12, 2019 at 16:19 UTC
    So I do (think I) need to pass through the file twice - once to find the references I want to remove, and once to actually remove them.

    I took that as a challenge ;-) This only needs a single pass by reversing both the input and output by piping it through tac, and produces your desired output:

    use warnings;
    use strict;

    die "Usage: $0 INFILE\n" unless @ARGV==1;
    my $INFILE = shift @ARGV;

    open my $ofh, '|-', 'tac' or die "tac (out): $!";
    open my $ifh, '-|', 'tac', $INFILE or die "tac $INFILE: $!";
    my ($aminblock,$prevnum,$foundstr);
    my %found;
    while (<$ifh>) {
        chomp;
        my $out=1;
        if (!$aminblock) {
            if (/^end$/) { undef $foundstr; $aminblock=1 }
            elsif (/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) {
                die "ref $1 without block?" unless exists $found{$1};
                $out = !$found{$1};
            }
            else { die "unexpected outside of a block: $_" }
        }
        else {
            if (/^\s*(\d+)\s*$/) { $prevnum=$1 }
            elsif (/^begin$/) {
                die "block ended without number?" unless defined $prevnum;
                $found{$prevnum} = $foundstr;
                undef $prevnum;
                $aminblock=0;
            }
            else {
                undef $prevnum;
                if (/bar/) { $foundstr=1 }
            }
        }
        print {$ofh} $_, "\n" if $out;
    }
    close $ifh or die "tac $INFILE: ".($!||"\$?=$?");
    close $ofh or die "tac (out): ".($!||"\$?=$?");

    Although the two passes through tac might actually make that less efficient for large files. Here's a two-pass version:

    use warnings;
    use strict;

    die "Usage: $0 INFILE\n" unless @ARGV==1;
    my $INFILE = shift @ARGV;

    use constant { STATE_IDLE=>0, STATE_BEGIN=>1, STATE_INBLOCK=>2 };

    open my $fh, '<', $INFILE or die "$INFILE: $!";
    my %found;
    my $state = STATE_IDLE;
    my $curnum;
    for my $pass (1..2) {
        while (<$fh>) {
            chomp;
            my $out = 1;
            if ($state==STATE_IDLE) {
                if (/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) { $out=!$found{$1} }
                elsif (/^begin$/) { $state=STATE_BEGIN }
                else { die "unexpected in state $state: $_" }
            }
            elsif ($state==STATE_BEGIN) {
                if (/^\s*(\d+)\s*$/) { $curnum=$1; $state=STATE_INBLOCK }
                else { die "unexpected in state $state: $_" }
            }
            elsif ($state==STATE_INBLOCK) {
                if (/^end$/) { $state=STATE_IDLE }
                elsif (/bar/) { $found{$curnum}=1 }
            }
            else { die "bad state $state" }
            print $_, "\n" if $pass==2 && $out;
        }
        die "unexpected state at eof: $state" unless $state==STATE_IDLE;
        seek $fh, 0, 0 or die "seek $INFILE: $!";
    }
    close $fh;

    Update: Note that these solutions don't remove ref N lines if they appear inside begin...end blocks; this was an assumption I made, but it's actually unclear what the desired behavior is in that case?
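
For comparison, here is a minimal sketch of an in-memory variant of the two-pass idea. It is not from the posts above. It assumes the whole file fits in memory, that each of begin, the block number, and end sits on its own line, and that only ref N lines outside blocks are candidates for removal. The name filter_refs and its internal variables are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes chomped input lines, returns the lines to keep.
# Assumes the input fits in memory and blocks look like: begin / N / ... / end.
sub filter_refs {
    my @lines = @_;
    my (%has_bar, @refs, $curnum);
    my $state = 'idle';    # idle -> begin -> inblock -> idle

    # Scan 1: remember where the ref lines are and which blocks contain "bar".
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        if ($state eq 'idle') {
            if    ($line =~ /^begin$/)                        { $state = 'begin' }
            elsif ($line =~ /^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) { push @refs, [$i, $1] }
        }
        elsif ($state eq 'begin') {
            $line =~ /^\s*(\d+)\s*$/ or die "expected a block number, got: $line";
            ($curnum, $state) = ($1, 'inblock');
        }
        elsif ($line =~ /^end$/) { $state = 'idle' }
        elsif ($line =~ /bar/)   { $has_bar{$curnum} = 1 }
    }

    # Scan 2: visits only the recorded ref lines, marking those to drop.
    my %drop;
    for my $ref (@refs) {
        my ($i, $n) = @$ref;
        $drop{$i} = 1 if $has_bar{$n};
    }
    return @lines[ grep { !$drop{$_} } 0 .. $#lines ];
}

# Small demo: block 2 contains "bar", so "ref 2" is dropped and "ref 1" stays.
my @demo = ('ref 1', 'ref 2', 'begin', '1', 'end', 'begin', '2', 'bar', 'end');
print "$_\n" for filter_refs(@demo);
```

The second scan touches only the recorded ref lines rather than re-reading the file, at the cost of holding every line in memory; for files too large for that, the seek-based version above is the safer choice.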

Re^3: some efficiency, please
by Anonymous Monk on Apr 12, 2019 at 15:53 UTC
    Oops, left out an end line in the example data.

    begin
    6
    bar

    SHOULD BE:

    begin
    6
    bar
    end
