Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

how to check for a word in a file and if found remove that line, the above line and the below line from the file.

by Ganesh Bharadwaj1 (Sexton)
on Jan 20, 2016 at 03:42 UTC ( #1153137=perlquestion: print w/replies, xml ) Need Help??

Ganesh Bharadwaj1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I have written a perl script for reading an input file and checking if there are certain keywords present in that file and remove those lines from the file.

#!/usr/bin/perl use strict; use warnings; open(my $in, '<', 'out2.txt') or die "Cannot open input.txt: $!"; open(my $out, '>', 'out2_mod.sp') or die "Cannot open output.txt: $!"; while (<$in>) { print $out $_ unless /XXXXX/; } close($in); close($out);
How do I modify the perl script to remove the one line above and below the line I am removing currently,as well as remove the line containing XXXXX.

To rephrase. my input file will have

PPPPP XXXXX is my name YYYYY KKKKK UUUUU BBBBB CCCCCC XXXXX is what I play KKKKK NNNNN
I want to find the lines were XXXXX exists and remove the line above and below that line and also line containing XXXXX. So my ouptut should look like
KKKKK UUUUU BBBBB NNNNN
  • Comment on how to check for a word in a file and if found remove that line, the above line and the below line from the file.
  • Select or Download Code

Replies are listed 'Best First'.
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by GrandFather (Sage) on Jan 20, 2016 at 05:43 UTC

    You need to keep a buffer containing the lines you may need to throw away and print lines from the buffer when you know that they won't be thrown away. This stuff is tricky to get right because you have to correctly handle special cases for start up and ending conditions. Here's one way to do it:

    use strict; use warnings; my $kPreLines = 1; my $kPostLines = 1; my $skipping; my @bufferedLines; while (defined (my $line = <DATA>) || @bufferedLines) { push @bufferedLines, $line if defined $line; while (@bufferedLines && $skipping) { shift @bufferedLines; --$skipping; } print shift @bufferedLines while @bufferedLines > $kPreLines + 1; next if !@bufferedLines; if ($bufferedLines[-1] =~ /XXXXX/) { $skipping = $kPostLines; @bufferedLines = (); next; } next if defined $line; print @bufferedLines; last; } __DATA__ QQQQQ PPPPP XXXXX is my name YYYYY KKKKK UUUUU BBBBB CCCCCC XXXXX is what I play KKKKK NNNNN

    Prints:

    QQQQQ KKKKK UUUUU BBBBB NNNNN
    Premature optimization is the root of all job security
      I was sure it would be easier with splice but i falled in the 'modifying an array while iterating over it' pitfall..
      To avoid a copied data I needed to undef some value and grep for defined values when printing the copy output.. anyway it works.
      I have modified the DATA a little to be easier to follow
      use strict; use warnings; my @lin = <DATA>; map { $lin[$_] and $lin[$_] =~/XXXXX/ ? $_ == 0 ? splice @lin,0,2,undef,undef : $_ == $#lin ? splice @lin,$_-1,2,undef,undef : splice @lin,$_-1,3,undef,undef,undef : 1 } 0..$#lin; print "$_" for grep {defined} @lin; __DATA__ XXXXX Aoooo XXXXX is my name Boooo 11111 22222 33333 Coooo XXXXX is what I play Doooo 44444 # OUTPUT 11111 22222 33333 44444


      L*
      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

        If you are going to read the whole file into memory you also allow for the following solution which seems more simple and direct than working with splice.

        use strict; use warnings; my $file_content = do { local $/ = undef; <DATA>; }; $file_content =~ s/ (?:.*\n)? ^X{5}.*\n (?:.*\n)?//xmg; print $file_content; __DATA__ XXXXX Aoooo XXXXX is my name Boooo 11111 22222 33333 Coooo XXXXX is what I play Doooo 44444
        Ron
      Who spot the trick?
      use strict; use warnings; my @unv; my $pos; while (<DATA>){push @unv,$.-1,$.,$.+1 if /XXXXX/;$pos=(tell DATA)-leng +th $_ unless $pos} seek DATA,$pos,0; while (<DATA>) {print unless $.-__LINE__ ~~ @unv} __DATA__ XXXXX Aoooo XXXXX is my name Boooo 11111 22222 33333 Coooo XXXXX is what I play Doooo 44444 # OUTPUT 11111 22222 33333 44444
      If you spot the trick please explain it to... me!
      L*

      UPDATE: the trick can be avoided with a cleaner solotion assigning to $.
      use strict; use warnings; my @unv; my $pos; while (<DATA>){push @unv,$.-1,$.,$.+1 if /XXXXX/;$pos=(tell DATA)-leng +th $_ unless $pos} seek DATA,$pos,0; $.=0; while (<DATA>) {print unless $. ~~ @unv}

      L*

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Less tricky than the above oneliner version:
      # file used below has the same content of the previous example's DATA +token<P> perl -e "open $fh,$ARGV[0];while(<$fh>){push @unv,$.-1,$.,$.+1 if /XXX +XX/}seek $fh,0,0;while(<$fh>){print unless ++$index ~~ @unv}" file 11111 22222 33333 44444

      i was not able to resolve to use perl -lne because i've encountered some problem using seek ARGV even if in a BEGIN block.

      L*
      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Another special case is when /xxxxx/ appears on two (or more) consecutive lines.
      Bill
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by johngg (Canon) on Jan 20, 2016 at 11:59 UTC

    Here's a slightly different approach that runs a while loop checking that end of file hasn't been reached rather than reading each line directly. Inside the loop a line is pushed onto the buffer then tested for the "reject" condition. If it is a reject then the next line is also added to the buffer then the whole buffer is discarded so as to reject all three lines. If the most recent line was not a reject then the previous line is shifted off the buffer and printed.

    use strict; use warnings; open my $inFH, q{<}, \ <<EOD or die $!; XXXXX - additional test 1 XXXXX - additional test 1 ZZZZZ HHHHH - additional test 1 output expected PPPPP XXXXX is my name 1 XXXXX is my name 2 YYYYY KKKKK UUUUU BBBBB CCCCCC XXXXX is what I play KKKKK NNNNN ZZZZZ - additional test 2b XXXXX - additional test 2a QQQQQ - additional test 3 RRRRR - additional test 4 GGGGG XXXXX - additional test 5 XXXXX - additional test 5 EOD my @buffer; while ( not eof $inFH ) { push @buffer, scalar <$inFH>; if ( $buffer[ -1 ] =~ m{XXXXX} ) { last if eof $inFH; push @buffer, scalar <$inFH>; push @buffer, scalar <$inFH> while $buffer[ -1 ] =~ m{XXXXX} and not eof $inFH; @buffer = (); } else { print shift @buffer unless scalar @buffer == 1 and not eof $inFH; } } close $inFH or die $!;

    The output.

    HHHHH - additional test 1 output expected KKKKK UUUUU BBBBB NNNNN RRRRR - additional test 4

    I hope this is of interest.

    Update: Added the output.

    Update: 2: Inserted push @buffer, scalar <$inFH> while ... line and added some paired "XXXXX" lines to address the point raised by mr_mischief. The output is unchanged.

    Cheers,

    JohnGG

Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by hdb (Monsignor) on Jan 20, 2016 at 12:56 UTC

    Here is my proposal which has a somewhat simpler logic utilizing the empty string to print nothing:

    use strict; use warnings; my $previous=""; while(<DATA>){ if(/XXXXX/){ <DATA>; $previous=""; }else{ print $previous; $previous=$_; } } print $previous; __DATA__ PPPPP XXXXX is my name YYYYY KKKKK UUUUU BBBBB CCCCCC XXXXX is what I play KKKKK NNNNN

      That fails if there are two lines in a row that matches.

      That can fail if the last line matches. You can't rely on a file handle return EOF more than once.

        Damn! Life is just too complicated. Thanks for pointing this out.

Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by kcott (Bishop) on Jan 20, 2016 at 07:31 UTC

    G'day Ganesh Bharadwaj1,

    Below is my take on a solution. I did some additional testing as follows (all successful):

    • I originally used your data and got your expected output.
    • I then added "test 1" and got your expected output again.
    • I then added "test 2a" and "test 2b": your expected output again.
    • I then added "test 3": your expected output again.
    • I then added "test 4": your expected output PLUS the RRRRR line at the end.
    #!/usr/bin/env perl use strict; use warnings; my ($skip, @buf); while (<DATA>) { if ($skip) { $skip = 0; next; } if (/XXXXX/) { @buf = (); $skip = 1; } else { push @buf, $_; $skip = 0; } if ($. > 1 && @buf >= 2) { print for @buf[0 .. $#buf - 1]; @buf = ($buf[-1]); } } print for @buf; __DATA__ XXXXX - additional test 1 PPPPP XXXXX is my name YYYYY KKKKK UUUUU BBBBB CCCCCC XXXXX is what I play KKKKK NNNNN ZZZZZ - additional test 2b XXXXX - additional test 2a QQQQQ - additional test 3 RRRRR - additional test 4

    — Ken

Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by mr_mischief (Monsignor) on Jan 20, 2016 at 18:35 UTC

    You need to figure out if you're worried about overlapping matches, and specify your problem accordingly. There are solutions which will check excluded lines to see if there are additional lines to exclude after those. There are also solutions that will exclude a set of lines and move on as if the excluded lines after (but not before of course) the matched line can't possibly match. Here's an example of the difference.:

    print me exclude me match me match me exclude me print me

    vs:

    print me exclude me match me would match me but the above line excluded me print me anyway because the previous line wasn't matched after exclusi +on print me

    Here's a filter that will do either, based on a command-line argument. It also is configurable for how many additional lines to exclude (before, after, or both).:

    #!/usr/bin/perl use strict; use warnings; use Getopt::Long (); process( init() ); exit; sub process { my $opts = shift; my @buffer = (); my $wait = 0; while ( <> ) { if ( ( scalar @buffer ) > $opts->{ 'b' } ) { print ( shift @buffer ); } if ( $wait ) { $wait--; next unless $opts->{ 'nest' }; } else { push @buffer, $_; } if ( /$opts->{ 'pattern' }/ ) { @buffer = (); $wait = $opts->{ 'a' }; } } print @buffer; } sub init { my %options = ( 'help' => 0, 'a' => 1, 'b' => 1, 'c' => 0, 'nest' => 0, 'pattern' => '', ); Getopt::Long::Configure( 'gnu_getopt' ); Getopt::Long::GetOptions( \%options, 'help+', 'a=i', 'b=i', 'c=i', + 'nest+', 'pattern=s' ); if ( $options{ 'c' } ) { $options{ 'a' } = $options{ 'b' } = $options{ 'c' }; } if ( $options{ 'help' } ) { warn "Usage: $0 [[--help]|[[-a <n>] [-b <n>]|[-c <n>]] [--nest +] --pattern <s>\n\n" . "Print the input file excluding the matched line provided by + the -p argument and as many lines before and after that line as spec +ified.\n\n" . "\t--help\t\t\tthis help message\n" . "\t-a <number>\t\texclude <number> lines after the matched l +ine\n" . "\t-b <number>\t\texclude <number> lines before the matched +line\n" . "\t-c <number>\t\texclude <number> lines before and after th +e matched line, overriding -a and -b in the process\n" . "\t--nest\t\t\tmatch lines already excluded by a preceding m +atch, and exclude the following lines accordingly\n" . "\t--pattern <string>\texclude the line matching this patter +n (may be a regular expression) and other lines as specified by the o +ther options\n\n"; exit 0; } return \%options; }

    I made it a filter as the file handling itself is kind of secondary to the problem at hand. It's possible to add input and output arguments either positionally or via flags. My preference if I were to flesh it out a bit more would be to default to STDIN and STDOUT but allow flags to override one, the other, or both.

Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file. (step by step)
by Anonymous Monk on Jan 20, 2016 at 03:57 UTC

    this is how you figure it out

    so you read first line, it doesn't match

    so you read second line, it doesn't match

    so you read third line, it matches

    so you read fourth line, it doesn't match

    so you read fifth line, it doesn't match

    so when should you print lines you want to print?

    how many lines do you have to remember?

Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by ikegami (Pope) on Jan 22, 2016 at 16:35 UTC
    my $prev = ''; my $skip_next = 0; while (<>) { if (/XXXXX/) { $prev = ''; $skip_next = 1; } elsif ($skip_next) { --$skip_next; } else { print($prev); $prev = $_; } } print($prev);

    If you want to skip more than one line:

    my $skip_before = 1; my $skip_after = 1; my @prev; my $skip_next = 0; while (<>) { if (/XXXXX/) { @prev = (); $skip_next = $skip_after; } elsif ($skip_next) { --$skip_next; } else { push @prev, $_; print(shift(@prev)) if @prev > $skip_before; } } print(@prev);

    One read. One check. Handles subsequent matching lines. Handles matches on the first line. Handles matches on the last line.

Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by oiskuu (Hermit) on Jan 22, 2016 at 17:53 UTC

    That is a variation of the grep exercise. Check this thread for further examples. (The main difference between your task and the other one is negation of printing sense. Btw, the grep -v option only negates matching sense and so grep -v -A1 -B1 can not be used for your purposes.)

Re: how to check for a word in a file and if found remove previous and next -- 3 oneliner variations
by Discipulus (Abbot) on Jan 26, 2016 at 11:50 UTC
    Yes, i got fixed with this and i made unsuccesfull tries using eof..eof flip-flop to process even files..
    anyway now i get two semi-obfuscated oneliners, the first, simpler, is called with a single arg (the name of file):
    # $y is <C>yet</C> set for the second read, @u is <C>unwanted</C> <P> perl -lnE "BEGIN{push @ARGV,$ARGV[0]}eof&&!$y?$y=close ARGV:/XXXXX/&&! +$y?push @u,$.-1..$.+1:$y&&!($.~~@u)?say:0" next-and-previous.txt 11111 22222 33333 44444<P>
    while the latter is parametrizable one: firts arg is the regex used to match unwanted strings, the second is the number of lines to be removed after and before and then the filename:
    perl -lnE "BEGIN{$r=shift;$r=qr/$r/;$n=shift@ARGV;push @ARGV,$ARGV[0]} +eof&&!$y?$y=close ARGV:/$r/&&!$y?push @u,$.-$n..$.+$n:$y&&!($.~~@u)?s +ay:0" XXXXX 1 next-and-previous.txt 11111 22222 33333 44444 perl -lnE "BEGIN{$r=shift;$r=qr/$r/;$n=shift;push @ARGV,$ARGV[0]}eof&& +!$y?$y=close ARGV:/$r/&&!$y?push @u,$.-$n..$.+$n:$y&&!($.~~@u)?say:0" + XXXXX 2 next-and-previous.txt 22222

    I promise i'll try to no bother you anymore with this.. :=)

    update the parametrized oneliner can be shortend using -s switch (that has some problem anyhow) It is the second time i use -- too.

    perl -slnE "BEGIN{$r=qr/$r/;push @ARGV,$ARGV[0]}eof&&!$y?$y=close ARGV +:/$r&&!$y?push@u,$.-$n..$.+$n:$y&&!($.~~@u)?say:0" -- -r=XXXXX -n=1 next-and-previous.txt 11111 22222 33333 44444

    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1153137]
Approved by GrandFather
Front-paged by kcott
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (2)
As of 2020-10-21 22:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My favourite web site is:












    Results (223 votes). Check out past polls.

    Notices?