how to check for a word in a file and if found remove that line, the above line and the below line from the file.

Ganesh Bharadwaj1 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by GrandFather (Saint) on Jan 20, 2016 at 05:43 UTC
You need to keep a buffer containing the lines you may need to throw away and print lines from the buffer when you know that they won't be thrown away. This stuff is tricky to get right because you have to correctly handle special cases for start up and ending conditions. Here's one way to do it: use strict; use warnings; my $kPreLines = 1; my $kPostLines = 1; my $skipping; my @bufferedLines; while (defined (my $line = <DATA>) \|\| @bufferedLines) { push @bufferedLines, $line if defined $line; while (@bufferedLines && $skipping) { shift @bufferedLines; --$skipping; } print shift @bufferedLines while @bufferedLines > $kPreLines + 1; next if !@bufferedLines; if ($bufferedLines[-1] =~ /XXXXX/) { $skipping = $kPostLines; @bufferedLines = (); next; } next if defined $line; print @bufferedLines; last; } __DATA__ QQQQQ PPPPP XXXXX is my name YYYYY KKKKK UUUUU BBBBB CCCCCC XXXXX is what I play KKKKK NNNNN [download] Prints: `QQQQQ KKKKK UUUUU BBBBB NNNNN` [download] Premature optimization is the root of all job security	[reply] [d/l] [select]
Re^2: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by Discipulus (Canon) on Jan 20, 2016 at 09:37 UTC
I was sure it would be easier with splice but i falled in the 'modifying an array while iterating over it' pitfall.. To avoid a copied data I needed to undef some value and grep for defined values when printing the copy output.. anyway it works. I have modified the DATA a little to be easier to follow `use strict; use warnings; my @lin = <DATA>; map { $lin[$_] and $lin[$_] =~/XXXXX/ ? $_ == 0 ? splice @lin,0,2,undef,undef : $_ == $#lin ? splice @lin,$_-1,2,undef,undef : splice @lin,$_-1,3,undef,undef,undef : 1 } 0..$#lin; print "$_" for grep {defined} @lin; __DATA__ XXXXX Aoooo XXXXX is my name Boooo 11111 22222 33333 Coooo XXXXX is what I play Doooo 44444 # OUTPUT 11111 22222 33333 44444` [download] L* There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.	[reply] [d/l]
Re^3: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by mr_ron (Chaplain) on Jan 21, 2016 at 11:01 UTC
If you are going to read the whole file into memory you also allow for the following solution which seems more simple and direct than working with splice. `use strict; use warnings; my $file_content = do { local $/ = undef; <DATA>; }; $file_content =~ s/ (?:.\n)? ^X{5}.\n (?:.*\n)?//xmg; print $file_content; __DATA__ XXXXX Aoooo XXXXX is my name Boooo 11111 22222 33333 Coooo XXXXX is what I play Doooo 44444` [download] Ron	[reply] [d/l]
Re^2: how to check for a word in a file... -- who spot the trick? by Discipulus (Canon) on Jan 20, 2016 at 10:54 UTC
Who spot the trick? `use strict; use warnings; my @unv; my $pos; while (<DATA>){push @unv,$.-1,$.,$.+1 if /XXXXX/;$pos=(tell DATA)-leng +th $_ unless $pos} seek DATA,$pos,0; while (<DATA>) {print unless $.-__LINE__ ~~ @unv} __DATA__ XXXXX Aoooo XXXXX is my name Boooo 11111 22222 33333 Coooo XXXXX is what I play Doooo 44444 # OUTPUT 11111 22222 33333 44444` [download] If you spot the trick please explain it to... me! L* UPDATE: the trick can be avoided with a cleaner solotion assigning to `$.` `use strict; use warnings; my @unv; my $pos; while (<DATA>){push @unv,$.-1,$.,$.+1 if /XXXXX/;$pos=(tell DATA)-leng +th $_ unless $pos} seek DATA,$pos,0; $.=0; while (<DATA>) {print unless $. ~~ @unv}` [download] L* There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.	[reply] [d/l] [select]
Re^2: how to check for a word in a file.. -- oneliner by Discipulus (Canon) on Jan 20, 2016 at 11:38 UTC
Less tricky than the above oneliner version: `# file used below has the same content of the previous example's DATA +token<P> perl -e "open $fh,$ARGV[0];while(<$fh>){push @unv,$.-1,$.,$.+1 if /XXX +XX/}seek $fh,0,0;while(<$fh>){print unless ++$index ~~ @unv}" file 11111 22222 33333 44444` [download] i was not able ~~to resolve~~ to use `perl -lne` because i've encountered some problem using `seek ARGV` even if in a `BEGIN` block. L* There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.	[reply] [d/l] [select]
Re^2: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by BillKSmith (Monsignor) on Jan 21, 2016 at 03:30 UTC
Another special case is when /xxxxx/ appears on two (or more) consecutive lines. Bill	[reply]
Re^2: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by ikegami (Patriarch) on Jan 22, 2016 at 16:45 UTC
You can simplify by only keeping the lines you might want to print in the buffer. See Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.	[reply]
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by johngg (Canon) on Jan 20, 2016 at 11:59 UTC
Here's a slightly different approach that runs a `while` loop checking that end of file hasn't been reached rather than reading each line directly. Inside the loop a line is pushed onto the buffer then tested for the "reject" condition. If it is a reject then the next line is also added to the buffer then the whole buffer is discarded so as to reject all three lines. If the most recent line was not a reject then the previous line is shifted off the buffer and printed. use strict; use warnings; open my $inFH, q{<}, \ <<EOD or die $!; XXXXX - additional test 1 XXXXX - additional test 1 ZZZZZ HHHHH - additional test 1 output expected PPPPP XXXXX is my name 1 XXXXX is my name 2 YYYYY KKKKK UUUUU BBBBB CCCCCC XXXXX is what I play KKKKK NNNNN ZZZZZ - additional test 2b XXXXX - additional test 2a QQQQQ - additional test 3 RRRRR - additional test 4 GGGGG XXXXX - additional test 5 XXXXX - additional test 5 EOD my @buffer; while ( not eof $inFH ) { push @buffer, scalar <$inFH>; if ( $buffer[ -1 ] =~ m{XXXXX} ) { last if eof $inFH; push @buffer, scalar <$inFH>; push @buffer, scalar <$inFH> while $buffer[ -1 ] =~ m{XXXXX} and not eof $inFH; @buffer = (); } else { print shift @buffer unless scalar @buffer == 1 and not eof $inFH; } } close $inFH or die $!; [download] The output. `HHHHH - additional test 1 output expected KKKKK UUUUU BBBBB NNNNN RRRRR - additional test 4` [download] I hope this is of interest. Update: Added the output. Update: 2: Inserted `push @buffer, scalar <$inFH> while ...` line and added some paired "XXXXX" lines to address the point raised by mr_mischief. The output is unchanged. Cheers, JohnGG	[reply] [d/l] [select]
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by hdb (Monsignor) on Jan 20, 2016 at 12:56 UTC
Here is my proposal which has a somewhat simpler logic utilizing the empty string to print nothing: `use strict; use warnings; my $previous=""; while(<DATA>){ if(/XXXXX/){ <DATA>; $previous=""; }else{ print $previous; $previous=$_; } } print $previous; __DATA__ PPPPP XXXXX is my name YYYYY KKKKK UUUUU BBBBB CCCCCC XXXXX is what I play KKKKK NNNNN` [download]	[reply] [d/l]
Re^2: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by ikegami (Patriarch) on Jan 22, 2016 at 16:27 UTC
That fails if there are two lines in a row that matches. That can fail if the last line matches. You can't rely on a file handle return EOF more than once.	[reply]
Re^3: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by hdb (Monsignor) on Jan 22, 2016 at 19:16 UTC
Damn! Life is just too complicated. Thanks for pointing this out.	[reply]
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by kcott (Archbishop) on Jan 20, 2016 at 07:31 UTC
G'day Ganesh Bharadwaj1, Below is my take on a solution. I did some additional testing as follows (all successful): I originally used your data and got your expected output. I then added "test 1" and got your expected output again. I then added "test 2a" and "test 2b": your expected output again. I then added "test 3": your expected output again. I then added "test 4": your expected output PLUS the `RRRRR` line at the end. #!/usr/bin/env perl use strict; use warnings; my ($skip, @buf); while (<DATA>) { if ($skip) { $skip = 0; next; } if (/XXXXX/) { @buf = (); $skip = 1; } else { push @buf, $_; $skip = 0; } if ($. > 1 && @buf >= 2) { print for @buf[0 .. $#buf - 1]; @buf = ($buf[-1]); } } print for @buf; __DATA__ XXXXX - additional test 1 PPPPP XXXXX is my name YYYYY KKKKK UUUUU BBBBB CCCCCC XXXXX is what I play KKKKK NNNNN ZZZZZ - additional test 2b XXXXX - additional test 2a QQQQQ - additional test 3 RRRRR - additional test 4 [download] — Ken	[reply] [d/l] [select]
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by mr_mischief (Monsignor) on Jan 20, 2016 at 18:35 UTC
You need to figure out if you're worried about overlapping matches, and specify your problem accordingly. There are solutions which will check excluded lines to see if there are additional lines to exclude after those. There are also solutions that will exclude a set of lines and move on as if the excluded lines after (but not before of course) the matched line can't possibly match. Here's an example of the difference.: `print me exclude me match me match me exclude me print me` [download] vs: `print me exclude me match me would match me but the above line excluded me print me anyway because the previous line wasn't matched after exclusi +on print me` [download] Here's a filter that will do either, based on a command-line argument. It also is configurable for how many additional lines to exclude (before, after, or both).: #!/usr/bin/perl use strict; use warnings; use Getopt::Long (); process( init() ); exit; sub process { my $opts = shift; my @buffer = (); my $wait = 0; while ( <> ) { if ( ( scalar @buffer ) > $opts->{ 'b' } ) { print ( shift @buffer ); } if ( $wait ) { $wait--; next unless $opts->{ 'nest' }; } else { push @buffer, $_; } if ( /$opts->{ 'pattern' }/ ) { @buffer = (); $wait = $opts->{ 'a' }; } } print @buffer; } sub init { my %options = ( 'help' => 0, 'a' => 1, 'b' => 1, 'c' => 0, 'nest' => 0, 'pattern' => '', ); Getopt::Long::Configure( 'gnu_getopt' ); Getopt::Long::GetOptions( \%options, 'help+', 'a=i', 'b=i', 'c=i', + 'nest+', 'pattern=s' ); if ( $options{ 'c' } ) { $options{ 'a' } = $options{ 'b' } = $options{ 'c' }; } if ( $options{ 'help' } ) { warn "Usage: $0 [[--help]\|[[-a <n>] [-b <n>]\|[-c <n>]] [--nest +] --pattern <s>\n\n" . "Print the input file excluding the matched line provided by + the -p argument and as many lines before and after that line as spec +ified.\n\n" . "\t--help\t\t\tthis help message\n" . "\t-a <number>\t\texclude <number> lines after the matched l +ine\n" . "\t-b <number>\t\texclude <number> lines before the matched +line\n" . "\t-c <number>\t\texclude <number> lines before and after th +e matched line, overriding -a and -b in the process\n" . "\t--nest\t\t\tmatch lines already excluded by a preceding m +atch, and exclude the following lines accordingly\n" . "\t--pattern <string>\texclude the line matching this patter +n (may be a regular expression) and other lines as specified by the o +ther options\n\n"; exit 0; } return \%options; } [download] I made it a filter as the file handling itself is kind of secondary to the problem at hand. It's possible to add input and output arguments either positionally or via flags. My preference if I were to flesh it out a bit more would be to default to STDIN and STDOUT but allow flags to override one, the other, or both.	[reply] [d/l] [select]
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file. (step by step) by Anonymous Monk on Jan 20, 2016 at 03:57 UTC
this is how you figure it out so you read first line, it doesn't match so you read second line, it doesn't match so you read third line, it matches so you read fourth line, it doesn't match so you read fifth line, it doesn't match so when should you print lines you want to print? how many lines do you have to remember?	[reply]
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by ikegami (Patriarch) on Jan 22, 2016 at 16:35 UTC
`my $prev = ''; my $skip_next = 0; while (<>) { if (/XXXXX/) { $prev = ''; $skip_next = 1; } elsif ($skip_next) { --$skip_next; } else { print($prev); $prev = $_; } } print($prev);` [download] If you want to skip more than one line: `my $skip_before = 1; my $skip_after = 1; my @prev; my $skip_next = 0; while (<>) { if (/XXXXX/) { @prev = (); $skip_next = $skip_after; } elsif ($skip_next) { --$skip_next; } else { push @prev, $_; print(shift(@prev)) if @prev > $skip_before; } } print(@prev);` [download] One read. One check. Handles subsequent matching lines. Handles matches on the first line. Handles matches on the last line.	[reply] [d/l] [select]
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file. by oiskuu (Hermit) on Jan 22, 2016 at 17:53 UTC
That is a variation of the grep exercise. Check this thread for further examples. (The main difference between your task and the other one is negation of printing sense. Btw, the `grep -v` option only negates matching sense and so `grep -v -A1 -B1` can not be used for your purposes.)	[reply] [d/l] [select]
Re: how to check for a word in a file and if found remove previous and next -- 3 oneliner variations by Discipulus (Canon) on Jan 26, 2016 at 11:50 UTC
Yes, i got fixed with this and i made unsuccesfull tries using `eof..eof` flip-flop to process even files.. anyway now i get two semi-obfuscated oneliners, the first, simpler, is called with a single arg (the name of file): `# $y is <C>yet</C> set for the second read, @u is <C>unwanted</C> <P> perl -lnE "BEGIN{push @ARGV,$ARGV[0]}eof&&!$y?$y=close ARGV:/XXXXX/&&! +$y?push @u,$.-1..$.+1:$y&&!($.~~@u)?say:0" next-and-previous.txt 11111 22222 33333 44444<P>` [download] while the latter is parametrizable one: firts arg is the regex used to match unwanted strings, the second is the number of lines to be removed after and before and then the filename: `perl -lnE "BEGIN{$r=shift;$r=qr/$r/;$n=shift@ARGV;push @ARGV,$ARGV[0]} +eof&&!$y?$y=close ARGV:/$r/&&!$y?push @u,$.-$n..$.+$n:$y&&!($.~~@u)?s +ay:0" XXXXX 1 next-and-previous.txt 11111 22222 33333 44444 perl -lnE "BEGIN{$r=shift;$r=qr/$r/;$n=shift;push @ARGV,$ARGV[0]}eof&& +!$y?$y=close ARGV:/$r/&&!$y?push @u,$.-$n..$.+$n:$y&&!($.~~@u)?say:0" + XXXXX 2 next-and-previous.txt 22222` [download] I promise i'll try to no bother you anymore with this.. :=) update the parametrized oneliner can be shortend using `-s` switch (that has some problem anyhow) It is the second time i use `--` too. `perl -slnE "BEGIN{$r=qr/$r/;push @ARGV,$ARGV[0]}eof&&!$y?$y=close ARGV +:/$r&&!$y?push@u,$.-$n..$.+$n:$y&&!($.~~@u)?say:0" -- -r=XXXXX -n=1 next-and-previous.txt 11111 22222 33333 44444` [download] L* There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.	[reply] [d/l] [select]


No such thing as a small change
	PerlMonks