get n lines before or after a pattern

darklord_999 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: get n lines before or after a pattern by davido (Cardinal) on Jul 25, 2012 at 16:33 UTC
When you hear yourself saying "I need to know what comes n lines before XYZ", you should be thinking "I need to stash n previous lines while I iterate through the file."

When you hear yourself saying, "I need to know what comes after XYZ until PDQ is found.", you should be thinking of how to identify state (ie, how to keep track of having found the trigger). You can keep track of state with a variable, or you can do it by flowing into a different branch of code.

This snippet accomplishes your goal by stashing two lines at all times (clearing them only after XYZ is found), and by flowing into a different branch when XYZ has been found, until PDQ shows up. As I mentioned above, this is one of several common ways of dealing with state.

    use strict;
    use warnings;

    my $find = 'jack';
    my $trigger_re = qr{^name\s+$find\b};
    my $finally_re = qr(^lastname\s+\p{Alpha}+\b);

    my @stash;

    while( my $line = <DATA> ) {
        chomp $line;
        if( $line =~ $trigger_re ) {
            print "$_\n" for @stash;
            @stash = ();
            print $line, "\n";
            while ( my $next = <DATA> ) {
                if( $next =~ $finally_re ) {
                    print $next;
                    last;
                }
            }
        }
        else {
            push @stash, $line;
            while( @stash > 2 ) {
                shift @stash;
            }
        }
    }

    __DATA__
    start
    id 10
    address Richmond
    name jack
    xxxxx
    aaaaa
    lastname black
    yyyy
    zzzzz
    id 11
    address Central
    name rick
    cccccc
    dddddd
    lastname hanna
    eeeee
    yyyyy
    id 12
    address denver
    name jack
    sssss
    tttttt
    lastname strong
    rrrrr
    mmmmm
    id 13
    address Virginia
    name mick
    aaaaaaa
    ooooooo
    lastname jagger
    gggggg
    hhhhhh
    id 14
    address Maine
    name rick
    sssss
    sssss
    lastname stewart
    ssssss
    ffffff
    end

The output is...

    id 10
    address Richmond
    name jack
    lastname black
    id 12
    address denver
    name jack
    lastname strong

If the stash hasn't received two lines ahead of "name jack", it will quietly just print however many it accumulated (max 2). If the "lastname" never shows up, it will quietly flow through the end of the file. This may not be what you want; it's possible that you'll want to just carp about a malformed record the moment the next "name" shows up. That's pretty easy to implement, so I'll leave it to you if you find it advantageous. Similarly, it's a simple check to verify that two lines are stored in @stash prior to printing, and it would be easy to carp a warning about a malformed record there as well.

I build the regexes outside of the loop just to keep the loop code as simple (and general) as possible. This has the added efficiency benefit of assuring that the regex that contains variable interpolation will only be compiled once rather than each time through the loop.

Dave
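The two checks that davido leaves as an exercise might look roughly like the sketch below. It is not his code: the sub name `extract_records`, the warning texts, and the choice to take a list of already-chomped lines (instead of reading DATA) are assumptions made so the idea can be tried in isolation. It also quotes the search term with `\Q...\E`, which the original snippet does not do.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical helper (not from the thread). Returns the lines that would be
# printed and carps when a record looks malformed.
sub extract_records {
    my ($find, @lines) = @_;
    my $trigger_re = qr{^name\s+\Q$find\E\b};
    my $name_re    = qr{^name\s};
    my $finally_re = qr{^lastname\s+\p{Alpha}+\b};

    my (@stash, @out);
    my $open;    # holds the trigger line while we are still waiting for "lastname"

    for my $line (@lines) {
        if ($line =~ $name_re) {
            # A new "name" arrived before the previous record was closed.
            carp "no lastname found after '$open'" if defined $open;
            undef $open;

            if ($line =~ $trigger_re) {
                carp "fewer than 2 lines stashed before '$line'" if @stash < 2;
                push @out, @stash, $line;
                $open = $line;
            }
        }
        elsif (defined $open && $line =~ $finally_re) {
            push @out, $line;
            undef $open;
        }

        # Always remember the two most recent lines.
        push @stash, $line;
        shift @stash while @stash > 2;
    }

    carp "no lastname found after '$open'" if defined $open;
    return @out;
}
```

Unlike the original, this version uses a state variable (`$open`) rather than a nested read loop, so the lines skipped while waiting for "lastname" still pass through the stash.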
Re: get n lines before or after a pattern by Kenosis (Priest) on Jul 25, 2012 at 17:09 UTC
Here's another option:

    use Modern::Perl;

    my $searchFor = 'jack';
    local $/ = 'id ';

    while (<DATA>) {
        next if !/\nname\s+\b$searchFor\b/;
        say 'id ', join "\n", ( split "\n" )[ 0, 1, 2, 5 ];
    }

    __DATA__
    start
    id 10
    address Richmond
    name jack
    xxxxx
    aaaaa
    lastname black
    yyyy
    zzzzz
    id 11
    address Central
    name rick
    cccccc
    dddddd
    lastname hanna
    eeeee
    yyyyy
    id 12
    address denver
    name jack
    sssss
    tttttt
    lastname strong
    rrrrr
    mmmmm
    id 13
    address Virginia
    name mick
    aaaaaaa
    ooooooo
    lastname jagger
    gggggg
    hhhhhh
    id 14
    address Maine
    name rick
    sssss
    sssss
    lastname stewart
    ssssss
    ffffff
    end

Output:

    id 10
    address Richmond
    name jack
    lastname black
    id 12
    address denver
    name jack
    lastname strong

Hope this helps!
Re^2: get n lines before or after a pattern by johngg (Canon) on Jul 25, 2012 at 23:54 UTC
Reading "records" rather than lines is a nice approach. One minor point: your `local` is not really local, because you have not confined it to a particular scope, so it applies from the point where it appears until the end of the script. Rather than the split and array slice, another approach could be to open a file handle against a reference to the record, so that you can read it line by line in an inner scope and print just the lines you want. This has the advantage that the record layout can change and it will still work.

I hope this is of interest.

Cheers, JohnGG
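JohnGG's own code is not included in this copy of the thread, so what follows is only a sketch of the two ideas he describes: confining `local $/` to a block, and opening a file handle on a reference to the record. The sub name `matching_records` and the `grep` pattern that picks lines by their leading keyword are assumptions, not his implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: $text holds the whole file as one string,
# $search is the first name to look for.
sub matching_records {
    my ($text, $search) = @_;
    my @out;

    open my $in, '<', \$text or die "Cannot open in-memory file: $!";
    {
        local $/ = "id ";    # the changed separator lives only inside this block

        while (my $record = <$in>) {
            chomp $record;   # removes the trailing "id " separator
            next unless $record =~ /^name\s+\Q$search\E\b/m;

            # Read the record line by line through a second, in-memory handle.
            open my $rec, '<', \$record or die "Cannot open record: $!";
            local $/ = "\n";
            my @wanted = grep { /^(?:\d+|address|name|lastname)\b/ } <$rec>;
            close $rec;

            push @out, "id " . join '', @wanted;
        }
    }
    close $in;

    return @out;
}
```

Because lines are selected by keyword rather than by position, extra filler lines inside a record do not change the result. Note that the separator "id " is matched anywhere in the text, so this assumes no other line happens to contain it.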
Re^3: get n lines before or after a pattern by Kenosis (Priest) on Jul 26, 2012 at 03:41 UTC
This is of interest, and excellent, too, JohnGG! I was aware that I didn't confine the `local $/;` to a block, not thinking too much about the code snippet. However, I'll remember--as a best practice--to do so with future local (dynamically scoped) variables. It was good to point this out. I like your refined/seasoned coding: scoping, reading in a multi-line record, opening a file handle on the record-containing scalar, and then `grep`ping through the lines to display the OP's desired output. Indeed, this is of interest, very well thought out, and very much appreciated. Thank you.
Re: get n lines before a pattern by VinsWorldcom (Prior) on Jul 25, 2012 at 14:43 UTC
Note that your output shows not only 2 lines before the pattern, but also 1 line AFTER the pattern. You don't need Perl for something that simple: `grep -B2 -A1 jack test.txt`

UPDATE: Since the OP updated the original question, this approach is no longer valid. See my reply (Re^3: get n lines before a pattern) below.
Re^2: get n lines before a pattern by darklord_999 (Acolyte) on Jul 25, 2012 at 14:55 UTC
I have updated the details of my file. Please see the change. Sorry for the previous error.
Re^3: get n lines before a pattern by VinsWorldcom (Prior) on Jul 25, 2012 at 15:16 UTC
Yes, the changes to the file in the OP certainly require an updated approach. What have you tried? I would loop through the file saving each key and either pushing to a data structure if the name matches or resetting and continuing. Pseudo code for the loop and structure I'd use:

    my @matches;
    my $FOUND = 0;
    my %info  = ();    # empty a hash with (), not {}

    while (<INFILE>) {
        chomp $_;

        # An "id" line starts a new record: store the previous one if it matched.
        if ($_ =~ /^id/) {
            push @matches, { %info } if $FOUND;    # push a copy of the record
            $FOUND = 0;
            %info  = ();
        }

        if ($_ =~ /^id/)      { (undef, $info{id})      = split / /, $_ }
        if ($_ =~ /^address/) { (undef, $info{address}) = split / /, $_ }
        if ($_ =~ /^name/)    { (undef, $info{fname})   = split / /, $_ }
        # ...

        if (defined $info{fname} and $searchPattern eq $info{fname}) { $FOUND = 1; }
    }
    push @matches, { %info } if $FOUND;    # do not lose a match in the last record

UPDATE: Added 'chomp' and updated 'split' commands as per kennethk's suggestions to me.
Re: get n lines before or after a pattern by kennethk (Abbot) on Jul 25, 2012 at 15:16 UTC
What have you tried? What didn't work? See How do I post a question effectively?

There are two ways I can think of doing this. Probably the simpler from your perspective would be to iterate over lines in a while loop, and set up some state variables to stash values. Then, when you hit a `lastname` line, you can test the value of `$name` (or `$hash{name}`) to see if it is `jack`, outputting all relevant information if it is.

The more complex approach would be using regular expressions with the m and g modifiers. This is how I'd do it, but it tends to be a little more fragile, less obvious for code review and more challenging for the neophyte.

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
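The simpler of kennethk's two suggestions could be sketched as follows. This is an illustration under assumptions, not code from the thread: the sub name `records_for`, the `%rec` hash, and the decision to treat every two-word line as a key/value pair are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: stash key/value lines in %rec and decide what to
# output only when the "lastname" line is reached.
sub records_for {
    my ($wanted, @lines) = @_;
    my (%rec, @out);

    for my $line (@lines) {
        my ($key, $value) = split ' ', $line, 2;
        next unless defined $value;      # filler lines such as "xxxxx" have no value

        %rec = () if $key eq 'id';       # an "id" line begins a new record
        $rec{$key} = $value;

        if ($key eq 'lastname' && ($rec{name} // '') eq $wanted) {
            push @out, join "\n",
                map { "$_ " . ($rec{$_} // '') } qw(id address name lastname);
        }
    }
    return @out;
}

# Usage: one string per matching record.
print "$_\n" for records_for('jack',
    'id 10', 'address Richmond', 'name jack', 'xxxxx', 'lastname black');
```

Comparing with `eq` instead of a regex means "jackson" is never mistaken for "jack".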
Re: get n lines before or after a pattern by zentara (Archbishop) on Jul 25, 2012 at 16:28 UTC
Untested, but a useful approach.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @buffer;    # a queue data structure

    while ( <DATA> ) {
        if ( /I sent/ ) {
            print @buffer;          # 3 lines before
            print;                  # the matching line
            print scalar(<DATA>);   # 1 line following
            last;                   # all done
        }
        push @buffer, $_;
        shift @buffer if @buffer > 3;
    }

    __DATA__
    this is the output from the command I sent to the command interperter

I'm not really a human, but I play one on earth. Old Perl Programmer Haiku ................... flash japh
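The queue idea extends naturally to "n lines before and m lines after", which is what the thread title asks for. The sketch below is an assumption-laden illustration rather than anyone's posted code: `grep_context` and its parameter order are invented here, and it reports every match instead of stopping at the first one.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each matching line together with up to
# $before lines ahead of it and up to $after lines following it.
sub grep_context {
    my ($fh, $re, $before, $after) = @_;
    my (@buffer, @out);
    my $pending = 0;    # lines still owed after the most recent match

    while (my $line = <$fh>) {
        if ($line =~ $re) {
            push @out, @buffer, $line;
            @buffer  = ();        # already emitted, so never printed twice
            $pending = $after;
            next;
        }
        if ($pending > 0) {
            push @out, $line;     # trailing context
            $pending--;
            next;
        }
        push @buffer, $line;      # candidate leading context
        shift @buffer while @buffer > $before;
    }
    return @out;
}

# Usage with an in-memory file handle:
my $text = join '', map { "$_\n" } qw(a b c d MATCH e f g);
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print grep_context($fh, qr/MATCH/, 3, 1);
close $fh;
```

Only `$before` lines are held in memory at any time, so this works on files of any size.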
Re: get n lines before or after a pattern by xiaoyafeng (Deacon) on Jul 25, 2012 at 17:48 UTC
Try natatime in List::MoreUtils; maybe it makes your code more elegant? ;)

    use List::MoreUtils qw/natatime/;

    my @contents = <DATA>;
    pop @contents;      # drop the trailing "end" line
    shift @contents;    # drop the leading "start" line

    my $it = natatime 8, @contents;    # each record is 8 lines long
    while (my @vals = $it->()) {
        print "@vals[0,1,2] \n" if $vals[2] =~ /jack/;
    }

    __DATA__
    start
    id 10
    address Richmond
    name jack
    xxxxx
    aaaaa
    lastname black
    yyyy
    zzzzz
    id 11
    address Central
    name rick
    cccccc
    dddddd
    lastname hanna
    eeeee
    yyyyy
    id 12
    address denver
    name jack
    sssss
    tttttt
    lastname strong
    rrrrr
    mmmmm
    id 13
    address Virginia
    name mick
    aaaaaaa
    ooooooo
    lastname jagger
    gggggg
    hhhhhh
    id 14
    address Maine
    name rick
    sssss
    sssss
    lastname stewart
    ssssss
    ffffff
    end

Another advantage of this approach over the others is that you don't lose the rest of each chunk: you can print any elements of @vals by changing the slice.

I am trying to improve my English skills, if you see a mistake please feel free to reply or /msg me a correction
Re^2: get n lines before or after a pattern by Kenosis (Priest) on Jul 25, 2012 at 19:06 UTC
Nice use of `List::MoreUtils qw/natatime/`! However, consider using `/\bjack\b/`, as your current regex also matches "jackson", "jackie", "jacklyn", etc.
Re^2: get n lines before or after a pattern by ww (Archbishop) on Jul 25, 2012 at 20:59 UTC
Nice (and ++), but the regex can go astray:

    C:>perl -E "my $word="jackhammer"; if ($word =~ /\bjack\b/) { say $word; } else { say \"No word-boundry-delimited 'jack's' found in $word\"; }"
    No word-boundry-delimited 'jack's' found in jackhammer
Re^3: get n lines before or after a pattern by Kenosis (Priest) on Jul 25, 2012 at 21:32 UTC
Perhaps I'm missing something, but I wouldn't want to find "jackhammer" if I were searching for "jack" as the first name--as listed in the OP's data set. However, the non-word-boundary regex is perfect for finding all first names containing the sub-string "jack", as `$vals[2] =~ /jack/` would.
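The difference between the two regexes discussed in this sub-thread can be seen side by side in a short sketch. The name list is made up for illustration.

```perl
use strict;
use warnings;

# Illustrative names only; they are not taken from the OP's data.
my @names = qw(jack jackson jackie jackhammer);

my @substring = grep { /jack/ }     @names;    # "jack" anywhere in the string
my @whole     = grep { /\bjack\b/ } @names;    # "jack" as a whole word only

print "substring match:  @substring\n";
print "whole-word match: @whole\n";
```

The first pattern keeps all four names; the second keeps only "jack", which is the behaviour wanted when searching for an exact first name.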
Re^4: get n lines before or after a pattern by ww (Archbishop) on Jul 25, 2012 at 22:14 UTC
Re^5: get n lines before or after a pattern by Kenosis (Priest) on Jul 25, 2012 at 22:31 UTC
Re: get n lines before or after a pattern by Anonymous Monk on Jul 25, 2012 at 15:46 UTC
Search for an implementation of grep, the Unix command, in Perl. There is at least one such implementation posted around here (I don't have the (search) links handy); another was posted long ago in the comp.lang.perl.misc newsgroup. Yet another is App::Ack; refer to the `&print_line_with_context` & `&get_context` subs.
Re: get n lines before or after a pattern by Athanasius (Archbishop) on Jul 26, 2012 at 03:32 UTC
Here is another approach, using Tie::File:

    #! perl
    use strict;
    use warnings;
    use Tie::File;

    my $file = 'test.txt';
    tie my @lines, 'Tie::File', $file or die "Cannot tie file '$file': $!";

    for my $i (0 .. $#lines)
    {
        if ($lines[$i] =~ m{ \b jack \b }x)
        {
            for ($i - 2 .. $i)
            {
                print $lines[$_], "\n" unless $_ < 0;
            }

            for (my $found = 0; !$found && $i <= $#lines; ++$i)
            {
                if ($lines[$i] =~ m{ \b lastname \b }x)
                {
                    print $lines[$i], "\n";
                    $found = 1;
                }
            }
        }
    }

    untie @lines;

What is nice about this approach is that, by treating the data file as an ordinary array, it is possible to meet more complicated requirements without the programming overhead of manually maintaining line buffers. So, this approach has the advantage of being scalable.

Some notes on Tie::File: It’s a core module. It was written by Dominus. From the docs: “The file is not loaded into memory, so this will work even for gigantic files.”

HTH,
Athanasius <°(((>< contra mundum
Re: get n lines before or after a pattern by cheekuperl (Monk) on Jul 26, 2012 at 06:33 UTC
Check if Re: printing several lines around match helps
Re: get n lines before or after a pattern by brx (Pilgrim) on Jul 26, 2012 at 17:09 UTC
Similar to zentara's approach in Re: get n lines before or after a pattern. The idea is to keep it short, to be independent of the other lines' content, and to deal with file boundaries (i.e., finding 'jack' in the first or last lines is OK).

Note: the program could print the same line several times if 'jack' is found in consecutive lines - does the OP want that?

    #!perl
    use strict;
    use warnings;

    my @buffer=("")x6;
    my $line;
    while (@buffer) {
        push @buffer,$line if defined($line=scalar(<DATA>));
        shift @buffer;
        # $buffer[2] is the line tested; $buffer[5] is the line three after it
        print @buffer[0,1,2],$buffer[5]//'' if ($buffer[2]//'')=~/\bjack\b/;
    }

    __DATA__
    extra
    jack
    extra
    extra
    start
    id 10
    address Richmond
    name jack
    xxxxx
    aaaaa
    lastname black
    yyyy
    zzzzz
    id 11
    address Central
    name rick
    cccccc
    dddddd
    lastname hanna
    eeeee
    yyyyy
    id 12
    address denver
    name jack
    sssss
    tttttt
    lastname strong
    rrrrr
    mmmmm
    id 13
    address Virginia
    name mick
    aaaaaaa
    ooooooo
    lastname jagger
    gggggg
    hhhhhh
    id 14
    address Maine
    name rick
    sssss
    sssss
    lastname stewart
    ssssss
    ffffff
    end
    extra
    extra
    jack

English is not my mother tongue. Les tongues de ma mère sont "made in France".