Re: command line perl command to get between lines with non greedy match
by GrandFather (Saint) on Jan 17, 2020 at 21:54 UTC
Consider a strategy that reads a line at a time and saves lines once PATTERN1 is found, until either PATTERN3 is found (and the saved lines are printed) or some other pattern is found (and the saved lines are discarded). That may be a bit much to do cleanly as a one-liner, so bite the bullet and write a script to do the work. The script can be called with a single command line, so you haven't lost any convenience by writing it.
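A minimal sketch of such a script, assuming PATTERN2 and PATTERN4 are the "other" patterns that abandon a block (the `$rx_other` regex and the `extract_blocks` name are illustrative, not from the thread). The demo reads an in-memory sample so it runs as-is:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $rx_start = qr/PATTERN1/;
my $rx_stop  = qr/PATTERN3/;
my $rx_other = qr/PATTERN[24]/;    # hypothetical "some other pattern"

# Read lines from $in_fh, print each complete PATTERN1..PATTERN3 block to
# $out_fh, and return the number of blocks printed.
sub extract_blocks {
    my ($in_fh, $out_fh) = @_;
    my @saved;                              # block being collected, if any
    my $count = 0;
    while (my $line = <$in_fh>) {
        if ($line =~ $rx_start) {
            @saved = ($line);               # (re)start: keeps the match short
        }
        elsif (@saved and $line =~ $rx_stop) {
            print {$out_fh} @saved, $line;  # block complete: emit it
            @saved = ();
            $count++;
        }
        elsif (@saved and $line =~ $rx_other) {
            @saved = ();                    # wrong terminator: discard
        }
        elsif (@saved) {
            push @saved, $line;             # inside a block: keep buffering
        }
    }
    return $count;
}

# Demo on an in-memory "file".
my $sample = <<'END_SAMPLE';
PATTERN1 SOME INFO
TEXT1
TEXT2
TEXT3 PATTERN2 SOME INFO
PATTERN1 SOME INFO
TEXT4
TEXT5
TEXT6 PATTERN3 SOME INFO
END_SAMPLE
open my $in, '<', \$sample or die "open: $!";
extract_blocks($in, \*STDOUT);
```

For real use, replace the demo with `extract_blocks(\*STDIN, \*STDOUT);` and run `perl script.pl < input`; only one block is ever held in memory.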
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: command line perl command to get between lines with non greedy match
by haukex (Archbishop) on Jan 18, 2020 at 17:54 UTC
> Note this is extremely large file and can't put the whole file into a string.
How large is the section you want to read? Does that fit into memory? One possible approach is to buffer the lines in an array, as several monks have shown. Just for fun, I thought about what a script like this might do if you wanted to read neither the whole file nor the section being searched for into memory; in that case you could scan the file and remember the byte offsets of the strings you're looking for. In the following, I'm reading that data in, but that's not required; you could do something else with those byte offsets. Note that the code below only works with bytes, not Unicode characters.
use warnings;
use strict;
my $file = 'in.txt';
my ($start,$end);
open my $fh, '<:raw', $file or die "$file: $!";
my $offset = 0;
while (<$fh>) {
    $start = $offset if /PATTERN1/;
    $offset = tell $fh or die "tell: $!";
    $end = $offset if /PATTERN3/;
}
die "Failed to find second pattern after first pattern"
    unless defined $start && defined $end && $end > $start;
seek $fh, $start, 0 or die "seek: $!";
my $bytes = $end-$start;
read($fh, my $data, $bytes)==$bytes
    or die "failed to read $bytes bytes";
close $fh;
print $data;
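Since reading the data into `$data` isn't required, here is a sketch for the case where the section itself may be too big for memory: the same offset scan, followed by copying the byte range to an output handle in fixed-size chunks. `find_offsets`, `copy_range` and the 64 KiB default chunk size are illustrative choices, not from the post:

```perl
use strict;
use warnings;

# Same scan as above: remember where the last PATTERN1 line starts and
# where the last PATTERN3 line ends (byte offsets).
sub find_offsets {
    my ($fh) = @_;
    my ($start, $end);
    my $offset = 0;
    local $_;
    while (<$fh>) {
        $start  = $offset if /PATTERN1/;
        $offset = tell $fh;
        $end    = $offset if /PATTERN3/;
    }
    return ($start, $end);
}

# Copy bytes $start .. $end-1 of $in_fh to $out_fh, at most $chunk bytes
# at a time, so the section is never held in memory as a whole.
sub copy_range {
    my ($in_fh, $out_fh, $start, $end, $chunk) = @_;
    $chunk //= 64 * 1024;
    seek $in_fh, $start, 0 or die "seek: $!";
    my $left = $end - $start;
    while ($left > 0) {
        my $want = $left < $chunk ? $left : $chunk;
        my $got  = read($in_fh, my $buf, $want);
        die "read: $!" unless defined $got;
        die "unexpected end of file\n" if $got == 0;
        print {$out_fh} $buf or die "print: $!";
        $left -= $got;
    }
    return;
}

# Demo on an in-memory "file"; for a real file open it with '<:raw'.
my $sample = "PATTERN1 SOME INFO\nTEXT4\nTEXT6 PATTERN3 SOME INFO\nTAIL\n";
open my $in, '<', \$sample or die "open: $!";
my ($start, $end) = find_offsets($in);
die "Failed to find second pattern after first pattern\n"
    unless defined $start && defined $end && $end > $start;
copy_range($in, \*STDOUT, $start, $end, 8);   # tiny chunks to exercise the loop
```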
Re: command line perl command to get between lines with non greedy match
by GrandFather (Saint) on Jan 17, 2020 at 22:01 UTC
Except for very minor edits, it is usual to note in the node that you have updated it. In this case you completely reformatted the node (which is good), but left several replies looking silly.
Re: command line perl command to get between lines with non greedy match
by tybalt89 (Monsignor) on Jan 18, 2020 at 20:47 UTC
Here's one that uses almost no storage, by seeking back to the last PATTERN1 and re-reading the input file.
Note that this will not work on a pipe.
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11111545
use warnings;
my $fh = *DATA; # FIXME to your input file, DATA only used for testing
my $lastpattern1;
while( <$fh> )
  {
  if( /PATTERN1/ )
    {
    $lastpattern1 = tell($fh) - length $_;   # start of the PATTERN1 line
    }
  elsif( defined $lastpattern1 and /PATTERN3/ ) # defined: offset 0 is valid too
    {
    seek $fh, $lastpattern1, 0 or die "seek: $!";
    while( <$fh> )
      {
      my $end = s/ (?=PATTERN3)/\n\n/;
      print;
      $end and last;
      }
    $lastpattern1 = undef;
    }
  }
__DATA__
PATTERN1 SOME INFO
TEXT1
TEXT2
TEXT3 PATTERN2 SOME INFO
PATTERN1 SOME INFO
TEXT4
TEXT5
TEXT6 PATTERN3 SOME INFO
PATTERN1 SOME INFO
TEXT1
TEXT2
TEXT3 PATTERN4 SOME INFO
PATTERN1 SOME INFO
TEXT4
TEXT55
TEXT6 PATTERN3 SOME INFO
I also do the fix-up on the PATTERN3 line, though I'm curious: was that just a typo on your part?
Re: command line perl command to get between lines with non greedy match (updated)
by AnomalousMonk (Archbishop) on Jan 17, 2020 at 21:51 UTC
use strict;
use warnings;
my $rx_start = qr{ \A \s* PATTERN1 }xms;
my $rx_stop  = qr{ \A \s* PATTERN3 }xms;
my @records;
RECORD:
while (my $record = <STDIN>) {
    if ($record =~ $rx_start) {
        # push @records, $record;  # UPDATE: NO: still "greedy" extraction of records/lines
        @records = $record;        # UPDATE: FIXED: only extracts BETWEEN start/stop patterns
        next RECORD;
    }
    if ($record =~ $rx_stop) {
        print @records, $record;
        @records = ();
        next RECORD;
    }
    push @records, $record if @records;
}
exit;
Update: Example code fixed to extract records only between the start/stop patterns. Any reformatting of output that may be needed is still not addressed.
Give a man a fish: <%-{-{-{-<
Re: command line perl command to get between lines with non greedy match
by LanX (Saint) on Jan 17, 2020 at 22:36 UTC
Thanks for editing! :)
> is a greedy match
That's not the accurate term; it's just a multiple match, and you only want the last one.°
One way to achieve this in a one-liner is not to print all the matches but to store them in an array and to only print the last match in an END{} block.
The difficulty here is to always reset the array for previous matches.
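A script version of that idea, as a minimal sketch (`last_block` is an illustrative name): it resets the buffer on every PATTERN1 and keeps the block in progress separate from the last complete one, so a trailing PATTERN1 without a PATTERN3 doesn't clobber the result:

```perl
use strict;
use warnings;

# Return the lines of the last complete PATTERN1..PATTERN3 block in $fh.
sub last_block {
    my ($fh) = @_;
    my (@current, @last_block);
    while (my $line = <$fh>) {
        if ($line =~ /PATTERN1/) {
            @current = ($line);           # reset on every start: shortest block
        }
        elsif (@current) {
            push @current, $line;         # inside a block
        }
        if (@current and $line =~ /PATTERN3/) {
            @last_block = @current;       # complete: remember it, print later
            @current    = ();
        }
    }
    return @last_block;
}

my $sample = <<'END_SAMPLE';
PATTERN1 SOME INFO
TEXT1
TEXT3 PATTERN2 SOME INFO
PATTERN1 SOME INFO
TEXT4
TEXT6 PATTERN3 SOME INFO
PATTERN1 SOME INFO
TEXT7
END_SAMPLE
open my $in, '<', \$sample or die "open: $!";
my @result = last_block($in);
print @result;
```

Squeezed into a one-liner with an END{} block (Linux quoting), the same logic would be `perl -ne 'if (/PATTERN1/) { @c = ($_) } elsif (@c) { push @c, $_ } if (@c and /PATTERN3/) { @l = @c; @c = () } END { print @l }' input`.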
> Note this is extremely large file and can't put the whole file into a string.
In this case it might be better to work in reverse and read a sliding window from the end of the file.
But I don't know how to do this with a one-liner.
To decide this, one needs to know how "large" "extreme" is.
°) I think I misunderstood your problem; see Re: command line perl command to get between lines with non greedy match for another approach.
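The sliding-window idea can at least be sketched as a script. This minimal version assumes a seekable file (not a pipe), "\n" line endings, and that the last PATTERN3 line has a PATTERN1 line somewhere above it (otherwise it ends up buffering everything it read). `backward_lines` and `last_block_from_end` are illustrative names; the CPAN module File::ReadBackwards also reads files backwards but is not used here:

```perl
use strict;
use warnings;

# Return an iterator that yields the lines of $fh from last to first,
# reading $chunk bytes at a time backwards from the end of the file.
sub backward_lines {
    my ($fh, $chunk) = @_;
    $chunk //= 64 * 1024;
    seek $fh, 0, 2 or die "seek: $!";   # 2 = SEEK_END
    my $pos  = tell $fh;                # bytes before this point are unread
    my $tail = '';                      # partial line carried between chunks
    my @lines;                          # complete lines not yet returned
    return sub {
        while (!@lines) {
            return undef if $pos == 0 && $tail eq '';
            if ($pos == 0) {            # start of file: the rest is line one
                @lines = ($tail);
                $tail  = '';
                last;
            }
            my $size = $pos < $chunk ? $pos : $chunk;
            $pos -= $size;
            seek $fh, $pos, 0 or die "seek: $!";
            read($fh, my $buf, $size) == $size or die "short read\n";
            # Split after each "\n"; the first piece may be the end of a
            # longer line, so hold it back until the next chunk is read.
            my @parts = split /(?<=\n)/, $buf . $tail;
            $tail  = shift @parts;
            @lines = @parts;
        }
        return pop @lines;
    };
}

# Walk backwards: find the last PATTERN3 line, then collect lines up to the
# nearest PATTERN1 line above it. Returns the block in file order.
sub last_block_from_end {
    my ($fh, $chunk) = @_;
    my $next = backward_lines($fh, $chunk);
    my @block;
    while (defined(my $line = $next->())) {
        if (!@block) {
            push @block, $line if $line =~ /PATTERN3/;
            next;
        }
        push @block, $line;
        return reverse @block if $line =~ /PATTERN1/;
    }
    return;    # no complete block found
}

my $sample = "PATTERN1 SOME INFO\nTEXT1\nTEXT3 PATTERN2 SOME INFO\n"
           . "PATTERN1 SOME INFO\nTEXT4\nTEXT6 PATTERN3 SOME INFO\nTAIL\n";
open my $in, '<', \$sample or die "open: $!";
my @found = last_block_from_end($in, 16);   # small window just for the demo
print @found;
```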
DB<72> map {if ($x=(/b/../d/)) { $out[$x]=$_; $last=$x }} a..e,a..e,a..b,1..3,d..e;
DB<73> x @out[1..$last]
0 'b'
1 1
2 2
3 3
4 'd'
DB<74>
$x is actually the flip-flop's sequence number, which is reset each time a new range starts (three times in this example).
We keep the most recent $x in $last.
You only need END{ print @out[1..$last] } in your one-liner to emit just these last lines.
update
assuming PATTERN2 and PATTERN3 are similar
>perl -ne"if ($x=(/PATTERN1/.../PATTERN?/)) { $out[$x]=$_; $last=$x; }; END{ print @out[1..$last] }" input
PATTERN1 SOME INFO
TEXT4
TEXT5
TEXT6 PATTERN3 SOME INFO
C:\tmp\files>
Re: command line perl command to get between lines with non greedy match
by tybalt89 (Monsignor) on Jan 18, 2020 at 20:29 UTC
Assuming from your example that you are on a Linux system, you should have "tac", a filter program that reverses a file line by line. So:
tac somefilename | perl -ne 'print if /PATTERN3/../PATTERN1/' | tac
works on my ArchLinux system, and all you then need to do is fix up the extra TEXT on the PATTERN3 line.
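Why reversing gives the shortest match: read backwards, the range starts at a PATTERN3 line and stops at the first PATTERN1 it meets, which is the nearest PATTERN1 above it in the original order. A small in-memory model of the same pipeline (`tac_flip_flop` is an illustrative name; it holds all lines in memory, so it only illustrates the logic on small inputs):

```perl
use strict;
use warnings;

# In-memory model of:  tac file | perl -ne 'print if /PATTERN3/../PATTERN1/' | tac
sub tac_flip_flop {
    my @picked;
    for (reverse @_) {                       # first tac: last line first
        # The flip-flop keeps its state in the op itself, so a range left
        # open at the end would carry over into the next call.
        push @picked, $_ if /PATTERN3/ .. /PATTERN1/;
    }
    return reverse @picked;                  # second tac: original order
}

my $sample = <<'END_SAMPLE';
PATTERN1 SOME INFO
TEXT1
TEXT2
TEXT3 PATTERN2 SOME INFO
PATTERN1 SOME INFO
TEXT4
TEXT5
TEXT6 PATTERN3 SOME INFO
END_SAMPLE
my @out = tac_flip_flop(split /(?<=\n)/, $sample);
print @out;
```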
Re: command line perl command to get between lines with non greedy match
by AnomalousMonk (Archbishop) on Jan 17, 2020 at 22:10 UTC
TEXT6 PATTERN3 SOME INFO
Output:
TEXT6

PATTERN3 SOME INFO
Is it true that your input has TEXT6 and the terminating PATTERN3 string on the same line, and that the output should be reformatted so that they are on separate lines separated by a blank line? (BTW: Thanks for editing your post, but you left no citation of any change (update: please see How do I change/delete my post?).)
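If that reformatting is wanted, a single substitution on the terminating line is enough. A minimal sketch (`fix_up` is an illustrative name), assuming exactly one space precedes PATTERN3; other lines pass through unchanged:

```perl
use strict;
use warnings;

# Turn "TEXT6 PATTERN3 SOME INFO" into "TEXT6", a blank line, and
# "PATTERN3 SOME INFO".
sub fix_up {
    my ($line) = @_;
    $line =~ s/ (?=PATTERN3)/\n\n/;
    return $line;
}

print fix_up("TEXT5\n");
print fix_up("TEXT6 PATTERN3 SOME INFO\n");
```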
Re: command line perl command to get between lines with non greedy match (no flip-flop)
by LanX (Saint) on Jan 18, 2020 at 15:48 UTC
Seems like avoiding the range operator is the trick.
It's kind of an iterator version of print grep /PATTERN3/, split /PATTERN1/, slurp("input")
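To make that comparison concrete, a sketch of both forms side by side (`by_split` and `by_lines` are illustrative names; `slurp` is not a Perl builtin, so the whole-input form simply takes a string). The lookahead in the split is a deliberate deviation that keeps the PATTERN1 text in each piece:

```perl
use strict;
use warnings;

# Whole-input form: needs everything in memory. The lookahead keeps the
# PATTERN1 text at the start of each piece (a plain split /PATTERN1/ drops it).
sub by_split {
    my ($text) = @_;
    return grep { /PATTERN3/ } split /(?=PATTERN1)/, $text;
}

# Iterator form: one line at a time, restarting the buffer at each PATTERN1.
sub by_lines {
    my ($fh) = @_;
    my (@o, @blocks);
    while (my $line = <$fh>) {
        @o = () if $line =~ /PATTERN1/;
        push @o, $line;
        push @blocks, join('', @o) if $line =~ /PATTERN3/;
    }
    return @blocks;
}

my $sample = "PATTERN1 SOME INFO\nTEXT1\nTEXT3 PATTERN2 SOME INFO\n"
           . "PATTERN1 SOME INFO\nTEXT4\nTEXT6 PATTERN3 SOME INFO\n";
open my $in, '<', \$sample or die "open: $!";
my @from_split = by_split($sample);
my @from_lines = by_lines($in);
print "split:\n", @from_split, "lines:\n", @from_lines;
```

The two agree on this sample; in general the split pieces also keep any lines between PATTERN3 and the next PATTERN1, while the iterator stops at the PATTERN3 line.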
C:\tmp\files>perl -nE" @o=() if /PATTERN1/; push @o,$_; say qq(<@o>) if /PATTERN3/ " input2
<PATTERN1 SOME INFO
 TEXT4
 TEXT5
 TEXT6 PATTERN3 SOME INFO
>
<PATTERN1 SOME INFO
 TEXT4
 TEXT5
 TEXT6 PATTERN3 SOME INFO
>
C:\tmp\files>
NB: this will also fire if PATTERN3 appears before the first PATTERN1!
If that's a problem, use a flag.
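A minimal sketch of the flag version (`blocks_with_flag` is an illustrative name): a PATTERN3 line only completes a block when a PATTERN1 line has been seen since the last block ended, so an early PATTERN3 is ignored:

```perl
use strict;
use warnings;

# Collect PATTERN1..PATTERN3 blocks, ignoring lines outside any block.
sub blocks_with_flag {
    my ($fh) = @_;
    my (@o, @blocks);
    my $in_block = 0;
    while (my $line = <$fh>) {
        if ($line =~ /PATTERN1/) {
            @o        = ();
            $in_block = 1;
        }
        next unless $in_block;             # outside a block: skip the line
        push @o, $line;
        if ($line =~ /PATTERN3/) {
            push @blocks, join('', @o);
            $in_block = 0;
        }
    }
    return @blocks;
}

my $sample = "TEXT0 PATTERN3 EARLY\nPATTERN1 SOME INFO\nTEXT4\nTEXT6 PATTERN3 SOME INFO\n";
open my $in, '<', \$sample or die "open: $!";
my @found = blocks_with_flag($in);
print @found;
```

As a one-liner (Linux quoting) this would be roughly `perl -ne 'if (/PATTERN1/) { @o = (); $f = 1 } next unless $f; push @o, $_; if (/PATTERN3/) { print @o; $f = 0 }' input`.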
Re: command line perl command to get between lines with non greedy match
by LanX (Saint) on Jan 17, 2020 at 23:25 UTC
I probably misunderstood your problem; this prints the last shortest match between PATTERN1 and PATTERN3:
C:\tmp\files>perl -ne"@out=() if /PATTERN1/; push @out,$_ if /PATTERN1/../PATTERN3/; END{ print @out }" input
PATTERN1 SOME INFO
TEXT4
TEXT5
TEXT6 PATTERN3 SOME INFO
C:\tmp\files>
NB: on Linux you'll need to replace the double quotes (") with single quotes (').
C:\tmp\files>perl -nE"if ($x=(/PATTERN1/../PATTERN3/)) { @out=() if /PATTERN1/; push @out,$_; print @out if $x=~/E0$/ }" input2
PATTERN1 SOME INFO
TEXT4
TEXT5
TEXT6 PATTERN3 SOME INFO
PATTERN1 SOME INFO
TEXT4
TEXT5
TEXT6 PATTERN3 SOME INFO
C:\tmp\files>
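The `$x=~/E0$/` test relies on documented behavior of the range operator in scalar context (perlop): it returns the empty string while false, otherwise a sequence number starting at 1 for each range, and the final number of a range has "E0" appended, which leaves its numeric value unchanged. A small sketch that just collects those values (`flip_flop_values` is an illustrative name):

```perl
use strict;
use warnings;

# Return the scalar-context value of /PATTERN1/ .. /PATTERN3/ for each line.
sub flip_flop_values {
    my @values;
    for (@_) {
        my $x = /PATTERN1/ .. /PATTERN3/;
        push @values, $x;
    }
    return @values;
}

my @vals = flip_flop_values("PATTERN1 SOME INFO", "TEXT4",
                            "TEXT6 PATTERN3 SOME INFO", "OTHER");
print join(', ', map { "'$_'" } @vals), "\n";   # prints: '1', '2', '3E0', ''
```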
update
a bit cleaner
C:\tmp\files>perl -nE" $first=/PATTERN1/; $last=/PATTERN3/; if ( $first..$last) { @o=() if $first; push @o,$_; say @o if $last }" input2
PATTERN1 SOME INFO
TEXT4
TEXT5
TEXT6 PATTERN3 SOME INFO
PATTERN1 SOME INFO
TEXT4
TEXT5
TEXT6 PATTERN3 SOME INFO
Re: command line perl command to get between lines with non greedy match
by LanX (Saint) on Jan 17, 2020 at 21:49 UTC
Unfortunately that's almost unreadable, and even the original is only one long line.
Please click the edit button and reformat your post to make it readable.
Then please use <code>...</code> and <p> tags.
Update
OP added tags in the meantime. :)