Select data between a START and END pattern

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am eternally humbled by how little I know of Perl and regular expressions. How can I match variable-length data between /FIRST/ .. /LAST/ (without including FIRST and LAST)? Also, what is the difference between ".." and "..."? I'm trying to do this:

if( @test = ($_ =~ /\d+\s/i ... /\sseconds/i) ) print "Test Data: @test\n";

__Input__
192.168.1.1 seconds
192.168.1.1 links.html links, index.html index 10 seconds 
192.168.1.1 article1.html  art1, article2.html art2, adpage 200 seconds
    
But my output is only: 
__DESIRED_OUTPUT__
Test Data: index.html index 5 
Test Data: links.html links, index.html index 10 
Test Data:  article1.html  art1, article2.html art2, adpage 200

__REAL_OUTPUT___
Test Data: 0
Test Data: 0
Test Data: 0


Replies are listed 'Best First'.
Re: Select data between a START and END pattern by Thelonius (Priest) on May 01, 2003 at 20:51 UTC
Although dragonchild gives a good example of `..`, here's a more general explanation, along with the difference between `..` and `...`.

First of all, there is the use of `..` in a list context:

```
@a = 1 .. 7;
```

Then there's the unrelated use in a scalar context. It only makes sense in a loop.

```
while (something) {
    if (EXPRSTART .. EXPREND) {
        doit();
    }
}
```

is equivalent to:

```
$inmatch = 0;
while (something) {
    if (!$inmatch && EXPRSTART) {
        $inmatch = 1;
    }
    if ($inmatch) {
        doit();
    }
    if ($inmatch && EXPREND) {
        $inmatch = 0;
    }
}
```

Okay? Now for three dots:

```
while (something) {
    if (EXPRSTART ... EXPREND) {
        doit();
    }
}
```

is equivalent to:

```
$inmatch = 0;
while (something) {
    $wasinmatch = $inmatch;
    if (!$inmatch && EXPRSTART) {
        $inmatch = 1;
    }
    if ($inmatch) {
        doit();
    }
    if ($wasinmatch && EXPREND) {
        $inmatch = 0;
    }
}
```

Subtle difference: with three dots, EXPREND is not tested on the same iteration in which EXPRSTART first matched. To see it in action, compare:

```
while (<>) { print if /A/ .. /B/ }
```

with

```
while (<>) { print if /A/ ... /B/ }
```

on this input file:

```
Here's some text before
Some text with an A
some lines in the middle 1
some lines in the middle 2
some lines in the middle 3
Some text with a B
some useless lines 1
some useless lines 2
some useless lines 3
A line with both an A and a B
some lines after the line with both 1
some lines after the line with both 2
some lines after the line with both 3
Once again, text with a B
more useless lines 1
more useless lines 2
more useless lines 3
```
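For anyone who would rather not create the input file, here is a minimal sketch that runs both operators over the same 17 lines held in an array and records which line numbers each one selects. The names `@lines`, `@two`, and `@three` are made up for this illustration; the flip-flop behaviour itself is as described above.

```perl
use strict;
use warnings;

# The input file from above, one element per line.
my @lines = (
    "Here's some text before",
    "Some text with an A",
    "some lines in the middle 1",
    "some lines in the middle 2",
    "some lines in the middle 3",
    "Some text with a B",
    "some useless lines 1",
    "some useless lines 2",
    "some useless lines 3",
    "A line with both an A and a B",
    "some lines after the line with both 1",
    "some lines after the line with both 2",
    "some lines after the line with both 3",
    "Once again, text with a B",
    "more useless lines 1",
    "more useless lines 2",
    "more useless lines 3",
);

my (@two, @three);    # line numbers selected by each operator
my $n = 0;
for (@lines) {        # $_ is aliased to each line in turn
    $n++;
    push @two,   $n if /A/ .. /B/;     # end is tested on the start line too
    push @three, $n if /A/ ... /B/;    # end is first tested on the next line
}

print "two dots:   @two\n";      # two dots:   2 3 4 5 6 10
print "three dots: @three\n";    # three dots: 2 3 4 5 6 10 11 12 13 14
```

Line 10 contains both an A and a B. With two dots the range opens and closes on that one line; with three dots it stays open until the next B on line 14.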
Re2: Select data between a START and END pattern by dragonchild (Archbishop) on May 01, 2003 at 21:12 UTC
Excellent explanation! ++! Learn something new every day...

------
We are the carpenters and bricklayers of the Information Age.
Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.
Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.
Re: Select data between a START and END pattern by bart (Canon) on May 01, 2003 at 20:21 UTC
"How can I match variable length data between `/FIRST/ .. /LAST/` (without including FIRST and LAST)?"

Well, at first sight I can imagine two methods.

The first is to check the value of this expression. It's not just a boolean, it has a special format: it returns a sequence number counting how many times in a row it has returned true. Also, the last value is special, as it has "E0" appended. That doesn't change the numerical value: "5E0" is as valid a format for the number 5 as "5" is. For example:

```
for ('A' .. 'Z') {
    if (my $counter = ($_ eq 'F' .. $_ eq 'J')) {
        print "$counter: $_\n";
    }
}
```

which prints:

```
1: F
2: G
3: H
4: I
5E0: J
```

As you can see, the first match (numerical value 1) and the last match (string ending in "E0") are easy to recognize.

A second method is to have a flag for the first and for the last match:

```
for ('A' .. 'Z') {
    if ((my $first = $_ eq 'F') .. (my $last = $_ eq 'J')) {
        printf "%s first:%s last:%s\n", $_, $first?'yes':'no', $last?'yes':'no';
    }
}
```

which prints:

```
F first:yes last:no
G first:no last:no
H first:no last:no
I first:no last:no
J first:no last:yes
```

"Also what is the difference between ".." and "..."?"

Well... in some cases it is possible that the first and the last condition are both met on the first item. Sometimes you want to include that; in that case, use "..". Sometimes you want to skip it, for example if you want to match stuff between two identical delimiters. In the latter case, use "...", which skips the second test if you're on the first match, your "START".

A demonstration of the difference: the next code takes a range between two numbers that are divisible by 3, in the sequence 1 .. 8.

Two dots:

```
for (1 .. 8) {
    if ((my $first = $_ % 3 == 0) .. (my $last = $_ % 3 == 0)) {
        printf "%s first:%s last:%s\n", $_, $first?'yes':'no', $last?'yes':'no';
    }
}
```

Result:

```
3 first:yes last:yes
6 first:yes last:yes
```

The start and the end condition are both met at the same time (divisible by 3), so each range is limited to one number.

Three dots:

```
for (1 .. 8) {
    if ((my $first = $_ % 3 == 0) ... (my $last = $_ % 3 == 0)) {
        printf "%s first:%s last:%s\n", $_, $first?'yes':'no', $last?'yes':'no';
    }
}
```

Result:

```
3 first:yes last:no
4 first:no last:no
5 first:no last:no
6 first:no last:yes
```

This one finds a range from 3 to 6. The second test is not tried when the first test is successful.
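Applied to the original question, the sequence-number method could look like the sketch below. It keeps only the lines strictly between the delimiters by skipping the item where the counter is 1 (the START) and the item where it ends in "E0" (the END). The sample data in `@lines` and the name `@between` are invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; FIRST and LAST stand in for the real delimiters.
my @lines = ('before', 'FIRST', 'one', 'two', 'LAST', 'after');

my @between;
for (@lines) {
    my $seq = (/FIRST/ .. /LAST/);    # scalar context: flip-flop sequence number
    next unless $seq;                 # empty string: outside the range
    next if $seq == 1;                # first item of the range (the START line)
    next if $seq =~ /E0$/;            # last item of the range (the END line)
    push @between, $_;
}

print "$_\n" for @between;    # prints "one" and "two"
```

If START and END match on the same item, the value is "1E0", which both tests reject, so nothing is collected for that range.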
Re: Select data between a START and END pattern by dragonchild (Archbishop) on May 01, 2003 at 19:44 UTC
Try this:

```
while (<DATA>) {
    /\d\s+(.*?)\s+seconds$/i && print "Test Data: $1\n";
}
__DATA__
192.168.1.1 seconds
192.168.1.1 links.html links, index.html index 10 seconds
192.168.1.1 article1.html art1, article2.html art2, adpage 200 seconds
```

As for what you were doing wrong: I have a feeling you picked the wrong tool for the job. The flip-flop operator ("..") selects a range of lines, and works something like this:

```
while (<DATA>) {
    next unless /START/ .. /END/;
    print $_;
}
__DATA__
asdf
START
1
2
3
END
asdf
```
Re: Select data between a START and END pattern by Thelonius (Priest) on May 01, 2003 at 19:48 UTC
If I understand what you want, then the `..` and `...` operators are not at all appropriate. I think you want:

```
if (/\d+\s(.*)\sseconds/) {
    print "Test Data: $1\n";
}
```
Re: Select data between a START and END pattern by tos (Deacon) on May 02, 2003 at 16:55 UTC
Your input data in the file indat:

```
# cat indat
192.168.1.1 index.html index 5 seconds
192.168.1.1 links.html links, index.html index 10 seconds
192.168.1.1 article1.html art1, article2.html art2, adpage 200 seconds
```

Try this:

```
# perl -wne '/[\d\.]+\s(.+)\sseconds/i && print "Test Data: $1\n"' indat
Test Data: index.html index 5
Test Data: links.html links, index.html index 10
Test Data: article1.html art1, article2.html art2, adpage 200
```

The decisive part of the regex is `(.+)`. The parentheses capture the desired string in the `$1` variable. The part `[\d\.]+` matches digits or dots as they appear in the IP addresses. See man perlre for the precise regex details.
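The same capture can also be taken in list context, which is close to what the original `@test = ($_ =~ ...)` attempt was reaching for: a match in list context returns its captured groups. The sketch below is one possible variation, not the only way to write it. It anchors the address at the start of the line and uses a non-greedy `(.+?)`; the names `@input` and `@found` are made up for the example.

```perl
use strict;
use warnings;

# The three input lines from indat, held in an array for the example.
my @input = (
    '192.168.1.1 index.html index 5 seconds',
    '192.168.1.1 links.html links, index.html index 10 seconds',
    '192.168.1.1 article1.html art1, article2.html art2, adpage 200 seconds',
);

my @found;
for my $line (@input) {
    # In list context the match returns the captured group;
    # the assignment is false when the pattern does not match.
    if (my ($data) = $line =~ /^[\d.]+\s+(.+?)\s+seconds/i) {
        push @found, $data;
    }
}

print "Test Data: $_\n" for @found;
```

Assigning to `my ($data)` avoids relying on `$1`, which keeps its value from an earlier successful match when a later match fails.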
