http://qs321.pair.com?node_id=254775

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am eternally humbled by how little I know of Perl and regular expressions. How can I match variable length data between /FIRST/ .. /LAST/ (without including FIRST and LAST)? Also, what is the difference between ".." and "..."? I'm trying to do this:

if( @test = ($_ =~ /\d+\s/i ... /\sseconds/i) ) print "Test Data: @test\n";

__Input__
192.168.1.1 index.html index 5 seconds
192.168.1.1 links.html links, index.html index 10 seconds
192.168.1.1 article1.html art1, article2.html art2, adpage 200 seconds

__DESIRED_OUTPUT__
Test Data: index.html index 5
Test Data: links.html links, index.html index 10
Test Data: article1.html art1, article2.html art2, adpage 200

But my output is only:

__REAL_OUTPUT___
Test Data: 0
Test Data: 0
Test Data: 0

Re: Select data between a START and END pattern
by Thelonius (Priest) on May 01, 2003 at 20:51 UTC
    Although dragonchild gives a good example of .., here's a more general explanation, along with the difference between .. and ...

    First of all, there is the use of .. in a list context, where it builds a range:

    @a = 1 .. 7;
    Then there's the unrelated use in a scalar context. It only makes sense in a loop.
    while (something) {
        if (EXPRSTART .. EXPREND) {
            doit();
        }
    }
    is equivalent to:
    $inmatch = 0;
    while (something) {
        if (!$inmatch && EXPRSTART) {
            $inmatch = 1;
        }
        if ($inmatch) {
            doit();
        }
        if ($inmatch && EXPREND) {
            $inmatch = 0;
        }
    }
    Okay? Now for three dots:
    while (something) {
        if (EXPRSTART ... EXPREND) {
            doit();
        }
    }
    is equivalent to:
    $inmatch = 0;
    while (something) {
        $wasinmatch = $inmatch;
        if (!$inmatch && EXPRSTART) {
            $inmatch = 1;
        }
        if ($inmatch) {
            doit();
        }
        if ($wasinmatch && EXPREND) {
            $inmatch = 0;
        }
    }
    Subtle difference. To see it in action, compare:
    while (<>) { print if /A/ .. /B/ }
    with
    while (<>) { print if /A/ ... /B/ }
    on this input file
    Here's some text before
    Some text with an A
    some lines in the middle 1
    some lines in the middle 2
    some lines in the middle 3
    Some text with a B
    some useless lines 1
    some useless lines 2
    some useless lines 3
    A line with both an A and a B
    some lines after the line with both 1
    some lines after the line with both 2
    some lines after the line with both 3
    Once again, text with a B
    more useless lines 1
    more useless lines 2
    more useless lines 3
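    For anyone who would rather not create the input file, here is a minimal self-contained sketch of the same comparison. The sub names select_two_dots and select_three_dots are made up for illustration; the data is the input shown above, held in an array instead of read from a file.

```perl
use strict;
use warnings;

# The sample input from this post, one array element per line.
my @lines = (
    "Here's some text before",
    "Some text with an A",
    "some lines in the middle 1",
    "some lines in the middle 2",
    "some lines in the middle 3",
    "Some text with a B",
    "some useless lines 1",
    "some useless lines 2",
    "some useless lines 3",
    "A line with both an A and a B",
    "some lines after the line with both 1",
    "some lines after the line with both 2",
    "some lines after the line with both 3",
    "Once again, text with a B",
    "more useless lines 1",
    "more useless lines 2",
    "more useless lines 3",
);

# Two dots: the end test also runs on the line that started the range.
sub select_two_dots {
    my @out;
    for (@_) { push @out, $_ if /A/ .. /B/ }
    return @out;
}

# Three dots: the end test is skipped on the line that started the range.
sub select_three_dots {
    my @out;
    for (@_) { push @out, $_ if /A/ ... /B/ }
    return @out;
}

my @two   = select_two_dots(@lines);
my @three = select_three_dots(@lines);

printf "..  selects %d lines\n", scalar @two;    # ..  selects 6 lines
printf "... selects %d lines\n", scalar @three;  # ... selects 10 lines
```

    With two dots, the line containing both an A and a B opens and closes a range by itself, so nothing after it is selected. With three dots, that same line only opens the range, which then runs on to "Once again, text with a B".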
      Excellent explanation! ++! Learn something new every day...

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Select data between a START and END pattern
by bart (Canon) on May 01, 2003 at 20:21 UTC
    How can I match variable length data between /FIRST/ .. /LAST/ (without including FIRST and LAST)?
    Well, at first sight I can imagine two methods.
    1. Check the value of this expression. It's not just a boolean; it has a special format: it returns a sequence number counting how many times in a row it has returned true. The last value is also special, as it has "E0" appended. That doesn't change the numerical value: "5E0" is just as valid a format for the number 5 as "5" is. For example:
      for ('A' .. 'Z') {
          if (my $counter = ($_ eq 'F' .. $_ eq 'J')) {
              print "$counter: $_\n";
          }
      }
      which prints:
      1: F
      2: G
      3: H
      4: I
      5E0: J
      
      As you can see, the first match (numerically 1) and the last match (the string ends in "E0") are both easy to recognize.
    2. A second method is to have a flag for the first and second match:
      for ('A' .. 'Z') {
          if ((my $first = $_ eq 'F') .. (my $last = $_ eq 'J')) {
              printf "%s first:%s last:%s\n", $_,
                  $first ? 'yes' : 'no', $last ? 'yes' : 'no';
          }
      }
      which prints:
      F first:yes last:no
      G first:no last:no
      H first:no last:no
      I first:no last:no
      J first:no last:yes
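      The first method (checking the sequence number) can be used to drop the FIRST and LAST lines themselves, which is what the question asks for. This is only a sketch: the sub name between and its argument order are invented for illustration, and the START/END data is borrowed from elsewhere in this thread.

```perl
use strict;
use warnings;

# Return the lines strictly between a line matching $start and the next
# line matching $end, excluding both delimiter lines.
# Caveat: the flip-flop keeps its state between calls, so this assumes
# every range that is opened is also closed within the same call.
sub between {
    my ($start, $end, @lines) = @_;
    my @inside;
    for my $line (@lines) {
        # Scalar assignment forces the flip-flop (not the list range) meaning.
        my $n = ($line =~ $start) .. ($line =~ $end);
        next unless $n;        # empty string: outside any range
        next if $n == 1;       # sequence number 1: the START line
        next if $n =~ /E0$/;   # "E0" suffix: the END line
        push @inside, $line;
    }
    return @inside;
}

my @found = between(qr/START/, qr/END/, qw(asdf START 1 2 3 END asdf));
print "@found\n";   # 1 2 3
```

      A line that matches both patterns under ".." gets the value "1E0"; it is numerically 1, so it is skipped as well.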
      
    Also what is the difference between ".." and "..."?
    Well... in some cases it is possible that the first and the last condition are both met on the first item. Sometimes you want to include that, in that case, use "..". Sometimes you want to skip it, for example if you want to match stuff between two identical delimiters. In the latter case, use "...", which skips the second test if you're on the first match — your "START".

    A demonstration of the difference: the next code takes a range between two numbers that are divisible by 3, in the sequence 1 .. 8

    • 2 dots:
      for (1 .. 8) {
          if ((my $first = $_ % 3 == 0) .. (my $last = $_ % 3 == 0)) {
              printf "%s first:%s last:%s\n", $_,
                  $first ? 'yes' : 'no', $last ? 'yes' : 'no';
          }
      }
      Result:
      3 first:yes last:yes
      6 first:yes last:yes
      
      The start and the end condition are both met at the same time (divisible by 3), so the ranges are limited to one number.
    • 3 dots:
      for (1 .. 8) {
          if ((my $first = $_ % 3 == 0) ... (my $last = $_ % 3 == 0)) {
              printf "%s first:%s last:%s\n", $_,
                  $first ? 'yes' : 'no', $last ? 'yes' : 'no';
          }
      }
      Result:
      3 first:yes last:no
      4 first:no last:no
      5 first:no last:no
      6 first:no last:yes
      
      This one finds a range from 3 to 6. The second test is not tried on the item where the first test succeeds.
Re: Select data between a START and END pattern
by dragonchild (Archbishop) on May 01, 2003 at 19:44 UTC
    Try this:
    while (<DATA>) {
        /\d\s+(.*?)\s+seconds$/i && print "Test Data: $1\n";
    }
    __DATA__
    192.168.1.1 index.html index 5 seconds
    192.168.1.1 links.html links, index.html index 10 seconds
    192.168.1.1 article1.html art1, article2.html art2, adpage 200 seconds
    As for how to explain what you were doing wrong ... I've got a feeling you missed something in what tools you should apply. The flip-flop operator ("..") works something like:
    while (<DATA>) {
        next unless /START/ .. /END/;
        print $_;
    }
    __DATA__
    asdf
    START
    1
    2
    3
    END
    asdf

Re: Select data between a START and END pattern
by Thelonius (Priest) on May 01, 2003 at 19:48 UTC
    If I understand what you want, then the .. and ... operators are not at all appropriate. I think you want:
    if (/\d+\s(.*)\sseconds/) { print "Test Data: $1\n"; }
Re: Select data between a START and END pattern
by tos (Deacon) on May 02, 2003 at 16:55 UTC
    Your input data, in a file named indat:
    # cat indat
    192.168.1.1 index.html index 5 seconds
    192.168.1.1 links.html links, index.html index 10 seconds
    192.168.1.1 article1.html art1, article2.html art2, adpage 200 seconds
    Try this:
    # perl -wne '/[\d\.]+\s(.+)\sseconds/i && print "Test Data: $1\n"' indat
    Test Data: index.html index 5
    Test Data: links.html links, index.html index 10
    Test Data: article1.html art1, article2.html art2, adpage 200
    The decisive part of the regex is (.+). The parentheses capture the desired string into the $1 variable. The part [\d\.]+ matches the digits and dots that make up the IP addresses. See man perlre for the precise regex details.