PerlMonks  

Another Look behind

by dominic01 (Sexton)
on Mar 12, 2015 at 12:00 UTC (id://1119771, perlquestion)

dominic01 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to get a handle on (*SKIP)(*FAIL) so that I can emulate a variable-width "look-behind", but I am missing something.

    $RLine = "R.N. Raox, J. Pure and Appl. Phys.";
    $Jrnl  = "Appl. Phys.";
    if( $RLine =~ /^(.*?(?:(?: and| &)(*SKIP)(*FAIL))?) (\Q$Jrnl\E)$/){
        print "11: $1\n";
        print "22: $2\n";
        print "Matched\n";
    }
    else {
        print "Not Matched\n";
    }

Basically, when "and" or "&" comes just before the search pattern, the line should not match. With my pattern, however, the match fails whenever "and" comes anywhere in the line.

    $RLine = "R.N. Raox, J. Pure and Appl. Phys.";        #Should not match -> Pattern OK
    $RLine = "R.N. Raox, J. Pure text Appl. Phys.";       #Should match -> Pattern OK
    $RLine = "R.N. Raox, J. Pure and text Appl. Phys.";   #Should match but it is not -> Pattern NOT OK
    $RLine = "R.N. and Raox, J. Pure text Appl. Phys.";   #Should match but it is not -> Pattern NOT OK

Could someone help me out, please?

Replies are listed 'Best First'.
Re: Another Look behind
by choroba (Cardinal) on Mar 12, 2015 at 12:47 UTC
Re: Another Look behind
by ww (Archbishop) on Mar 12, 2015 at 13:12 UTC

    UPDATE: See AnomalousMonk's reply below. My bad for misreading OP's problem.

    A clear explanation of where the word "text" originates in Lines 007 and 010 would be helpful.

    But your chief problems are miscounting the captures and the failure of your regex to account for the space between "and" or "&" and the journal name. Here's a less-overly-complicated version:

        #!/usr/bin/perl
        use 5.018;

        my $RLine = "R.N. Raox, J. Pure and Appl. Phys.";
        my $Jrnl  = qr/Appl. Phys./;
        if( $RLine =~ /(.+)( and | & )($Jrnl)$/ ) {   #NB trailing space before the journal name
            print "11: $1\n";
            print "33: $3\n";
            print "Matched\n";
        }
        else {
            print "Not Matched\n";
        }

    Output:

        11: R.N. Raox, J. Pure
        33: Appl. Phys.
        Matched

    Update: pasted wrong version of comment at Ln 008. Fixed.

      Some thoughts:

      • The regex you give matches against "R.N. Raox, J. Pure and Appl. Phys.", but I thought dominic01's OP wanted no match in this case because ' Appl. Phys.' is preceded by 'and';
      • The regex defined by  my $Jrnl = qr/Appl. Phys./; will also match something like 'Applx Physy';
      • You have the  use 5.018; statement at the beginning of your code, but nothing in the code seems to require version 5.18+.
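The second point above, about the unescaped dots, can be checked with a short sketch. The variable names `$loose` and `$exact` and the helper `which_match` are made up for illustration and do not come from the thread; `\Q...\E` is the same escaping the OP already used.

```perl
use strict;
use warnings;

my $name = 'Appl. Phys.';

# Unescaped: each "." is a regex wildcard that matches any character.
my $loose = qr/$name/;
# Escaped with \Q...\E (same effect as quotemeta): dots match only literal dots.
my $exact = qr/\Q$name\E/;

# Hypothetical helper: report which of the two patterns match a string.
sub which_match {
    my ($text) = @_;
    my @hits;
    push @hits, 'loose' if $text =~ $loose;
    push @hits, 'exact' if $text =~ $exact;
    return join ',', @hits;
}

print "$_ => ", which_match($_), "\n" for 'Appl. Phys.', 'Applx Physy';
```

Only the loose pattern should report a hit for 'Applx Physy', which is the false positive described in the bullet.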


      Give a man a fish:  <%-(-(-(-<

Re: Another Look behind
by AnomalousMonk (Archbishop) on Mar 12, 2015 at 18:50 UTC

    The essence of the variable-width negative look-behind hack technique is that some part of a string that is needed in a subsequent match is matched and then "consumed" by  (*SKIP) (which prevents backtracking) when the match is then forced to fail. (The  \K operator already provides very nice variable-width positive look-behind.)

    In the example below, the part that is "consumed" consists of some whitespace plus the entire  'Appl. Phys.' piece. If only a single whitespace character were guaranteed always to be present before  'Appl. Phys.' and this character was required for subsequent match, it would have been enough to consume only this character, but this seemed too fragile to me: more whitespace can easily creep in. Also, I have captured nothing because I don't understand just what you want from these captures: e.g., capturing the meta-quoted  'Appl. Phys.' is pointless because it's never going to change. What did you really want from these captures? (I'm also using non-capturing, atomic  (?>pattern) groups here rather than simple non-capturing  (?:pattern) groups because I think it makes reasoning about this trick a little easier.)

    c:\@Work\Perl>perl -wMstrict -le
    "use 5.010;
     ;;
     my $j = quotemeta 'Appl. Phys.';
     ;;
     LINE:
     for my $l (
       'R.N. Raox, J. Pure and Appl. Phys.',
       'R.N. Raox, J. Pure & Appl. Phys.',
       'R.N. Raox, J. Pure or Appl. Phys.',
       'R.N. Raox, J. Pure and or Appl. Phys.',
       'N.E. One, Fly Fishing and Appl. Phys.',
       'N.E. One, Fly Fishing or Appl. Phys.',
       ) {
       next LINE unless $l =~ m{ (?> (?> and | &) \s+ $j (*SKIP)(*F))? \s+ $j }xms;
       print qq{match: '$l'};
       }
    "
    match: 'R.N. Raox, J. Pure or Appl. Phys.'
    match: 'R.N. Raox, J. Pure and or Appl. Phys.'
    match: 'N.E. One, Fly Fishing or Appl. Phys.'
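For readers who prefer a script to a one-liner, here is a minimal sketch that runs the same pattern over the four test lines from the original question. The helper name `journal_ok` is an assumption made for illustration; the regex itself is unchanged from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line contains the journal name, unless the
# name is directly preceded by "and" or "&" plus whitespace.
sub journal_ok {
    my ($line, $journal) = @_;
    my $j = quotemeta $journal;
    return $line =~ m{
        (?> (?> and | &) \s+ $j (*SKIP)(*F) )?   # forbidden prefix: consume it, then fail
        \s+ $j                                   # what a successful match needs
    }xms ? 1 : 0;
}

my @lines = (
    'R.N. Raox, J. Pure and Appl. Phys.',       # should not match
    'R.N. Raox, J. Pure text Appl. Phys.',      # should match
    'R.N. Raox, J. Pure and text Appl. Phys.',  # should match
    'R.N. and Raox, J. Pure text Appl. Phys.',  # should match
);

printf "%-45s %s\n", $_, journal_ok($_, 'Appl. Phys.') ? 'match' : 'no match'
    for @lines;
```

Because "and" is not anchored to a word boundary here, a word such as "brand" directly before the journal name would also block the match; whether that matters depends on the data.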

    Update: See the usual suspects: perlre (esp. Special Backtracking Control Verbs), perlretut, perlrequick.
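The parenthetical remark about `\K` as a variable-width positive look-behind can be illustrated with a small substitution. This is only a sketch: the helper name `expand_phys` and the replacement text are invented for the example, and `\K` needs Perl 5.10 or later.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: expand "Phys." to "Physics", but only when it follows
# "Appl." and any amount of whitespace. Everything before \K has to match,
# yet it is left out of the text that gets replaced.
sub expand_phys {
    my ($text) = @_;
    (my $out = $text) =~ s/Appl\.\s+\KPhys\./Physics/g;
    return $out;
}

say expand_phys('J. Pure and Appl.   Phys.');
say expand_phys('J. Chem. Phys.');
```

A true `(?<=Appl\.\s+)` look-behind would be rejected by the Perl versions current at the time of this thread because its width is not fixed, which is why `\K` is the usual substitute.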


    Give a man a fish:  <%-(-(-(-<

      First, your solution works perfectly for me. Next, you are right that some of my "capturing" doesn't make any sense; I was just testing a few things by capturing them.

      Next, I don't understand how to use the variable-width negative look-behind with (*SKIP)(*FAIL) when I have a huge regex. For example:

      $Line =~ /^(Some_RegEx) (Some_RegEx) (Some_RegEx)(?<!foo.*|some text) +$Jrnl (Some_RegEx)$/;
      Appreciate any pointers in this regard.

        I don't have a good idea of your difficulties, but here's a generalized approach. I can't provide a working example at the moment, so the following is untested handwaving.

        The  (*SKIP)(*FAIL) variable-width negative look-back hack works by messing up a match of something you need to have for an overall match. For a large regex, I tend to take the approach of factoring regex elements:

        # stuff we want to capture
        my $capture_this = qr{ ... }xms;
        my $capture_too  = qr{ ... }xms;
        my $capture_also = qr{ ... }xms;
        my $capture_more = qr{ ... }xms;

        # stuff we want to cause match failure if before certain other stuff
        my $avoid_this = qr{ ... }xms;
        my $avoid_too  = qr{ ... }xms;
        my $avoid_also = qr{ ... }xms;
        my $negatory   = qr{ (?> [aeiou]+ | f[eio]e? | $avoid_too) }xms;

        # stuff we need for an overall match, may or may not be captured
        my $needed_for_match = qr{ ... }xms;
        my $needed_too       = qr{ ... }xms;

        my $string = get_stringy_stuff();

        my ($this, $too, $also, $yet_another) =
            $string =~ m{
                \A
                ($capture_this) ($capture_too)
                ...
                (?> (?> $avoid_this | $avoid_too | $avoid_also | etc) $needed_for_match (*SKIP)(*F))?
                $needed_for_match   # needed for overall match
                ($capture_also)
                ...
                (?> $negatory $needed_too (*SKIP)(*FAIL))?
                ($needed_too)   # needed for overall match, also captured
                \z
                }xms;

        do_something_with($this, $too, $also, $yet_another);
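Since the outline above is explicitly untested, here is one small runnable instance of the same factoring idea, applied to the journal lines from this thread. All names (`$authors`, `$avoid`, `$journal`, `parse_ref`) and the assumed "authors, title journal" layout are illustrative guesses, not something the original poster specified.

```perl
use strict;
use warnings;

# Hypothetical building blocks, assuming lines of the form "authors, title journal".
my $name    = quotemeta 'Appl. Phys.';
my $authors = qr{ [^,]+ }xms;                    # captured: text up to the first comma
my $avoid   = qr{ (?> \b and \b | & ) \s+ }xms;  # must not come directly before the journal
my $journal = qr{ $name }xms;                    # needed for the overall match

# Returns (authors, title, journal) on success, or an empty list on failure.
sub parse_ref {
    my ($string) = @_;
    my @parts = $string =~ m{
        \A
        ($authors) , \s*
        (.*?)
        (?> $avoid $journal (*SKIP)(*FAIL) )?
        \s+ ($journal)
        \z
    }xms;
    return @parts;
}

for my $line (
    'R.N. Raox, J. Pure and Appl. Phys.',
    'R.N. Raox, J. Pure and text Appl. Phys.',
    'R.N. and Raox, J. Pure text Appl. Phys.',
) {
    my @p = parse_ref($line);
    print @p ? "match:    [$p[0]] [$p[1]] [$p[2]]\n" : "no match: $line\n";
}
```

The forbidden branch deliberately consumes the same `$journal` text that the rest of the pattern needs, so that once it has matched, `(*SKIP)(*FAIL)` leaves nothing for a later attempt to succeed on.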


        Give a man a fish:  <%-(-(-(-<
