PerlMonks  

Am I on the pipe, or what?

by BorgCopyeditor (Friar)
on Jul 09, 2002 at 22:58 UTC ( [id://180648]=perlquestion )

BorgCopyeditor has asked for the wisdom of the Perl Monks concerning the following question:

I guess I just don't understand the syntax of the pipe, because the following script (fragment) is giving a strange result: i.e., not what I expected.
use strict;
my (@dictionary,$query,$term);
@dictionary=('lo/gos','lo/gou','lo/gw|','lo/gon');
$query='lo/gou';
foreach $term (@dictionary) {if($query=~$term) {print "$term\n";}}
This yields:
lo/gou
lo/gw|
which is not at all what I expected. If I invert $query and $term, only 'lo/gou' matches. Can someone help me understand what's going on here?

BCE
--"Your punctuation skills are insufficient!"

Replies are listed 'Best First'.
Re: Am I on the pipe, or what?
by Zaxo (Archbishop) on Jul 09, 2002 at 23:15 UTC

    Perl is seeing the pipe as alternation with nothing, i.e. 'either this or the empty pattern', and the empty pattern matches any string. Here are the details from a debugging perl (5.8.0-RC1):

    $ perl -Dr pipe.pl
    Omitting $` $& $' support.

    EXECUTING...

    Compiling REx `lo/gos'
    size 4 Got 36 bytes for offset annotations.
    first at 1
    rarest char / at 2
       1: EXACT <lo/gos>(4)
       4: END(0)
    anchored `lo/gos' at 0 (checking anchored isall) minlen 6
    Offsets: [4]
            1[6] 0[0] 0[0] 7[0]
    Guessing start of match, REx `lo/gos' against `lo/gou'...
    Did not find anchored substr `lo/gos'...
    Match rejected by optimizer
    Freeing REx: `lo/gos'
    Compiling REx `lo/gou'
    size 4 Got 36 bytes for offset annotations.
    first at 1
    rarest char / at 2
       1: EXACT <lo/gou>(4)
       4: END(0)
    anchored `lo/gou' at 0 (checking anchored isall) minlen 6
    Offsets: [4]
            1[6] 0[0] 0[0] 7[0]
    Guessing start of match, REx `lo/gou' against `lo/gou'...
    Found anchored substr `lo/gou' at offset 0...
    Guessed: match at offset 0
    lo/gou
    Freeing REx: `lo/gou'
    Compiling REx `lo/gw|'
    size 7 Got 60 bytes for offset annotations.
       1: BRANCH(5)
       2:   EXACT <lo/gw>(7)
       5: BRANCH(7)
       6:   NOTHING(7)
       7: END(0)
    minlen 0
    Offsets: [7]
            0[0] 1[5] 0[0] 0[0] 6[1] 6[0] 7[0]
    Matching REx `lo/gw|' against `lo/gou'
      Setting an EVAL scope, savestack=16
       0 <> <lo/gou>          |  1:  BRANCH
      Setting an EVAL scope, savestack=22
       0 <> <lo/gou>          |  2:    EXACT <lo/gw>
                                  failed...
       0 <> <lo/gou>          |  6:    NOTHING
       0 <> <lo/gou>          |  7:    END
    Match successful!
    lo/gw|
    Freeing REx: `lo/gw|'
    Compiling REx `lo/gon'
    size 4 Got 36 bytes for offset annotations.
    first at 1
    rarest char / at 2
       1: EXACT <lo/gon>(4)
       4: END(0)
    anchored `lo/gon' at 0 (checking anchored isall) minlen 6
    Offsets: [4]
            1[6] 0[0] 0[0] 7[0]
    Guessing start of match, REx `lo/gon' against `lo/gou'...
    Did not find anchored substr `lo/gon'...
    Match rejected by optimizer
    Freeing REx: `lo/gon'
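
The same result can be seen without a debugging perl. The sketch below is only an illustration: match_span is a hypothetical helper (not from the thread) that reports where a pattern matched and how long the match was, using the built-in @- and @+ arrays. With the trailing pipe the match should succeed at offset 0 with length 0, which is the NOTHING branch in the trace above; with the pipe escaped it should fail.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (offset, length) of the first match,
# or an empty list when the pattern does not match.
sub match_span {
    my ( $string, $pattern ) = @_;
    return unless $string =~ $pattern;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    return ( $-[0], $+[0] - $-[0] );
}

my $query = 'lo/gou';

# First pattern: unescaped pipe (alternation with the empty pattern).
# Second pattern: escaped pipe (a literal '|' character).
for my $pattern ( 'lo/gw|', 'lo/gw\|' ) {
    my @span = match_span( $query, $pattern );
    if (@span) {
        print "$pattern matched at offset $span[0], length $span[1]\n";
    }
    else {
        print "$pattern did not match\n";
    }
}
```

A zero-length match is still a successful match, which is why the original loop printed 'lo/gw|'.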

    After Compline,
    Zaxo

      If you don't want to compile a debugging version of perl, use YAPE::Regex::Explain:
      use strict;
      use YAPE::Regex::Explain;
      print YAPE::Regex::Explain->new(qr'lo/gw|')->explain();
      __OUTPUT__
      The regular expression:

      (?-imsx:lo/gw|)

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        lo/gw                    'lo/gw'
      ----------------------------------------------------------------------
       |                        OR
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

      --

      flounder

Regex without 'm' or '/'
by dvergin (Monsignor) on Jul 10, 2002 at 01:30 UTC
    The example that BorgCopyeditor supplies with his question brings up an interesting point. Why does this work:
         if($query=~$term) {...}
    Or, to offer a very plain example, why does this work:
    if ( 'abc' =~ 'bc' ) { print "yes\n"; }
    The rule I learned was (quoting from perlop): "If "/" is the delimiter then the initial m is optional."

    Fair enough. The implication is:

    if ( 'abc' =~ m/bc/ ) {    # good
    if ( 'abc' =~ /bc/ ) {     # good
    if ( 'abc' =~ m%bc% ) {    # good (for any non-alphanum)
    but:
    if ( 'abc' =~ 'bc' ) {     # BAD! (or so we assume)
    But it is not so. As I said above, both of the examples at the top of this response work. Why?

    What I cannot find in the on-line docs (update: see danger's response below) but I know from experimentation and the words of Camel 3, page 144, is that, even without the 'm' or the '/', the righthand side of =~ "still counts as a m// matching operation, but there'll be no place to put any trailing modifiers, and you'll have to handle your own quoting."

    So...

    if ( 'abc' =~ 'bc' ) {          # works
    if ( 'abc' =~ $pattern ) {      # works
    if ( 'abc' =~ "$pattern" ) {    # works
    if ( 'abc' =~ bc ) {            # works!! (bareword, so not under 'use strict')
    if ( 'abc' =~ 'bc'g ) {         # Error: bareword 'g'
    The present writer is not responsible for any sideways looks any of these may earn you from your peers.
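
Since a plain string leaves no place for trailing modifiers, one possible workaround is to put the modifier inside the pattern, or to build the pattern with qr//. A minimal sketch, assuming a case-insensitive match is what is wanted; the variable names are only illustrative:

```perl
use strict;
use warnings;

my $pattern = 'BC';

# A plain string on the right of =~ is used as a pattern,
# but there is nowhere to hang a trailing /i.
my $plain = ( 'abc' =~ $pattern ) ? 'yes' : 'no';

# The modifier can travel inside the string as an inline (?i).
my $inline = ( 'abc' =~ "(?i)$pattern" ) ? 'yes' : 'no';

# Or the pattern can be precompiled with qr//, which does accept modifiers.
my $re = qr/$pattern/i;
my $compiled = ( 'abc' =~ $re ) ? 'yes' : 'no';

print "plain string: $plain\n";      # no  (case differs)
print "inline (?i):  $inline\n";     # yes
print "qr//i:        $compiled\n";   # yes
```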

    ------------------------------------------------------------
    "Perl is a mess and that's good because the
    problem space is also a mess." - Larry Wall

      What I cannot find in the on-line docs

      It's an operator thing, not a regex thing --- from perlop under "Binding Operators":

      ... If the right argument is an expression rather than a search pattern, substitution, or transliteration, it is interpreted as a search pattern at run time. This can be less efficient than an explicit search, because the pattern must be compiled every time the expression is evaluated.

      This can sometimes cause problems for newcomers, especially when they use split with a double-quoted string as the split pattern (as seems to happen with undue frequency) and have an escaped metacharacter in the pattern:

      $_ = 'this has a | pipe';
      @a = split /\|/;    # good
      print join(":", @a),"\n";
      @a = split "\|";    # oops
      print join(":", @a),"\n";

      In the second case, the double-quoted string is first evaluated and the *resulting* string (sans backslash) is then used as the pattern in the regex.
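
To make the difference visible on a short string, here is a small sketch (the sample string and variable names are made up for illustration). The double-quoted "\|" reaches split as the one-character string "|", which as a pattern is an empty alternation and matches between every pair of characters; a single-quoted '\|' keeps its backslash and behaves like /\|/.

```perl
use strict;
use warnings;

my $line = 'a|b';

my @regex  = split /\|/, $line;   # pattern is a literal pipe
my @double = split "\|", $line;   # string is "|", pattern matches the empty string
my @single = split '\|', $line;   # backslash survives single quotes

print join( ":", @regex ),  "\n";   # a:b
print join( ":", @double ), "\n";   # a:|:b
print join( ":", @single ), "\n";   # a:b
```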

        If the right argument is an expression rather than a search pattern, substitution, or transliteration, it is interpreted as a search pattern at run time. This can be less efficient than an explicit search, because the pattern must be compiled every time the expression is evaluated.

        This perlop text must be a holdover from a while back. At least as far back as 5.005_03, code like $str1 =~ 'bc' (with a constant string for the pattern) would be compiled only once. Between 5.6.0 and 5.6.1 an extra check was added, so that even $str =~ $str2 would not be recompiled as long as $str2 had not changed.

        I guess the second statement should simply be deleted from that paragraph.

        Hugo

      Thanks to Texas Tess, Zaxo, and dvergin for patient explanations of both the obvious and the arcane. Also, next time I have a regex problem, maybe I'll brave the debugger. That was very enlightening.

      FWIW, the data I'm parsing is in 'betacode', an ASCII transcription scheme for Ancient Greek. It's convenient in some ways, but chock full of what I still have to remind myself are metacharacters. Grrr.
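
For data like this, where the terms are meant to be literal text, a few standard ways to keep the metacharacters inert are sketched below. It reuses the dictionary from the original question; which form fits best depends on whether a substring match or an exact match is intended, which the thread does not say.

```perl
use strict;
use warnings;

my @dictionary = ( 'lo/gos', 'lo/gou', 'lo/gw|', 'lo/gon' );
my $query      = 'lo/gou';

# \Q...\E applies quotemeta, so '|' and friends are matched literally.
my @by_regex = grep { $query =~ /\Q$_\E/ } @dictionary;

# index() does a plain substring search with no regex involved.
my @by_index = grep { index( $query, $_ ) >= 0 } @dictionary;

# eq is enough when the whole term must equal the query.
my @by_eq = grep { $query eq $_ } @dictionary;

print "regex: @by_regex\n";   # lo/gou
print "index: @by_index\n";   # lo/gou
print "eq:    @by_eq\n";      # lo/gou
```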

      BCE
      --Your punctuation skills are insufficient!

Re: Am I on the pipe, or what?
by TexasTess (Beadle) on Jul 09, 2002 at 23:14 UTC
    You have to escape the metacharacter to ensure it's not evaluated literally...but that really does not explain why it picks up the w as well....try escaping the pipe and see if it still returns the same..

    TexasTess
    "Great Spirits Often Encounter Violent Opposition From Mediocre Minds" --Albert Einstein

    UPDATE: After re-reading this, I think I must have been on the pipe myself when I wrote it!

Node Type: perlquestion [id://180648]
Approved by Zaxo