BorgCopyeditor has asked for the wisdom of the Perl Monks concerning the following question:
I guess I just don't understand the syntax of the pipe, because the following script (fragment) is giving a strange result: i.e., not what I expected.
use strict;
my (@dictionary,$query,$term);
@dictionary=('lo/gos','lo/gou','lo/gw|','lo/gon');
$query='lo/gou';
foreach $term (@dictionary) {if($query=~$term) {print "$term\n";}}
This yields:
lo/gou
lo/gw|
which is not at all what I expected. If I invert $query and $term, only 'lo/gou' matches. Can someone help me understand what's going on here?
BCE
--"Your punctuation skills are insufficient!"
Re: Am I on the pipe, or what?
by Zaxo (Archbishop) on Jul 09, 2002 at 23:15 UTC
Perl is seeing the pipe as alternation with nothing, i.e. 'either lo/gw or the empty string'. The empty alternative matches any string, so the match always succeeds. Here are the details from a debugging perl (5.8.0-RC1):
$ perl -Dr pipe.pl
Omitting $` $& $' support.
EXECUTING...
Compiling REx `lo/gos'
size 4 Got 36 bytes for offset annotations.
first at 1
rarest char / at 2
1: EXACT <lo/gos>(4)
4: END(0)
anchored `lo/gos' at 0 (checking anchored isall) minlen 6
Offsets: [4]
1[6] 0[0] 0[0] 7[0]
Guessing start of match, REx `lo/gos' against `lo/gou'...
Did not find anchored substr `lo/gos'...
Match rejected by optimizer
Freeing REx: `lo/gos'
Compiling REx `lo/gou'
size 4 Got 36 bytes for offset annotations.
first at 1
rarest char / at 2
1: EXACT <lo/gou>(4)
4: END(0)
anchored `lo/gou' at 0 (checking anchored isall) minlen 6
Offsets: [4]
1[6] 0[0] 0[0] 7[0]
Guessing start of match, REx `lo/gou' against `lo/gou'...
Found anchored substr `lo/gou' at offset 0...
Guessed: match at offset 0
lo/gou
Freeing REx: `lo/gou'
Compiling REx `lo/gw|'
size 7 Got 60 bytes for offset annotations.
1: BRANCH(5)
2: EXACT <lo/gw>(7)
5: BRANCH(7)
6: NOTHING(7)
7: END(0)
minlen 0
Offsets: [7]
0[0] 1[5] 0[0] 0[0] 6[1] 6[0] 7[0]
Matching REx `lo/gw|' against `lo/gou'
Setting an EVAL scope, savestack=16
0 <> <lo/gou> | 1: BRANCH
Setting an EVAL scope, savestack=22
0 <> <lo/gou> | 2: EXACT <lo/gw>
failed...
0 <> <lo/gou> | 6: NOTHING
0 <> <lo/gou> | 7: END
Match successful!
lo/gw|
Freeing REx: `lo/gw|'
Compiling REx `lo/gon'
size 4 Got 36 bytes for offset annotations.
first at 1
rarest char / at 2
1: EXACT <lo/gon>(4)
4: END(0)
anchored `lo/gon' at 0 (checking anchored isall) minlen 6
Offsets: [4]
1[6] 0[0] 0[0] 7[0]
Guessing start of match, REx `lo/gon' against `lo/gou'...
Did not find anchored substr `lo/gon'...
Match rejected by optimizer
Freeing REx: `lo/gon'
After Compline, Zaxo
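For anyone who wants to reproduce the effect without the debugger, here is a minimal sketch. The sub names raw_matches and literal_matches are made up for illustration; the data is the dictionary from the original question. It compares using each term as a raw pattern against escaping it with \Q...\E (quotemeta) first:

```perl
use strict;
use warnings;

# Same test as the original post: each term is used as a raw pattern,
# so 'lo/gw|' means "lo/gw OR the empty string" and matches anything.
sub raw_matches {
    my ($query, @dictionary) = @_;
    return grep { $query =~ $_ } @dictionary;
}

# \Q...\E (quotemeta) escapes '|' and every other metacharacter in the
# term, so the term is matched as literal text.
sub literal_matches {
    my ($query, @dictionary) = @_;
    return grep { $query =~ /\Q$_\E/ } @dictionary;
}

my @dictionary = ('lo/gos', 'lo/gou', 'lo/gw|', 'lo/gon');
my $query      = 'lo/gou';

print "raw:     @{[ raw_matches($query, @dictionary) ]}\n";      # lo/gou lo/gw|
print "literal: @{[ literal_matches($query, @dictionary) ]}\n";  # lo/gou
```

Note that the escaped version is still a substring test. If the goal is to find dictionary entries identical to the query, a plain eq comparison is probably the simpler tool.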
use strict;
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr'lo/gw|')->explain();
__OUTPUT__
The regular expression:
(?-imsx:lo/gw|)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  lo/gw                    'lo/gw'
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
-- flounder
Regex without 'm' or '/'
by dvergin (Monsignor) on Jul 10, 2002 at 01:30 UTC
The example that BorgCopyeditor supplies with his question brings up an interesting point. Why does this work:
if($query=~$term) {...}
Or, to offer a very plain example, why does this work:
if ( 'abc' =~ 'bc' ) {
print "yes\n";
}
The rule I learned was (quoting from perlop): "If "/" is the delimiter then the initial m is optional."
Fair enough. The implication is:
if ( 'abc' =~ m/bc/ ) { # good
if ( 'abc' =~ /bc/ ) { # good
if ( 'abc' =~ m%bc% ) { # good (for any non-alphanum)
but:
if ( 'abc' =~ 'bc' ) { # BAD! (or so we assume)
But it is not so. As I said above, both of the examples at the top of this response work. Why?
What I cannot find in the on-line docs (update: see danger's response below) but know from experimentation and from Camel 3, page 144, is that, even without the 'm' or the '/', the right-hand side of =~ "still counts as a m// matching operation, but there'll be no place to put any trailing modifiers, and you'll have to handle your own quoting."
So...
if ( 'abc' =~ 'bc' ) { # works
if ( 'abc' =~ $pattern ) { # works
if ( 'abc' =~ "$pattern" ) { # works
if ( 'abc' =~ bc ) { # works!! (bareword, so only without 'use strict')
if ( 'abc' =~ 'bc'g ) { # Error: bareword 'g'
The present writer is not responsible for any sideways looks any of these may earn you from your peers.
------------------------------------------------------------
"Perl is a mess
and that's good because the
problem space is also a mess." - Larry Wall
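On the "no place to put any trailing modifiers" point, there are a few ways around it. The following is only a sketch with a made-up $pattern value: it shows a bare string pattern failing a case-insensitive test, then three alternatives that give the modifier somewhere to live.

```perl
use strict;
use warnings;

my $pattern = 'BC';    # a plain string holding the pattern text

# Bare expression on the right of =~: treated as m//, but with no /i.
my $plain = ('abc' =~ $pattern) ? 1 : 0;

# Option 1: interpolate into an explicit m// so the modifier has a home.
my $with_m = ('abc' =~ /$pattern/i) ? 1 : 0;

# Option 2: put the modifier inside the pattern text itself.
my $inline = ('abc' =~ "(?i)$pattern") ? 1 : 0;

# Option 3: compile with qr//, which carries the modifier with the object.
my $re       = qr/$pattern/i;
my $compiled = ('abc' =~ $re) ? 1 : 0;

print "plain=$plain with_m=$with_m inline=$inline compiled=$compiled\n";
# plain=0 with_m=1 inline=1 compiled=1
```

Note that /g has no inline (?...) form, so for global matching you still need an explicit m//g around the interpolated variable.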
...
If the right argument is an expression rather than a
search pattern, substitution, or transliteration, it is
interpreted as a search pattern at run time. This can be
less efficient than an explicit search, because the pattern
must be compiled every time the expression is evaluated.
This can sometimes cause problems for newcomers, especially when
they use split with a double-quoted string as the split pattern
(as seems to happen with undue frequency) and have an escaped
metacharacter in the pattern:
$_ = 'this has a | pipe';
@a = split /\|/; # good
print join(":", @a),"\n";
@a = split "\|"; # oops
print join(":", @a),"\n";
In the second case, the double-quoted string is first evaluated and
the *resulting* string (sans backslash) is then used as the pattern
in the regex.
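The split pitfall above is easy to check by counting fields. This is a small sketch using the same sample string; the variable names are illustrative only. It also includes the single-quoted and \Q...\E forms for comparison:

```perl
use strict;
use warnings;

my $line = 'this has a | pipe';

my @by_regex  = split /\|/,     $line;  # regex literal: the backslash reaches the regex engine
my @by_single = split '\|',     $line;  # single quotes keep the backslash, so same pattern
my @by_double = split "\|",     $line;  # double quotes eat the backslash: the pattern is just |
my @by_quoted = split /\Q|\E/,  $line;  # \Q...\E escapes the pipe for you

printf "regex:  %d fields\n", scalar @by_regex;   # 2
printf "single: %d fields\n", scalar @by_single;  # 2
printf "double: %d fields\n", scalar @by_double;  # 17, one per character
printf "quoted: %d fields\n", scalar @by_quoted;  # 2
```

The double-quoted case splits into single characters because a bare | is an alternation of two empty patterns, which matches the empty string between every pair of characters.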
If the right argument is an expression rather than a
search pattern, substitution, or transliteration, it is
interpreted as a search pattern at run time. This can be
less efficient than an explicit search, because the pattern
must be compiled every time the expression is evaluated.
This perlop text must be a holdover from a while back. At least as far back as 5.005_03, code like $str1 =~ 'bc' (with a constant string for the pattern) would be compiled only once. Between 5.6.0 and 5.6.1 an extra check was added, so that even $str =~ $str2 would not be recompiled as long as $str2 had not changed.
I guess the second statement should simply be deleted from that paragraph.
Hugo
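If you would rather not rely on whatever caching a particular perl version does, you can make the compilation explicit with qr//. The sketch below is one way to do it, not the only one; find_terms and @compiled are names invented for this example. It compiles each dictionary term once, escaped with \Q...\E, and reuses the compiled patterns for several queries:

```perl
use strict;
use warnings;

my @dictionary = ('lo/gos', 'lo/gou', 'lo/gw|', 'lo/gon');

# Compile every term exactly once, as a [term, compiled-regex] pair.
# \Q...\E keeps betacode characters such as '|' literal.
my @compiled = map { [ $_, qr/\Q$_\E/ ] } @dictionary;

# Return the terms whose precompiled pattern matches inside $query.
sub find_terms {
    my ($query, @pairs) = @_;
    return map { $_->[0] } grep { $query =~ $_->[1] } @pairs;
}

for my $query ('lo/gou', 'lo/gw|') {
    my @hits = find_terms($query, @compiled);
    print "$query => @hits\n";
}
```

Keeping the original term next to its regex avoids having to stringify the qr// object to recover it, since the stringified form includes the (?...) wrapper.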
Thanks to Texas Tess, Zaxo, and dvergin for patient explanations of both the obvious and the arcane. Also, next time I have a regex problem, maybe I'll brave the debugger. That was very enlightening.
FWIW, the data I'm parsing is in 'betacode', an ASCII transcription scheme for Ancient Greek. It's convenient in some ways, but chock full of what I still have to remind myself are metacharacters. Grrr.
BCE --Your punctuation skills are insufficient!
Re: Am I on the pipe, or what?
by TexasTess (Beadle) on Jul 09, 2002 at 23:14 UTC
You have to escape the metacharacter to ensure it is treated literally, but that really does not explain why it picks up the w as well. Try escaping the pipe and see if it still returns the same result.
TexasTess "Great Spirits Often Encounter Violent Opposition From Mediocre Minds" --Albert Einstein
UPDATE: After re-reading this, I think I must have been on the pipe myself when I wrote it!