PerlMonks  

Pattern matching with qr// and modifiers

by Athanasius (Archbishop)
on May 05, 2012 at 03:06 UTC ([id://969011]=perlquestion)

Athanasius has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

Please see update below.

While doing pattern matches using qr// with modifiers added, I ran into behaviour I didn’t expect. The following code, using the 's' modifier to

“change "." to match any character whatsoever, even a newline, which normally it would not match.” (perlre: Modifiers)
illustrates what I mean:

#! perl

use strict;
use warnings;

my $text = <<EOT;
The quick brown fox
jumps over the unfortunate dog.
EOT

my %searches = (
    'A' => 'fox.+?jumps',
    'B' => '(?s)fox.+?jumps'
);

my %functions = (
    '1. no'     => \&no_mod,
    '2. qr//'   => \&qr_mod,
    '3. late'   => \&late_mod,
    '4. inline' => \&inline_mod
);

foreach my $search_key (sort keys %searches)
{
    print "\nSearching on '$searches{$search_key}':\n\n";

    foreach my $fn_key (sort keys %functions)
    {
        printf "%s%-9s 's' modifier: %s\n", $search_key, $fn_key,
            $functions{$fn_key}->($text, $searches{$search_key})
                ? 'Match found' : 'No match';
    }
}

sub no_mod      # no 's' modifier
{
    my ($string, $pattern) = @_;
    my $regex = qr/$pattern/;
    return ($string =~ /$regex/);
}

sub qr_mod
{
    my ($string, $pattern) = @_;
    my $regex = qr/$pattern/s;             # <= 's' modifier
    return ($string =~ /$regex/);
}

sub late_mod
{
    my ($string, $pattern) = @_;
    my $regex = qr/$pattern/;
    return ($string =~ /$regex/s);         # <= 's' modifier
}

sub inline_mod
{
    my ($string) = @_;
    return ($string =~ /fox.+?jumps/s);    # <= 's' modifier
}

__END__

This is the output I get:

Searching on 'fox.+?jumps':

A1. no     's' modifier: No match
A2. qr//   's' modifier: Match found
A3. late   's' modifier: No match
A4. inline 's' modifier: Match found

Searching on '(?s)fox.+?jumps':

B1. no     's' modifier: Match found
B2. qr//   's' modifier: Match found
B3. late   's' modifier: Match found
B4. inline 's' modifier: Match found

(I’m running DWIM/Strawberry perl 5.14.2 on Vista 32-bit, and I get the same result with perl 5.10.1 on Cygwin.)

All the results are as expected, except for A3. I can’t see any (logical) difference between the match patterns in late_mod() and inline_mod(), yet A4 matches (as expected) but A3 does not.

I’ve looked at perlop: Regexp Quote Like Operators, and at the perlfaq6 entries “I'm having trouble matching over more than one line. What's wrong?” and “How do I match a regular expression that's in a variable?”, but I haven’t found anything that addresses this particular issue.

So my questions are:

  1. Is the match in A3 supposed to fail?
  2. If Yes, why?
  3. And then, why doesn’t the use warnings pragma produce a warning along the lines of “Useless use of pattern match modifier 's' on line 53”?
  4. If No (i.e., the match should succeed), what am I doing wrong?

Update

Since posting, I’ve found the perlfaq6 entry “How do I apply switches like /i or /g to a qr regexp?”, which addresses this issue. But I would still appreciate any further information or clarification.
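For reference, one way to "apply a switch" to an existing qr// object is to recover its source and flags and compile a new object. This is a sketch only, assuming perl 5.10 or later for regexp_pattern from the core re module; add_flags is a hypothetical helper name, not part of any module, and prepending an inline (?flags) group is just one way to recombine the flags.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use re qw(regexp_pattern);   # core module; exports regexp_pattern on request

# Hypothetical helper: build a NEW qr// object that has the flags of
# $regex plus $extra. The original object is left untouched.
sub add_flags {
    my ($regex, $extra) = @_;
    # In list context regexp_pattern returns the pattern source and the
    # flags it was compiled with, e.g. ('fox.+?jumps', '').
    my ($source, $flags) = regexp_pattern($regex);
    my $all = $flags . $extra;    # non-empty as long as $extra is
    # Recompile from the source string, with the combined flags in an
    # inline (?flags) group at the start of the new pattern.
    return qr/(?$all)$source/;
}

my $text   = "The quick brown fox\njumps over the unfortunate dog.\n";
my $plain  = qr/fox.+?jumps/;
my $dotall = add_flags($plain, 's');

my $plain_matches  = ($text =~ $plain)  ? 1 : 0;
my $dotall_matches = ($text =~ $dotall) ? 1 : 0;

print "plain:  $plain_matches\n";    # 0: "." does not match the newline
print "dotall: $dotall_matches\n";   # 1: (?s) lets "." match the newline
```

The key point is that nothing is changed in place: a fresh object is compiled from the recovered source string.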

Thanks,

Athanasius <°(((><contra mundum

Re: Pattern matching with qr// and modifiers
by jwkrahn (Abbot) on May 05, 2012 at 04:39 UTC
    sub late_mod
    {
        my ($string, $pattern) = @_;
        my $regex = qr/$pattern/;
        return ($string =~ /$regex/s);         # <= 's' modifier
    }

    If you print out $regex after it has been compiled:

    $ perl -le'
    my $pattern = q/fox.+?jumps/;
    my $regex = qr/$pattern/;
    print $regex;
    '
    (?-xism:fox.+?jumps)

    You will see that, because no /s option was given to qr//, /s is turned off for this compiled regular expression.
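A small script along these lines makes the same point without relying on the stringified form, which differs between 5.10 and 5.14: the flags are recorded in each object when qr// runs, and a later /s on the match operator leaves the object's own setting alone. It is a sketch assuming perl 5.10 or later, for regexp_pattern from the core re module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use re qw(regexp_pattern);

my $pattern = 'fox.+?jumps';
my $text    = "The quick brown fox\njumps over the unfortunate dog.\n";

my $without_s = qr/$pattern/;
my $with_s    = qr/$pattern/s;

# regexp_pattern reports the flags that qr// baked into each object.
my (undef, $flags_without) = regexp_pattern($without_s);
my (undef, $flags_with)    = regexp_pattern($with_s);
print "flags without /s: '$flags_without'\n";
print "flags with /s:    '$flags_with'\n";

# An outer /s on the match operator does not reach inside $without_s;
# the interpolated object keeps its own (s-off) setting, as in case A3.
my $late  = ($text =~ /$without_s/s) ? 1 : 0;
my $early = ($text =~ /$with_s/)     ? 1 : 0;
print "late /s:  $late\n";
print "qr//s:    $early\n";
```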

      Thanks to jwkrahn for the reply.

      Ok, perl 5.10.1 gives me: (?-xism:fox.+?jumps).

      Under perl 5.14.2 I get: (?^:fox.+?jumps). In perlre: Extended Patterns, this is explained as follows:

      Starting in Perl 5.14, a "^" (caret or circumflex accent) immediately after the "?" is a shorthand equivalent to d-imsx.... The caret tells Perl that this cluster doesn't inherit the flags of any surrounding pattern, but uses the system defaults (d-imsx), modified by any flags specified.
      So, in both cases qr// compiles $regex with the 's' modifier turned off.
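Because the interpolated object carries its own flag group ((?^:...) on 5.14+, (?-xism:...) before that), even an inline (?s) placed in front of it does not help, whereas (?s) inside the pattern string before qr// compiles it (search B in the original script) does. A minimal sketch, reusing the sample text from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text  = "The quick brown fox\njumps over the unfortunate dog.\n";
my $regex = qr/fox.+?jumps/;

# Interpolated, $regex becomes "(?^:fox.+?jumps)" or "(?-xism:fox.+?jumps)".
# Either group switches s off again, so the leading (?s) does not apply
# inside it.
my $outer_inline = ($text =~ /(?s)$regex/) ? 1 : 0;

# Putting (?s) into the pattern itself, before compilation, does work.
my $inner_inline = ($text =~ qr/(?s)fox.+?jumps/) ? 1 : 0;

print "(?s) outside the qr object: $outer_inline\n";
print "(?s) inside the pattern:    $inner_inline\n";
print "stringified: $regex\n";
```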

      I find it surprising that adding the modifier back later has no effect, and triggers no warnings. I guess I’ll just have to remember that once a regex has been compiled with qr//, its d, i, m, s, and x settings are thereafter immutable.

      By the way, perlop: Regexp Quote Like Operators says of qr//:

      This operator quotes (and possibly compiles) its STRING as a regular expression.
      Why “and possibly compiles”? Under what circumstances does qr// quote rather than compile, and what difference would this make?

      Thanks,

      Athanasius <°(((><contra mundum

        I find it surprising that adding the modifier back later has no effect, and triggers no warnings. I guess I’ll just have to remember that once a regex has been compiled with qr//, its d, i, m, s, and x settings are thereafter immutable.
        Well, isn't that the point of having a compiled regexp? As it is, if you compile a regexp, you know exactly what you have. If modifiers applied later could change it, then even holding a compiled regexp, I still wouldn't know what it does. Leaking modifiers would also mean that code like this:
        my $pat = gimme_pat();    # Some method returning a qr
        if ($str =~ /^ $pat $     # Anchor pattern
                    /x) { ... }
        would break whenever $pat contains significant spaces, because the outer /x would strip them.
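With the actual behaviour, the anchored /x match is safe. In this sketch gimme_pat is a hypothetical stand-in that returns a pattern containing a literal space; the second match only imitates what a leaking /x would do, by writing the same space under /x directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for gimme_pat(): a qr// with a significant space.
sub gimme_pat { return qr/quick brown/ }

my $str = 'quick brown';
my $pat = gimme_pat();

# The outer /x lets us space out and comment the anchoring pattern, but
# it does not apply inside $pat, so the space is still matched literally.
my $anchored = ($str =~ /^ $pat $    # anchor the whole string
                        /x) ? 1 : 0;

# If /x did leak into $pat, it would behave like this: the space is
# ignored, the pattern becomes ^quickbrown$, and the match fails.
my $as_if_x = ($str =~ /^quick brown$/x) ? 1 : 0;

print "anchored:        $anchored\n";   # 1
print "as if /x leaked: $as_if_x\n";    # 0
```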
        I find it surprising that adding the modifier back later has no effect, and triggers no warnings.

        It does have an effect. The effect is on the pattern you add it to, though, not on the compiled pattern. The following illustrates what I mean:

        $ perl -le '$r = qr/qux./; $s = "qux\n"; print "matches" if $s =~ /$r/'
        $ perl -le '$r = qr/qux./; $s = "quxxx"; print "matches" if $s =~ /$r./'
        matches
        $ perl -le '$r = qr/qux./; $s = "quxx\n"; print "matches" if $s =~ /$r./'
        $ perl -le '$r = qr/qux./; $s = "quxx\n"; print "matches" if $s =~ /$r./s'
        matches

        -sauoq
        "My two cents aren't worth a dime.";
        I guess I’ll just have to remember that once a regex has been compiled with qr//, its d, i, m, s, and x settings are thereafter immutable.

        But this is not unusual. To my mind, it is similar to having to remember that the function
            sub func { my $x = 42;  return $x; }
        will always return 42 regardless of any value assigned to any global or lexical $x scalar in any calling scope of the function. The return value of the function is, in a sense, 'immutable' (unless, of course, the function returns a reference to the lexical, which is a whole 'nother ballgame), but regex modifiers have no truck with such referential semantics. In other words, it is purely a scoping question.

        Why “and possibly compiles”? Under what circumstances does qr// quote rather than compile, and what difference would this make?

        In the discussion of qr// in the Regexp Quote Like Operators section of perlop, see the paragraphs beginning "Since Perl may compile the pattern..." and "Precompilation of the pattern into an internal representation..." for some light on this question.

        As I understand it, a regex object has both an internal representation (for use by the interpreter at run time) and a 'stringized' representation (for the convenience of the programmer, and for interpolation in strings and other regex objects at compile time), along with a bunch of rules for converting between the two. In the example given in perlop, the qr/$_/i sub-expression in the
            my @compiled = map qr/$_/i, @$patterns;
        statement cannot possibly be referred to in its stringized form because no 'hard' or 'soft' reference to it is created; it can therefore be compiled directly to an internal representation.
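For context, the idiom that map qr/$_/i, @$patterns belongs to looks roughly like this: each pattern string is compiled once, with /i fixed at that point, and the compiled objects are then reused for every line. This is a sketch modeled loosely on perlop's match() example; match_any and the sample data are illustrative, not taken from perlop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines that match at least one of the given pattern strings.
sub match_any {
    my ($patterns, @lines) = @_;
    # Compile each pattern once, case-insensitively.
    my @compiled = map qr/$_/i, @$patterns;
    return grep {
        my $line = $_;
        # Inner grep in boolean context: true if any compiled regex matches.
        grep { $line =~ $_ } @compiled;
    } @lines;
}

my @hits = match_any(['^fox', 'DOG\.$'],
                     'Fox jumps', 'lazy dog.', 'cat naps');
print "$_\n" for @hits;   # Fox jumps, then lazy dog.
```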

        Similarly, the number 1.2345 has both an internal (IEEE 754, 64-bit) representation (actually, an approximation; see What Every Computer Scientist Should Know About Floating-Point Arithmetic) and a stringized representation (e.g., '1.2345'); if the number is used purely as a number and never printed, it may never exist in its stringized form, but only as its internal representation.

Node Type: perlquestion [id://969011]
Approved by davido
Front-paged by ww