m//g behaves strange...

by Anonymous Monk
on Nov 09, 2003 at 21:15 UTC ( [id://305715]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

During my first steps in perlgolf I've encountered a strange m//g behaviour (perl 5.8.1).

When I run the following code:

    #!/usr/bin/perl -l
    $_="123"; @a=/./g;            print "#1: ", /./g;
    $_="123"; $a=/./g;            print "#2: ", /./g;
    $_="123"; /./g;               print "#3: ", /./g;
    $_="123"; /./g;               print "#4: ", /\G./g;
    $_="123"; /./g; undef pos;    print "#5: ", /./g;
    $_="123"; /./g; undef pos;    print "#6: ", /\G./g;
    $_="123"; /./g; pos = 2;      print "#7: ", /./g;
    $_="123"; /./g; pos = 2;      print "#8: ", /\G./g;

I get this output:

    #1: 123
    #2: 23
    #3: 23
    #4: 23
    #5: 123
    #6: 123
    #7: 3
    #8: 3

This test shows that:

  1. In scalar context m//g doesn't return size of the list it would give back when called in list context,
  2. The \G anchor is useless - perlre states, that it should be used only at the beginning of pattern,
  3. and most important - m//g in scalar context doesn't reset pos (matches only once).

I ask you if it's proper behaviour / undocumented feature / bug? Or maybe I have missed something?

PS. My username is kokr, but somehow the email with my passwd can't find its way to my mailbox :>

Re: m//g behaves strange...
by antirice (Priest) on Nov 09, 2003 at 21:41 UTC

    I think you've missed something.

    1. This is somewhat more of a feature. If you want to extract all matches from a regex, you can do @a=/regex/g and get all the strings that match. If you want the count, you could do $a=()=/regex/g;.
    2. From perlre:
      Perl defines the following zero-width assertions:
      \G - Match only at pos() (e.g. at the end-of-match position of prior m//g)
      In other words, if the regex starts with \G the match has to start at pos or the regex doesn't match.
    3. m//g shouldn't reset in scalar context until it fails. Otherwise you couldn't do loops such as while ($string =~ /regex/g) { ...
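
To make point 2 concrete, here is a minimal sketch (the string "ab12" and the variable names are made up for illustration). It sets pos to the same offset twice and tries the same pattern with and without \G:

```perl
use strict;
use warnings;

my $s = "ab12";

# Without \G the engine may scan forward from pos to find a match.
pos($s) = 1;
my $plain = ($s =~ /\d/g) ? pos($s) : undef;

# With \G the match must begin exactly at pos, where "b" is not a digit.
pos($s) = 1;
my $anchored = ($s =~ /\G\d/g) ? pos($s) : undef;

print "plain:    ", (defined $plain    ? "matched, pos is now $plain" : "no match"), "\n";
print "anchored: ", (defined $anchored ? "matched, pos is now $anchored" : "no match"), "\n";
# prints:
# plain:    matched, pos is now 3
# anchored: no match
```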

    Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by tachyon should help with regexes in list context.

    Hope this helps.

    antirice    
    The first rule of Perl club is - use Perl
    The ith rule of Perl club is - follow rule i - 1 for i > 1

Re: m//g behaves strange...
by converter (Priest) on Nov 09, 2003 at 21:54 UTC

    In scalar context m//g doesn't return size of the list it would give back when called in list context,

    If you want the number of elements in the list returned by a pattern with the /g modifier when evaluated in list context, you can use a list assignment in scalar context, which produces the number of its elements. In this case, we assign to an empty list:

        $_ = "456";
        $count = () = /./g;
        print $count; # prints 3

Re: m//g behaves strange...
by Anonymous Monk on Nov 09, 2003 at 21:28 UTC

    Your test output is entirely expected behavior according to the documentation. Perhaps it would be better if you indicated what you expected the output to be.

    1. m//g returns true or false in scalar context.
    2. \G is not useless.
    3. m//g won't reset pos() until the match fails.
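
A small sketch of points 1 and 3, assuming the same "123" string as in the question (the loop and the @trace array are only for illustration). Each scalar-context m//g finds one match and advances pos; only the failing attempt resets it:

```perl
use strict;
use warnings;

my $s = "123";
my @trace;

# Four scalar-context attempts against a three-character string.
for my $attempt (1 .. 4) {
    my $ok  = ($s =~ /./g) ? 1 : 0;
    my $pos = defined pos($s) ? pos($s) : "undef";
    push @trace, "attempt $attempt: matched=$ok pos=$pos";
}

print "$_\n" for @trace;
# prints:
# attempt 1: matched=1 pos=1
# attempt 2: matched=1 pos=2
# attempt 3: matched=1 pos=3
# attempt 4: matched=0 pos=undef
```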
Re: m//g behaves strange...
by pg (Canon) on Nov 10, 2003 at 02:59 UTC

    I recently answered a post, Re: An Insane Typo Bug, that relates to your question in an interesting way. The original post in that thread looks quite different from your question, but both come down to the same fact: in scalar context, m// returns either 1 or 0.

    For pos(), try this. It gives you 1 and 9, so pos() does work:

        use strict;
        use warnings;

        $_ = "0123456789";
        my $ret = m/0/g;
        print "ret = $ret, pos = " . pos() . "\n";
        $ret = m/8/g;
        print "ret = $ret, pos = " . pos() . "\n";

      <pedantic> actually, m// in scalar context returns either 1 or "" (empty string). </pedantic>.

        You are very close to 100% right. However I do observe something else, and I don't let things escape easily.

        If I do this:

            use strict;
            use warnings;

            {
                $_ = "1234";
                my $ret = /2/g;
                print "($ret)\n"
            }
            {
                $_ = "1234";
                my $ret = /9/g;
                print "($ret)\n"
            }

        The outputs are 1 and "empty string", which indicates that you are right.

        However, try this:

            use strict;
            use warnings;

            {
                my $a = 0;
                print "(" . ~$a . ")\n";
            }
            {
                my $a = 1;
                print "(" . ~$a . ")\n";
            }
            {
                my $a = "";
                print "(" . ~$a . ")\n";
            }

        It returns:

            (4294967295)
            (4294967294)
            ()

        Remember the return values for zero and empty string, and then try this:

            use strict;
            use warnings;

            {
                $_ = "1234";
                my $ret;
                print ~ m/2/, "\n";
            }
            {
                $_ = "1234";
                my $ret;
                print ~ m/9/, "\n";
            }

        It gives you:

            4294967294
            4294967295

        This indicates that the "~" operator does receive 0, not "empty string". Remember that when we explicitly pass "~" an empty string, it is not converted to 0.

        However, if we do this:

            use strict;
            use warnings;

            {
                $_ = "1234";
                my $ret;
                print ~ ($ret = m/2/), "\n";
                print "($ret)\n";
            }
            {
                $_ = "1234";
                my $ret;
                print ~ ($ret = m/9/), "\n";
                print "($ret)\n";
            }

        You get:

            4294967294
            (1)
            4294967295
            ()

        It seems that although $ret receives "empty string", the "~" operator receives 0. Again, remember that we didn't see this kind of auto-conversion in the explicitly-passing-empty-string case.
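
One reading that fits these results, as far as I know: the false value Perl's operators return is a special value that is "" as a string and 0 as a number, while a literal "" is only a string. A hedged sketch (variable names are made up) that compares the two without hard-coding an integer width:

```perl
use strict;
use warnings;

$_ = "1234";
my $false = /9/;   # failed match: the built-in false value
my $empty = "";    # an ordinary empty string

# As strings the two are indistinguishable.
my $same_string = ($false eq $empty) ? "yes" : "no";

# "~" negates the built-in false numerically, giving the same result as ~0 ...
my $numeric_not = (~$false == ~0) ? "yes" : "no";

# ... but does a string-wise negation of a plain "", which stays empty.
my $string_not = (~$empty eq "") ? "yes" : "no";

print "equal as strings:           $same_string\n";
print "~false equals ~0:           $numeric_not\n";
print "~\"\" is still empty string:  $string_not\n";
```

On the Perl used in this thread all three lines should say "yes". Newer Perls with the "bitwise" feature enabled treat "~" as always numeric, so the last line would differ there.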

Re: m//g behaves strange...
by Dominus (Parson) on Nov 10, 2003 at 20:01 UTC
    Says kokr:
    In scalar context m//g doesn't return size of the list it would give back when called in list context,
    It's not supposed to do that. m//g in scalar context has a very interesting result:
        my $s = "123 45 6 789";
        while ($s =~ m/\d+/g) {
            print "> $&\n";
        }
    This prints:
        > 123
        > 45
        > 6
        > 789
    2. The \G anchor is useless - perlre states, that it should be used only at the beginning of pattern,
    It's not useless. If you change the pattern in the example above to /\G\d+/g you get a different result. But here's a typical example of how one might use \G:
        my $s = "123 carrots 45 6 bananas 789";
        while (1) {
            $s =~ /\G(\d+)/gc    and print "NUMBER $1\n" and next;
            $s =~ /\G\s+/gc      and print "SPACE\n"     and next;
            $s =~ /\G([a-z]+)/gc and print "WORD $1\n"   and next;
            $s =~ /\G$/gc        and last;
        }
    This prints:
        NUMBER 123
        SPACE
        WORD carrots
        SPACE
        NUMBER 45
        SPACE
        NUMBER 6
        SPACE
        WORD bananas
        SPACE
        NUMBER 789
    What happens if you remove the 'useless' \G's? You get a very different result:
        NUMBER 123
        NUMBER 45
        NUMBER 6
        NUMBER 789
    3. and most important - m//g in scalar context doesn't reset pos (matches only once).
    Here you have some confusion, but I can't tell what it is amongst the other confusions in your articles. Did you realize that the /./g in print "#7: ",   /./g; was in list context, not scalar context? Did you realize that assigning $_ = "123" will reset pos($_)? Did you realize that m//g isn't supposed to 'reset' pos unless the match fails? In fact, the whole point of /g is that it does not reset pos. Ordinary matches, without /g, reset pos before matching begins.
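
The context question can be checked directly. This is only a sketch with made-up variable names: the first two statements show what list context gives, and the last two reproduce the "#2: 23" line from the original test.

```perl
use strict;
use warnings;

$_ = "123";

my @list  = /./g;        # list context: every match at once; pos is reset afterwards
my $count = () = /./g;   # list assignment in scalar context: number of matches
my $flag  = /./g;        # scalar context: one match only, pos($_) is now 1
my @rest  = /./g;        # list context again, continuing from pos($_)

print "list:  @list\n";   # list:  1 2 3
print "count: $count\n";  # count: 3
print "flag:  $flag\n";   # flag:  1
print "rest:  @rest\n";   # rest:  2 3
```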

    Consider this:

        my $s = "123 carrots    45 6 bananas  789";
        while ($s =~ /(\d+)/g) {
            print "'$1' at position ", pos($s)-length($1), "\n";
        }
    The output is:
        '123' at position 0
        '45' at position 15
        '6' at position 18
        '789' at position 29
    So clearly pos is doing something. Now let's reset pos:
        my $s = "123 carrots    45 6 bananas  789";
        while ($s =~ /(\d+)/g) {
            print "'$1' at position ", pos($s)-length($1), "\n";
            pos($s) += 13;
        }
    Now the output is different:
        '123' at position 0
        '5' at position 16
        '89' at position 30
    The first match is as before. But the pos($s) += 13 forces the current match position forward, into the middle of the 45, so that the next match sees only the 5 part. After matching the 5, the next pos($s) += 13 jumps past the 6 entirely, into the middle of the 789.

    I ask you if it's proper behaviour / undocumented feature / bug? Or maybe I have missed something?
    A combination of all of these, I think. #1 is proper behavior. #2 seems to be a case of your having missed something. #3 is an undocumented feature, but it's undocumented because it doesn't exist. But also the behavior of \G and /g is very badly documented in general.

    I hope this helps, but I'm not sure what your objection is, so I can't address it directly.

    --
    Mark Dominus
    Perl Paraphernalia
