$1, etc. are not just strings

by ysth (Canon)
on Jan 01, 2007 at 04:28 UTC

I ran into a gotcha the other day; I noticed data "laundered" through a regex suddenly took more space:
    use Test::More "no_plan";
    use Devel::Size "total_size";

    my $key = "aa";
    my $val = "a00";
    my %hash1;
    my %hash2;

    while (length($key) == 2) {
        $hash1{$key} = $val;
        "$key$val" =~ /(..)(...)/ and $hash2{$1} = $2;
        ++$key;
        ++$val;
    }

    is(keys(%hash1), keys(%hash2), "same number of keys");
    is_deeply(\%hash1, \%hash2, "is_deeply same");
    is(total_size(\%hash1), total_size(\%hash2));
    __END__
    ok 1 - same number of keys
    ok 2 - is_deeply same
    not ok 3
    #   Failed test (sizer.pl at line 18)
    #          got: '39316'
    #     expected: '58244'
    1..3
    # Looks like you failed 1 test of 3.
and had to thump myself with a cluestick when I realized why. $2 is a magic variable that fetches the 2nd capture group from the last successful regex match in scope. But that magic comes at a cost in storage: as vaguely shown in http://search.cpan.org/perldoc/B#SV-RELATED_CLASSES, a magic variable (a PVMG or subclass thereof) has fields to store not only a string, but also an integer, a floating point value, and a pointer to a list of magic that applies to the variable.

When perl does an assignment, the target scalar is upgraded so that it can store at least as much info as the source scalar could, whether or not it actually needs to. So, in my example, all the values in %hash2 are also PVMGs, though only the PV (string) fields are actually used.

Changing $2 to "$2" in the assignment copies a freshly interpolated plain string instead of the magic variable, which makes the hash values simple PV types and makes all tests pass.
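
To check the internal type of each stored value without installing Devel::Size, the core B module can report it. This is only a minimal sketch: sv_class is a hypothetical helper name, not something from the post, and the exact class names printed may vary between perl versions.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: name the B class that describes a scalar's
# internal representation (for example B::PV or B::PVMG).
sub sv_class {
    my ($ref) = @_;
    return ref B::svref_2object($ref);
}

my %h;
"aaa00" =~ /(..)(...)/ or die "no match";

$h{direct}      = $2;      # assigned straight from the magic variable
$h{stringified} = "$2";    # assigned from an interpolated copy

printf "%-12s %s\n", $_, sv_class(\$h{$_}) for sort keys %h;
```

If the explanation above holds on your perl, the two entries should report different classes, with the stringified one being the smaller type.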

Replies are listed 'Best First'.
Re: $1, etc. are not just strings
by syphilis (Archbishop) on Jan 01, 2007 at 12:30 UTC
    Excellent post - and I ++'ed it some time ago. The main thing I liked (apart from the fact that you provided a fully functional demo) was the use of Devel::Size - which is a module that I had not installed (and was not really aware of).

    I've since installed it, and it looks to be a module that I'll make use of in the future.

    However, my real motivation in replying is simply to see whether I can get 4-in-a-row on the "Notes" list :-)

    Cheers,
    Rob
Re: $1, etc. are not just strings
by diotalevi (Canon) on Jan 02, 2007 at 01:13 UTC

    In Perl 6...

    my Str %hash{Str};

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: $1, etc. are not just strings
by ikegami (Patriarch) on Jan 04, 2007 at 16:49 UTC

    That reminds me of the danger of passing a variable to a function that modifies that same variable. Because @_ aliases the caller's arguments, it's particularly easy to screw up when dealing with built-in global variables such as $1.

        sub func { 'b' =~ /(.)/; print(@_, "\n"); }

        'a' =~ /(.)/; func($1);    # Prints "b"
        'a' =~ /(.)/; func("$1");  # Prints "a"

        sub func { eval { die "bar\n" }; print(@_); }

        eval { die "foo\n" }; func($@);    # Prints "bar"
        eval { die "foo\n" }; func("$@");  # Prints "foo"

    (If you want to mess with your mind, try removing the second 'a' =~ /(.)/; from the first snippet.)
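
The same problem can also be guarded against inside the function, by copying @_ before doing anything that might change the globals. A minimal sketch, assuming hypothetical variants of func that return their argument rather than print it:

```perl
use strict;
use warnings;

# Hypothetical variant: reads its argument only after its own match.
sub func_alias {
    'b' =~ /(.)/;          # from here on, $1 reports this match
    return join '', @_;    # @_ still aliases $1, so the new value is read
}

# Hypothetical variant: copies its arguments before matching.
sub func_copy {
    my @args = @_;         # copy while $1 still reports the caller's match
    'b' =~ /(.)/;
    return join '', @args;
}

'a' =~ /(.)/; print func_alias($1), "\n";
'a' =~ /(.)/; print func_copy($1), "\n";
```

Copying on entry protects the function's own logic, but callers passing "$1" remains the safer habit, since not every function can be assumed to copy its arguments first.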

Re: $1, etc. are not just strings
by gam3 (Curate) on Jan 04, 2007 at 16:36 UTC
    I like to use this syntax to capture from regular expressions.

        if (my ($a, $b) = ("$key$val" =~ /(..)(...)/)) {
            $hash2{$a} = $b;
        }

    The full script:

        #!/usr/bin/perl
        use Test::More "no_plan";
        use Devel::Size "total_size";

        my $key = "aa";
        my $val = "a00";
        my %hash1;
        my %hash2;

        while (length($key) == 2) {
            $hash1{$key} = $val;
            if (my ($a, $b) = ("$key$val" =~ /(..)(...)/)) {
                $hash2{$a} = $b;
            }
            ++$key;
            ++$val;
        }

        is(keys(%hash1), keys(%hash2), "same number of keys");
        is_deeply(\%hash1, \%hash2, "is_deeply same");
        is(total_size(\%hash1), total_size(\%hash2));
        __END__
        ok 1 - same number of keys
        ok 2 - is_deeply same
        ok 3
    -- gam3
    A picture is worth a thousand words, but takes 200K.
Re: $1, etc. are not just strings
by OfficeLinebacker (Chaplain) on Jan 18, 2007 at 02:42 UTC

    gam3 demonstrates a great method. ++. Perl Best Practices advocates avoiding using $1, $2, etc. whenever you possibly can, and that is a great way to do it. IMO it's sort of a "poor man's" named capture (one of the few regex features that Perl "lacks"), and you never have to refer to the numbered variables themselves except in some special cases like in the replacement clause of the substitution operator.

    I am also particularly fond of a one-step syntax to get the results of a substitution on a variable without affecting the original variable. It's hard to explain but I'm sure you've either done it or wanted to do it at some point. Here's an example (which I apologize is less than ideal; I wasn't sure if any spaces would turn up in the match from the split):

        use strict;
        use warnings;

        my @staff = `whoare -g somegroup`;
        #chomping an array chomps all the elements
        chomp @staff;

        foreach my $emp (@staff) {
            my @fields = split /\s{2,}/, $emp;
            die "error! number of fields is " . scalar @fields . "! "
                unless ( scalar @fields == 3 );
            #dispense with the opening paren to get the group
            ( my $def_grp = $fields[1] ) =~ s/\s*\(\s*//;
            #etc.
        }

    I think of both of those tricks as falling in the same category, though I use the first more often. It came in handy in the very first lines of a CGI script I wrote:
        use strict;
        use warnings;

        # First things first. If I can't figure out who you are, you're outta here.
        my $editor;
        if ( exists( $ENV{REMOTE_USER} ) ) {
            ($editor) = $ENV{REMOTE_USER} =~ m/(some regex)/i;
        }
        (defined $editor) || die "I can't determine who you are, so you can't access this area.";
        #use editor later throughout the program

    These examples are probably not the best, but they're from code I could put my hands on in short order.

    While we're on the topic, I should note that it's a very good idea to use non-capturing parentheses, (?:...), whenever you want or need parentheses for grouping but don't need to remember what matched inside them.
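
A minimal sketch of the difference, using a made-up log line and hypothetical variable names: with plain parentheses the alternation occupies a capture slot, and with (?:...) it does not.

```perl
use strict;
use warnings;

my $line = "warning: low disk";

# Capturing group: the alternation becomes the first capture.
my @with = $line =~ /^(error|warning): (.*)$/;

# Non-capturing group: only the message is captured.
my @without = $line =~ /^(?:error|warning): (.*)$/;

print scalar(@with), " captures with (...)\n";
print scalar(@without), " capture with (?:...)\n";
```

This also keeps the numbering of the remaining groups stable when grouping parentheses are added or removed later.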


    I like computer programming because it's like Legos for the mind.

      (?<taunt>My 5\.9\.5 perl has named captures. ;-\))

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

        No kidding!? I figured it would be a feature in Perl6, but I didn't know it was already out. We're on 5.8.5 at our site and when I asked for an upgrade the guy who's in charge of the sitewide perl installs said he'd prefer to wait for 5.9 to become stable. He says there is not much of a payoff to move to a higher 5.8.x release.

        This was last month... 5.9 (with named captures) isn't stable by any chance, is it?

        I like computer programming because it's like Legos for the mind.

Node Type: perlmeditation [id://592424]
Approved by Old_Gray_Bear
Front-paged by bart