PerlMonks  

Two Questions on "my"

by C_T (Scribe)
on May 22, 2004 at 15:19 UTC ( [id://355585]=perlquestion )

C_T has asked for the wisdom of the Perl Monks concerning the following question:

Hello all! The following questions assume that "use strict" is enabled.

Question 1:
In general, I would think it is better to do this:

my($foo);
for ($i = 0; $i < $a_huge_value; $i++) {
    $foo = getValue();
    codeThatUsesFoo($foo);
}

Than this:

for ($i = 0; $i < $a_huge_value; $i++) {
    my $foo = getValue();
    codeThatUsesFoo($foo);
}

Because in the first case you're reusing the same memory space over and over, and in the second you're creating memory space in every iteration.

True, or not so much? I'm just looking for a standard practice kind of thing here.

Question 2:

Is there a way to do this:

my($key);
my($value);
for ($i = 0; $i < $a_whole_lot; $i++) {
    my($key, $value) = returnsAnArray();
}

Without creating locally-scoped copies of $key and $value? In other words, I'm using "my" here to create the anonymous array, but I'm ending up recreating locally-scoped $key and $value at the same time, which I'd rather not do for the reasons outlined in question 1. Other than making the array named rather than anonymous, is there something I'm missing?

Apologies all around for these beginner questions.

CT

Charles Thomas
Madison, WI

Replies are listed 'Best First'.
Re: Two Questions on "my"
by Joost (Canon) on May 22, 2004 at 15:37 UTC

    Answer1

    In both cases you reuse the same memory space. Lexicals only use more memory in the case of a recursive subroutine call, but then you usually want them to.

    Anyway, generally you want variables to have the smallest possible scope, so I'd go for option two.

    Answer2

    I really don't know what you are talking about, and I don't get the example code either.

    You cannot "return an array", you always return a list (when the sub is called in list context) or a single scalar value (otherwise). If you want to return a long list efficiently, return an array reference, and never mind copying the reference. If you are worried about the memory that's being consumed by 2 scalars, you should probably be coding in assembly.
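To illustrate the array-reference suggestion, here is a minimal sketch. The sub name big_list_ref and the data it builds are made up for the example; nothing like it appears in the original question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sub that produces many values.
sub big_list_ref {
    my @values = map { $_ * 2 } 1 .. 1000;
    return \@values;        # hands back one reference instead of a 1000-element list
}

my $ref   = big_list_ref();
my $count = scalar @$ref;   # number of elements behind the reference
my $first = $ref->[0];      # element access goes through the reference
print "$count elements, first is $first\n";
```

The caller copies a single scalar (the reference) no matter how long the array is.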

      > If you are worried about the memory that's being consumed by 2 scalars
      >you should probably be coding in assembly.

      What I'm actually worrying about is the TIME necessary to create the memory space versus the time it takes to simply assign a new value into existing memory space.

      That whole "this will take a few seconds" from the next response? I'm wondering if that time could be cut down by reusing the memory space.

      Some testing has shown me that method 2 takes about 1.5x as long as method 1.

      Charles Thomas
      Madison, WI

        In general, when you have questions like this the real answer is to try it. The Benchmark module is your friend:

        use Benchmark qw/ cmpthese /;

        cmpthese(-5, {
            outside => q{
                my $scalar;
                for (1..10_000) { $scalar = 1; }
            },
            inside => q{
                for (1..10_000) { my $scalar = 1; }
            },
        });

        gives me:

                  Rate  inside outside
        inside   261/s      --    -15%
        outside  307/s     18%      --

        I'm not sure why the 'outside' version is faster, and it intrigues me - but if all you care about is which is faster, you don't need to know.

        Be careful about reading the percentages without noticing the rate though - since the assignment is happening 10,000 times for each iteration, we're talking about a 15% difference in speed for an operation that takes about a third of a microsecond. That is, you need to perform such an assignment about 18 million times before you can save 1 second by declaring the variable outside the loop. In most cases the clarity achieved by declaring the variable at the innermost scope far outweighs such microscopic savings.
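The arithmetic behind that estimate can be checked with a few lines of Perl. This is only a sketch: the two rates are copied from the benchmark table above, and the variable names are invented for the example.

```perl
use strict;
use warnings;

my $per_run      = 10_000;   # assignments performed in each benchmarked run
my $rate_inside  = 261;      # runs per second, from the table above
my $rate_outside = 307;

# Seconds spent on a single assignment (loop overhead included).
my $t_inside  = 1 / ($rate_inside  * $per_run);
my $t_outside = 1 / ($rate_outside * $per_run);

# Time saved per assignment, and how many assignments add up to one second.
my $saved      = $t_inside - $t_outside;
my $break_even = 1 / $saved;

printf "inside:  %.3f microseconds per assignment\n", $t_inside  * 1e6;
printf "outside: %.3f microseconds per assignment\n", $t_outside * 1e6;
printf "assignments needed to save one second: %.1f million\n", $break_even / 1e6;
```

With these rates the break-even point comes out a little over 17 million assignments, in line with the "about 18 million" quoted above.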

        Hugo

Re: Two Questions on "my"
by Zaxo (Archbishop) on May 22, 2004 at 15:37 UTC

    Much of the work you're trying to avoid is done anyway when the braces define a scope. Generally, the idea is to use scoping to get rid of knowledge where it's unneeded. I'd use the second form in question one because $foo is then out of scope afterwards. Similarly for question two, get rid of the declaration outside the loop.

    If you must, you can make the exterior declaration and then say ($key, $value) = returnsAnArray(); inside the loop. I think the loops would be better written as { my $i = 0; while ($i++ < $a_big_number) { ... }} rather than the C-style for loops you have. You could also say for (0..$a_big_number) { ... }, but that is no way to save memory.

    After Compline,
    Zaxo

      > You could also say for (0..$a_big_number) { ... }, but that is no way to save memory.

      Actually, it is. For some time now that style of foreach loop has been optimized. It doesn't create a list of $a_big_number+1 elements; it efficiently iterates one at a time through the set, much as the equivalent C-style for loop would.

      You can prove this to yourself by using a really big number and watching the memory of the program as it runs (say, through top):

      foreach (0..100_000_000) { $i++ }

      Compare this with the memory usage of something like the following. Notice I had to drop the number from 100 million to just one million; 100 million caused perl to die with an out of memory error.

      @array = (0..1_000_000); foreach (@array) { $i++ }

      This optimization was added to perl 5.005; perldoc perl5005delta mentions it, search for 1000000.

Re: Two Questions on "my"
by Jenda (Abbot) on May 22, 2004 at 18:37 UTC

    Premature optimization is the root of all evil.

    Re 1:

    use strict;
    use Benchmark;

    my $a_huge_value = 10000000;

    sub outside {
        my($foo);
        for (my $i = 0; $i < $a_huge_value; $i++) {
            $foo = $i + 1;
            $foo++;
        }
    }

    sub inside {
        for (my $i = 0; $i < $a_huge_value; $i++) {
            my $foo = $i + 1;
            $foo++;
        }
    }

    timethese 1, {
        inside  => \&inside,
        outside => \&outside,
    };
    __END__
    Benchmark: timing 1 iterations of inside, outside...
        inside:  6 wallclock secs ( 5.99 usr +  0.00 sys =  5.99 CPU) @  0.17/s (n=1)
       outside:  5 wallclock secs ( 4.95 usr +  0.01 sys =  4.96 CPU) @  0.20/s (n=1)

    I don't think the difference is big enough to matter, especially since the loops in the benchmark were really tight. If you actually call a subroutine in the loop you get much closer results. If I replace the $foo++ with a call to sub doSomething { $_[0]++ } I get:

    Benchmark: timing 1 iterations of inside, outside...
        inside: 12 wallclock secs (10.95 usr +  0.00 sys = 10.95 CPU) @  0.09/s (n=1)
       outside: 10 wallclock secs ( 9.49 usr +  0.02 sys =  9.51 CPU) @  0.11/s (n=1)

    Anyway if you really do care about this tiny difference I would suggest a slightly different syntax:

    sub inInit {
        for (my ($i, $foo) = (0, 0); $i < $a_huge_value; $i++) {
            $foo = $i + 1;
            $foo++;
        }
    }

    This way the $foo is created just once, yet it's declared just for the loop. You should be aware though that under some circumstances it may make a huge difference whether the variable is declared outside or inside the loop. If you plan to keep a reference to the variable you do need to declare it inside the loop so that you do get a new variable in each iteration!
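A small sketch of the reference case just mentioned, with variable names invented for the example: when references to the variable are kept, declaring it inside the loop yields a new scalar each time, while declaring it outside leaves every reference pointing at the same scalar.

```perl
use strict;
use warnings;

my (@outside_refs, @inside_refs);

# Declared outside: one scalar, overwritten on each pass.
my $shared;
for my $n (1 .. 3) {
    $shared = $n;
    push @outside_refs, \$shared;   # every entry points at the same scalar
}

# Declared inside: the stored reference keeps each scalar alive,
# so perl hands out a fresh one on the next pass.
for my $n (1 .. 3) {
    my $fresh = $n;
    push @inside_refs, \$fresh;
}

my @outside_values = map { $$_ } @outside_refs;
my @inside_values  = map { $$_ } @inside_refs;
print "outside: @outside_values\n";   # 3 3 3
print "inside:  @inside_values\n";    # 1 2 3
```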

    Re 2: First, you do not have any locally scoped variables in your code. You have a few LEXICALLY scoped ones. Anyway all you need to notice is that

    for ($i = 0; $i < $a_whole_lot; $i++) {
        ($key, $value) = returnsAnArray();
    }

    is valid Perl. You do not need the declaration to be allowed to assign to a list of variables! So there is actually no difference between this and the first question.

    Jenda
    Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
       -- Rick Osborne


Re: Two Questions on "my"
by davido (Cardinal) on May 22, 2004 at 15:47 UTC
    A really simple test:

    foreach ( 1 .. 10000000 ) {
        my $i = 1;
    }
    print $i, "\n";

    Run this test. It will take a few seconds, so don't panic. You'll notice that despite creating $i ten million times, you're not actually using ten million times more memory than had you declared $i outside the loop. This is because $i falls out of scope and gets garbage collected each time the loop iterates. If Perl failed to reclaim that memory, you would start seeing a lot of swapfile activity resulting in hard-drive churning, as your internal memory gets saturated and your operating system begins looking to the swapfile for virtual memory. This isn't happening though, because you're reusing memory.

    You'll also notice that you're unable to print $i, after exiting the loop. This also is because $i is no longer in scope, and has been garbage collected.... nothing left to see.


    Dave

      > This is because $i falls out of scope and gets garbage collected each time the loop iterates.

      perl is actually a bit smarter than that. The $i that is used on each iteration is the same $i; the value is simply reinitialized on each iteration.

      You can prove this to yourself by seeing what memory address the scalar has:

      for (1..100) { my $i = 1; print \$i, "\n"; }

      However, this only works as long as the variable would be garbage-collected at the end of scope. If the reference count is higher than 1 at end of scope, $i is a whole new scalar on each iteration. Observe:

      for (1..10) {
          my $i = 1;
          push(@is, \$i);
          print "in loop: ", \$i, "\n";
      }
      print "outside loop: $_\n" for @is;

Re: Two Questions on "my"
by pbeckingham (Parson) on May 22, 2004 at 16:21 UTC

    Answer 1
    I would use the second variation, because it hides $foo from the rest of the code, eliminating the possibility of side-effects from having $foo still visible. The code generated by this Perl reuses your scoped $foo anyway as an optimization, so there is no penalty.

    Note that my ($foo); could be more clearly represented as my $foo; - the parens are usually used in declarations to collapse several lines into one, such as:

    my ($a, $b, $c);
    or
    my ($a, $b, $c) = (0, 1, 2);
    or to capture a subroutine's arguments
    my ($a, $b) = @_;

    Answer 2
    The declaration of $key and $value both outside and inside the loop is redundant, but I see nothing wrong with your code inside the loop, other than the fact that the loop control variable $i is not referenced in the body, so the loop could be rewritten as:

    for (1 .. $a_whole_lot) { my ($key, $value) = returnsAnArray (); }
    Again, the scoped variables are going to be reused.

    Aside
    In question 1, your code doesn't even need a $foo.

    codethatUsesFoo (getValue()) for 1 .. $a_huge_value;
    In general, you are using the C-style for loop, and it would be more Perlish of you to use for or foreach as shown.

    Hope this helps.

      In question 1, your code doesn't even need a $foo.
      codethatUsesFoo (getValue()) for 1 .. $a_huge_value;
      While I agree with your sentiment regarding the more perlish for loop, you have slightly changed the semantics by placing the call to getValue() within the parameter list of codethatUsesFoo(). The original code had getValue in scalar context, while yours has it in list context. This may or may not make a difference in reality, but it's always important to be aware of such subtleties IMHO.
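The context difference can be made visible with wantarray. In this sketch both subs are hypothetical stand-ins for the ones in the question: getValue reports the context it was called in, and codeThatUsesFoo just returns its first argument.

```perl
use strict;
use warnings;

# Hypothetical getValue that reports the context it was called in.
sub getValue { return wantarray ? 'list' : 'scalar' }

# Hypothetical consumer that simply returns its first argument.
sub codeThatUsesFoo { return $_[0] }

# Original form: assignment to a scalar puts getValue in scalar context.
my $foo          = getValue();
my $via_variable = codeThatUsesFoo($foo);

# Rewritten form: an argument list puts getValue in list context.
my $direct = codeThatUsesFoo(getValue());

print "via \$foo: $via_variable\n";   # scalar
print "direct:   $direct\n";          # list
```

A sub that returns an array or a multi-element list is where the two forms would actually give different results.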
Re: Two Questions on "my"
by bobn (Chaplain) on May 22, 2004 at 18:10 UTC
    I don't know where you get a 50% hit for the 'my' inside the for loop. When I do this:

    $a_huge_value = 3200000;
    my($foo);

    $time1 = time();
    for ($i = 0; $i < $a_huge_value; $i++) {
        $foo = getValue();
        codeThatUsesFoo($foo);
    }
    $time2 = time();
    for ($i = 0; $i < $a_huge_value; $i++) {
        my $foo = getValue();
        codeThatUsesFoo($foo);
    }
    $time3 = time();

    print $time2 - $time1, "\n";
    print $time3 - $time2, "\n";

    sub getValue { return $_[0]; }
    sub codeThatUsesFoo { return $_[0]; }

    I get:

    [bobn@trc2:/home/bobn/misc]# perl my2.pl
    14
    14
    [bobn@trc2:/home/bobn/misc]# perl my2.pl
    13
    15
    [bobn@trc2:/home/bobn/misc]# perl my2.pl
    13
    15
    [bobn@trc2:/home/bobn/misc]# perl my2.pl
    14
    16

    Changing the subs to:
    sub getValue { return } sub codeThatUsesFoo { return }
    yields:

    [bobn@trc2:/home/bobn/misc]# perl my2.pl
    12
    12
    [bobn@trc2:/home/bobn/misc]# perl my2.pl
    12
    13
    [bobn@trc2:/home/bobn/misc]# perl my2.pl
    13
    13
    [bobn@trc2:/home/bobn/misc]# perl my2.pl
    12
    13

    So the decision is really based on what it *should* be based on: is $foo intended to be used outside the loop, with the effects of the loop desired to be visible? If so, do it the first way. If not, do it the second way.
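The built-in time() counts whole seconds, so runs like the ones above can only differ by a full second. A variant of the same test using the core Time::HiRes module gives fractional timings. This is a sketch, not the code that produced the numbers above: the iteration count is an arbitrary smaller value chosen for a quick run, and the helper time_loop is made up for the example.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # replaces time() with a floating-point version

sub getValue        { return $_[0] }
sub codeThatUsesFoo { return $_[0] }

my $iterations = 200_000;   # assumed value, much smaller than in the test above

# Runs the given code reference once and returns the elapsed seconds.
sub time_loop {
    my ($code) = @_;
    my $start = time();
    $code->();
    return time() - $start;
}

my $outside_secs = time_loop(sub {
    my $foo;
    for (my $i = 0; $i < $iterations; $i++) {
        $foo = getValue();
        codeThatUsesFoo($foo);
    }
});

my $inside_secs = time_loop(sub {
    for (my $i = 0; $i < $iterations; $i++) {
        my $foo = getValue();
        codeThatUsesFoo($foo);
    }
});

printf "declared outside: %.4f s\n", $outside_secs;
printf "declared inside:  %.4f s\n", $inside_secs;
```

Single runs still vary with machine load, so repeated runs, or the Benchmark module shown elsewhere in this thread, give a more reliable comparison.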

    --Bob Niederman, http://bob-n.com

    All code given here is UNTESTED unless otherwise stated.

Re: Two Questions on "my"
by gmpassos (Priest) on May 22, 2004 at 23:27 UTC
    In question 1, the difference between declaring $foo outside or inside the loop is not memory, because Perl detects this and reuses the same memory (SV); the difference is the cleanup done on each iteration. If you declare it outside, you only write to $foo once per iteration (when getValue returns something). But if you declare $foo inside the for, the same SV needs to be "created" and cleaned up on every iteration. Note that in both cases the SV will always be at the same address.

    With a simple test you can see that declaring $foo outside the loop is a little faster, but you will only gain speed if this loop runs more than 100,000 times in your program, and even then only a little.

    Graciliano M. P.
    "Creativity is the expression of the liberty".

      As proof that they're the same address:

      #!/usr/bin/perl -wl
      for (1..5) {
          my $i = rand();
          print \$i;
      }

Re: Two Questions on "my"
by Dr. Mu (Hermit) on May 24, 2004 at 05:29 UTC
    I was going to answer question one by suggesting that it might be more useful to think of my as a compile-time scoping construct rather than something that actually gets executed. So I wrote some sample code to illustrate my point. I guess it was a useful exercise, because now I'm confused! Here are the three scripts and their outputs:

    use strict;
    foreach (0 .. 10) {
        my $foo;
        $foo = 0 unless $_;
        print $foo++, ' ';
    }

    0 0 0 0 0 0 0 0 0 0 0

    use strict;
    foreach (0 .. 10) {
        my $foo = 0 unless $_;
        print $foo++, ' ';
    }

    0 0 1 2 3 4 5 6 7 8 9

    use strict;
    foreach (0 .. 10) {
        my $foo = 0 unless $_;
        print $foo, ' ';
        $foo++
    }

    0 1 2 3 4 5 6 7 8 9

    Okay, I was wrong about the not-getting-executed stuff. But can anyone explain what's going on with the second and third examples? It's almost as if the scope of the my were restricted to the statement it's used in and that the $foo getting printed and incremented is a different variable -- at least after the first time through the loop (if that even makes sense). But if that were the case, why didn't strict complain?

      I'm not sure if I've got some sort of mental filter that spots these, or if it's a commonly encountered problem, but this is the third time in a week someone has asked this question.

      The answer is that my has both a compile- and a run-time effect. The compile-time effect allocates space for the variable and lets it pass strict. The run-time effect initializes the variable. If you conditionalize the initialization you have, in effect, a static variable; one that retains its value beyond scope exit.

      I've explained it in more detail, with a documentation reference, here, and it has been covered in a thread here.
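For readers who actually want a variable that keeps its value between calls, relying on a conditional my is fragile; perlsyn describes the behaviour of my with a statement modifier as undefined. On perl 5.10 or later a state variable gives the same effect explicitly. A minimal sketch, with the sub name next_id invented for the example:

```perl
use strict;
use warnings;
use feature 'state';   # available from perl 5.10 onward

sub next_id {
    state $id = 0;     # initialized once, on the first call only
    return $id++;      # value persists between calls
}

my @ids = map { next_id() } 1 .. 5;
print "@ids\n";        # 0 1 2 3 4
```

On older perls the usual equivalent is a my variable declared in a bare block that encloses the sub.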

        Yup. This is the kind of stuff that makes my Python buddies wince when they regard my affinity for Perl. But their desks are tidy. Mine is messy. There's obviously a connection.

        Thanks, Somni!

Node Type: perlquestion [id://355585]
Approved by dorko
Front-paged by sulfericacid