Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

How foreach loops decide what to iterate through

by lefthanded (Novice)
on Feb 13, 2009 at 22:23 UTC ( #743725=perlquestion: print w/replies, xml ) Need Help??

lefthanded has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I'm fairly new to Perl, and I have a basic question about how foreach loops work. Say you want to iterate through an array, @a, in some particular order, but @a is not yet sorted in that order. Would it be better (i.e., faster) to sort @a beforehand, and then use it in the loop, or is it fine to sort in-line with the loop? Basically, if I don't care about preserving the current order of @a, which of these two methods is preferred:

Method 1:
@a = sort @a; foreach (@a) { ... }
Method 2:
foreach (sort @a) { ... }
I prefer Method 2 simply because it's slightly less code, but I'm concerned that maybe it's re-sorting @a on each iteration, which would be bad. I notice that the following code runs forever:
my @a = (1); foreach (@a) { push @a, 1; }
So the loop doesn't fix at its start what it's going to iterate through, which is what makes me think Method 2 might be inefficient. Although maybe the loop decides that it's going to iterate through @a, whatever that may be, whether it changes or not within. So it is fixed in a sense, in which case Method 2 is fine I think, although really now I'm just more confused.

Thanks for any insight into how this actually works.

Replies are listed 'Best First'.
Re: How foreach loops decide what to iterate through
by Lawliet (Curate) on Feb 13, 2009 at 23:27 UTC

    Welcome to The Monastery, and thank you for writing your node.

    I have little experience with the Benchmark module, but after conducting some tests, I would say that you should use the latter method, the one you prefer. It is more concise, easier to read, and a little more efficient. (thanks for the notice, ikegami.)

    #!/usr/bin/perl use warnings; use strict; use 5.010; use Benchmark qw(:all); use Data::Dumper; cmpthese(100000, { 'LowOutside' => sub { my @array = qw(8 1 3 5 6 2); @array = sort @array; foreach (@array) { my $num = $_; } }, 'LowInside' => sub { my @array = qw(8 1 3 5 6 2); foreach (sort @array) { my $num = $_; } }, 'Outside' => sub { my @array = qw(8 1 3 5 6 2) x 100; @array = sort @array; foreach (@array) { my $num = $_; } }, 'Inside' => sub { my @array = qw(8 1 3 5 6 2) x 100; foreach (sort @array) { my $num = $_; } } }); __END__ Rate Outside Inside LowOutside LowInside Outside 1681/s -- -1% -99% -99% Inside 1691/s 1% -- -99% -99% LowOutside 169492/s 9985% 9924% -- -5% LowInside 178571/s 10525% 10461% 5% --

    And you didn't even know bears could type.

      I would say that you should use the latter method, the one you prefer. It is more concise, easier to read, and a little more efficient.

      1% and even 5% is within the margin or error of Benchmark, so that last phrase is inaccurate.

      I'm actually surprised they're the same speed. @a = sort @a; is optimized to be done in place, and I think for (@a) is optimized over the more general for (LIST) (it's definitely implemented differently).

Re: How foreach loops decide what to iterate through
by jethro (Monsignor) on Feb 13, 2009 at 23:34 UTC

    Both methods are practically the same in terms of speed. When you do foreach (sort @a), the foreach doesn't loop over @a but instead over the anonymous (and temporary) array that was returned from the sort function.

    The only difference of the two methods is that the first method stores the (anonymous) sorted array into @a before going into the foreach loop, while the second method leaves the sorted array anonymous.

      That's clear and understandable, and what I would have thought too. (</metoo> ;-)

      But I am still puzzled as to what is happening in the example of

      my @a = (1); foreach (@a) { push @a, 1; }

      Ceci n'est pas une signature

        You are looping over an array you are adding to, so you are never reaching the end.

        This is somewhat similar to the following loop

        my $i=1; my $end= 2; while ($i++<$end) { $end++; }

        Note that

        y @a = (1); foreach (sort @a) { push @a, 1; }

        stops immediately, because here you are looping over the anonymous array that is not expanding.

        Essentially what you are doing is something like this:

        my @a = (1); for ( my $i = 0 ; $i <= $#a ; $i++ ) { push( @a, 1 ); }

        So the loop isn't being restarted or re-evaluated, but because you are extending the thing it's iterating over, it never gets to the end.


        www.jasonkohles.com
        We're not surrounded, we're in a target-rich environment!
Re: How foreach loops decide what to iterate through (push array inside foreach)
by toolic (Bishop) on Feb 14, 2009 at 02:10 UTC
    I notice that the following code runs forever:
    Adding a print inside the loop might shine some light on it:
    foreach (@a) { push @a, 1; print scalar @a, "\n" }

    Prints:

    2 3 4 5 ...
    Apparently, the push causes the array to grow by one element, somehow triggering the loop to iterate again.

    I found this in the docs for Foreach Loops:

    If any part of LIST is an array, foreach will get very confused if you add or remove elements within the loop body, for example with splice. So don't do that.

    Now I am curious as to what perl is doing internally. Can any monks offer a deeper explanation?

      Now I am curious as to what perl is doing internally. Can any monks offer a deeper explanation?

      I think it iterates over the array instead of putting it on the stack. You can circumvent the optimisation by changing for (@a) to for ((),@a).

      I don't know, but my guess is that perl simply keeps a pointer to the array-element it is at in the loop. I did a few tests with splice and perl did exactly what I expected from it. It looped over the array positions (!! not the array values) until it was outside the array length (because of looping or a shrinking of the array).

      Would be nice if someone could show me a case where perl gets confused. I could not create one

        Would be nice if someone could show me a case where perl gets confused. I could not create one

        Confused by addition:

        >perl -wle"@a=1; for ((),@a) { push @a,$_+1; print }" 1 >perl -wle"@a=1; for (@a) { push @a,$_+1; print }" 1 2 3 4 ...

        Confused by deletion:

        >perl -wle"@a=(1,2); for (@a) { pop @a; print qq{<$_>} }" <1> >perl -wle"@a=(1,2); for ((),@a) { pop @a; print qq{<$_>} }" <1> Use of freed value in iteration at -e line 1.
Re: How foreach loops decide what to iterate through
by bichonfrise74 (Vicar) on Feb 13, 2009 at 23:10 UTC
    I would think both methods would just be the same and so I would prefer method 2 since it has less clutter.
    foreach (sort @a) { ... }
Re: How foreach loops decide what to iterate through
by ysth (Canon) on Feb 15, 2009 at 18:50 UTC
    In principle, foreach iterates over a list that is constructed before the looping begins, so you should have no concerns about making extra work for each iteration by having a complex expression for the list being iterated over.

    In practice, iterations over a range (foreach ($x .. $y)) or an array (even one built on the fly: foreach (@{[ sort @x ]})) are optimized to not need to load the full list ahead of time. Since this optimization shouldn't introduce differences in behaviour, what you are seeing with a push in the loop is a bug.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://743725]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (6)
As of 2022-05-25 09:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Do you prefer to work remotely?



    Results (90 votes). Check out past polls.

    Notices?