Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Optimization, side-effects and wrong code

by cbraga (Pilgrim)
on Sep 29, 2004 at 17:49 UTC ( [id://395074]=perlmeditation: print w/replies, xml ) Need Help??

The following code is wrong:
sub find_stuff { my $self = shift; while (my $k = each %$self) { return $k if ($self->{$k}->{blablah}); } return undef; }
It's wrong because the function each has global side-effects: it has a global pointer for each hash it works on, and each time the method is called the loop will continue from the position where it left off previously. As such, at some point it will start after the last "stuff" we want to find and it will say the hash/object has none, though it may be full of it.

I was just bitten by this behaviour in a code that does dependency checks for RPMs — dependencies were being missed and packages failing installation due to missing dependencies.

Clearly Perl needs a way to reset each's pointer (can't be done except via keys, which moots the point). The loop in question was previously done with foreach/keys and I changed it to use each because foreach+keys were so unbearably slow on big hashes — it made a difference from waiting half a second after clicking the checkbox to install a package for the dependencies to be calculated to an almost instantaneous response when each was used.

A good optimization on foreach's performance wouldn't hurt either.

For now I'm having to do with this horrible workaround:

sub find_stuff { my $self = shift; my $first = undef; while (1) { my $r = each %$self; unless ($r) { $r = each %$self; last unless defined $r; } last if defined ($first) && $first eq $r; $first = $r unless defined $first; # do stuff quitting the loop at will } return undef; }
I could also have let the loop run to the end, putting a "next if $done" line in it, but that won't give me any guarantees either — it will leave the hash ready to be read again but fails if it receives the hash after it was read incompletely by some other code.

ESC[78;89;13p ESC[110;121;13p

Replies are listed 'Best First'.
Re: Optimization, side-effects and wrong code
by dragonchild (Archbishop) on Sep 29, 2004 at 18:20 UTC
    I KNEW I'd read something about this before. p383 in Programming Perl, 3rd ed.

    By calling keys in a scalar context, we reset its internal state to ensure that the next each used in the return statement will get the first key.

    sub find_stuff { my $self = shift; scalar keys %$self; while (my $k = each %$self) { return $k if ($self->{$k}->{blablah}); } return undef; }

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

      Thanks for the tip. I guess it should be printed in bold in each's manpage.

      ESC[78;89;13p ESC[110;121;13p

        It is noted in the POD for keys:

        As a side effect, calling keys() resets the HASH's internal iterator, see each. (In particular, calling keys() in void context resets the iterator with no other overhead.)

        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: Optimization, side-effects and wrong code
by diotalevi (Canon) on Sep 29, 2004 at 18:23 UTC

    You were supposed to use keys in scalar or void context to reset the pointer. For normal hashes this is an extremely quick operation as it is simply fetches a C int for the result and resets the iterator.

Re: Optimization, side-effects and wrong code
by Limbic~Region (Chancellor) on Sep 29, 2004 at 18:30 UTC
    cbraga,
    Have you considered using keys or values in scalar context to reset the iterator? I have heard that this is extremely fast in comparison to list context.

    Cheers - L~R

    Update: I hate that I don't see how the thread has been modified since I started replying when I click preview. While I was trying to track down in the documentation where I had heard this, dragonchild went and posted it. *Shrug* - I guess it confirms I remember correctly ;-)

Re: Optimization, side-effects and wrong code
by Zaxo (Archbishop) on Sep 29, 2004 at 18:09 UTC

    Another solution is to abandon each on account of its side effects and say,

    sub find_stuff { my $self = shift; $self->{$_}->{'blablah'} and return $_ for keys %$self; return; }

    Update: Ok, here's another,

    sub find_stuff { my ($self, $k) = shift; while (defined($k = each %$self)) { last if $self->{$k}->{'blablah'} } 1 while each %$self; $k }
    You can't avoid going through all the keys one way or another.

    After Compline,
    Zaxo

      That was cbraga's original solution and he abandoned it because keys builds the whole list prior to usage. each doesn't do that.

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      I shouldn't have to say this, but any code, unless otherwise stated, is untested

        keys builds the whole list prior to usage

        I thought I remember reading somewhere that keys has an optimization in this case, and the whole list is not actually built up front. I couldn't find any documentation, so I decided to do a few tests. Perhaps my test code is flawed in some way I don't see, but the results seem to agree with my premise.

        Here are my snippets:

        Here's a table with the results:

        Update: I should have mentioned earlier, these are from perl 5.005_03 on FreeBSD 4.9-STABLE. I got the value from the VSZ column in ps aux output.

        The change from snippet a to b is much larger than the change from a to c. This seems consistent with the optimization I thought should occur. There may be overhead in the array @keys, but I wouldn't think it would be so much more than building an equivalent list. Maybe someone can explain this another way?

Re: Optimization, side-effects and wrong code
by dragonchild (Archbishop) on Sep 29, 2004 at 18:12 UTC
    <nitpick>
    my $x = { 1 .. 40000 }; for (1 .. 3 ) { my $k; while ($k = each %$x) { last; } %$x = %$x; print "$k\n"; }
    The assignment back to itself will reset the pointer. But, that doesn't help you very much.

    </nitpick>

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

Re: Optimization, side-effects and wrong code
by eric256 (Parson) on Sep 29, 2004 at 22:31 UTC

    As an alternate solution you could make a linked list out of your hash. Each element has a _next => $next link. Then you can have multiple traversals and none interfere with each other. I know that may not be a usable solution for the current project but in the future if you need to be able to search threw the large list multiple times, you could build the linked list out a regular hash, then traverse it at will and reset your 'pointers' to the beginning when needed. Of course if this is a set that you are doing adds ane removes from then you have to do all the associated work with cleaning up the list. There is quite possibly a modules that ties a hash to a linked list to give you hash lookups and fast searchs. The reseting of each still has global consequences if you are using each somewhere else on the same hash.


    ___________
    Eric Hodges
Re: Optimization, side-effects and wrong code
by bpphillips (Friar) on Sep 30, 2004 at 12:56 UTC
    you could change your while loop to do:
    while(my $k = each %$self || each %$self){ return $k if(exists $self->{$k}->{blahblah}); }
    and avoid the keys %$self thing altogether. In my 5 minutes of playing with it, it appears to work
    Update:
    benizi pointed out the loop becomes endless if $self->{$k}->{blahblah} doesn't return a true value. This is solved by changing it to use exists

      Your code creates an infinite loop, though, if $self->{$k}->{blahblah} never evaluates to true. (whereas the OP's falls through to "return undef;")

      perl -Mstrict -lw my %self; $self{$_}{blahblah} = 0 for qw/1 2 3/; while (my $k = each %self || each %self) { print "testing:$k"; print "true for $k" and last if $self{$k}{blahblah}; } __OUTPUT__ testing:1 testing:3 testing:2 testing:1 testing:3 testing:2 testing:1 ... etc. ...

        agreed, I've added exists which causes it to work correctly (albeit a bit different test than the OP's). Could maybe even add a check to make sure our while loop doesn't run past scalar(keys %$self) but then we're back to using keys again... :-)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://395074]
Approved by Zaxo
Front-paged by grinder
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2024-04-23 22:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found