PerlMonks  

Any thoughts on Iterator prefetching?

by mzedeler (Pilgrim)
on Jun 25, 2009 at 13:11 UTC [id://774712]

mzedeler has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow Perl monks.

I had a nasty surprise when I discovered that Iterator prefetches a value when instantiated. This makes impure functions (ones with side effects) behave very strangely. For example:

    use Iterator;

    my $i = 0;
    my $iterator = Iterator->new(sub { return ++$i });

    print "Before calling anything: $i\n";
    print "First call yields: ", $iterator->value, "\n";
    print "After first call: $i\n";

This SHOULD print:

    Before calling anything: 0
    First call yields: 1
    After first call: 1

But it does in fact print this:

    Before calling anything: 1
    First call yields: 1
    After first call: 2

It seems that the author of Iterator wants it to hold a buffer of one value, but I have a very hard time understanding why. Could someone please enlighten me?

Here is an even more degenerate example using chained iterators:

    use Iterator;

    my $i = 0;

    sub get_a {
        return Iterator->new(sub { print "a called\n"; return $i++ });
    }

    sub get_b {
        my $a = get_a;
        return Iterator->new(sub { print "b called\n"; return $a->value });
    }

    print "Now I will get_b ($i):\n";
    $b = get_b;
    print "Got b ($i)\n";
    print "First value is ($i):\n";
    print "b: ", $b->value, " ($i)\n";

This SHOULD print:

    Now I will get_b (0):
    Got b (0)
    First value is (0):
    b called
    a called
    b: 0 (1)

But it prints this:

    Now I will get_b (0):
    a called
    b called
    a called
    Got b (2)
    First value is (2):
    b called
    a called
    b: 0 (3)

The strange behavior can be explained once you take the prefetch buffer into account, but it makes the module very hard to use in practice.
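
The output reported above is consistent with a simple model in which the constructor runs the generator once to fill a one-slot buffer, and every fetch hands out the buffered value and refills the slot straight away. A minimal sketch of that model using plain closures follows. Note that make_prefetching_iterator is a hypothetical helper written for illustration only; it is an assumption about the observable behavior, not the Iterator module's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a one-value-lookahead iterator.
# This is NOT the Iterator module's implementation, only a model of it.
sub make_prefetching_iterator {
    my ($generator) = @_;

    # Eager step: the generator runs once at construction time.
    my $next_value = $generator->();

    return sub {
        my $current = $next_value;       # hand out the buffered value
        $next_value = $generator->();    # refill the buffer immediately
        return $current;
    };
}

my $i        = 0;
my $iterator = make_prefetching_iterator(sub { return ++$i });

my @log;
push @log, "Before calling anything: $i";
push @log, "First call yields: " . $iterator->();
push @log, "After first call: $i";

print "$_\n" for @log;
```

Under this model the side effect on $i always runs one step ahead of the value the caller has actually received, which is the mismatch shown in the first example.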

The RT queue for the module contains two reports about this (I posted one of them):

http://rt.cpan.org/Public/Dist/Display.html?Name=Iterator

Please let me know if there is any reasonable explanation for this behaviour.

Regards,

Michael Zedeler (MADZ).

Replies are listed 'Best First'.
Re: Any thoughts on Iterator prefetching?
by Transient (Hermit) on Jun 25, 2009 at 14:07 UTC
    Looking through the code, the author states:

    From _initialize (called from new):
        # Caches the first value of the iterator in %next_value_for.
    and value:
        # Notes: Keeps one forward-looking value for the iterator in
        #        %next_value_for.  This is so we have something to
        #        return when user's code throws Am_Now_Exhausted.
    I assume this is so the module knows when the iterator will be exhausted, as pKai said.

    For your second example, this is explicitly stated in the docs:
    When you use an iterator in separate parts of your program, or as an argument to the various iterator functions, you do not get a copy of the iterator's stream of values.

    In other words, if you grab a value from an iterator, then some other part of the program grabs a value from the same iterator, you will be getting different values.

    This can be confusing if you're not expecting it. For example:

        my $it_one = Iterator->new ({something});
        my $it_two = some_iterator_transformation $it_one;
        my $value  = $it_two->value();
        my $whoops = $it_one->value;

    Here, some_iterator_transformation takes an iterator as an argument, and returns an iterator as a result. When a value is fetched from $it_two, it internally grabs a value from $it_one (and presumably transforms it somehow). If you then grab a value from $it_one, you'll get its second value (or third, or whatever, depending on how many values $it_two grabbed), not the first.

    IMO, if you need the value returned from the iterator, save it somewhere instead of relying on a variable in the same scope, although I realize that your code is more than likely a simplification of what you're really trying to do.

      Thanks for pointing out that the design is intentional. The example from the documentation doesn't address the buffering issue I am concerned with: a bufferless implementation would behave the same way. Try running this minimal test:

          use Test::More tests => 1;
          use Iterator;

          my $value = 0;
          my $iterator = Iterator->new(sub { return ++$value });

          is($value, 0, "Haven't called the iterator yet.");

      I raised this question because I'd like to discuss the pros and cons of this design. What exactly do you gain from a buffering iterator?

      (I know that I submitted a bug report on this behaviour; that was before I realized it was intentional.)

      If buffering is the price of being able to tell whether the iterator has been exhausted, then that price, measured in the problems the buffering causes, is far too high.

      Another consideration is whether there is enough reason to implement iterators in a separate class, since Perl offers closures that can do almost everything Iterator has to offer.
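
To make the closure alternative concrete, here is a minimal sketch of an iterator built from nothing but a closure. The helper name make_counter and the convention of returning an empty list to signal exhaustion are assumptions chosen for this illustration; they are not part of the Iterator module. No code runs until the first call, so there is no prefetch.

```perl
use strict;
use warnings;

# Hypothetical helper: yields the integers $start..$end, one per call.
# Returning an empty list (bare "return") signals exhaustion.
sub make_counter {
    my ($start, $end) = @_;
    my $next = $start;

    return sub {
        return if $next > $end;    # exhausted: empty list in list context
        return $next++;            # otherwise exactly one value
    };
}

my $counter = make_counter(1, 3);

my @seen;
# List assignment in boolean context counts the values returned,
# so the loop stops only when the closure returns an empty list.
while (my ($value) = $counter->()) {
    push @seen, $value;
}

print "@seen\n";
```

What this gives up compared with a class is a named interface (methods such as value or an exhaustion test) and exceptions on misuse; what it gains is that side effects happen exactly when the caller asks for a value.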

      All in all I want to discuss pros and cons of various iterator designs.

Re: Any thoughts on Iterator prefetching?
by pKai (Priest) on Jun 25, 2009 at 13:42 UTC

    A wise man once said: "Prediction is hard, especially about the future."

    A look at the source of Iterator shows that it is implemented as you described (a one-value lookahead internally).

    Doing it this way seems questionable.


    Update: It seems the author needs the lookahead so that undef can be returned as a valid value and still be told apart from an exhausted iterator.

    So maybe you are better off with Iterator::Simple, which cannot yield undef but also does not look ahead.
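
For comparison, a lookahead does not have to be filled at construction time. The sketch below is one possible alternative design, not what the Iterator module does: the one-slot buffer is filled only when the caller asks for a value or asks whether the iterator is exhausted. The names make_lazy_iterator, value and is_exhausted, and the convention that the generator returns an empty list when it is done, are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical lazy-lookahead wrapper (an alternative design sketch).
# The generator must return an empty list when it has no more values.
sub make_lazy_iterator {
    my ($generator) = @_;
    my @buffer;      # holds at most one looked-ahead value
    my $done = 0;

    # Fill the buffer on demand only; never at construction time.
    my $fill = sub {
        return if @buffer || $done;
        my @got = $generator->();
        if (@got) { push @buffer, $got[0] }
        else      { $done = 1 }
        return;
    };

    return {
        is_exhausted => sub { $fill->(); return !@buffer },
        value        => sub {
            $fill->();
            die "iterator exhausted\n" unless @buffer;
            return shift @buffer;
        },
    };
}

my @source = (1, undef, 3);    # undef is a legitimate value here
my $pulled = 0;                # counts how often the generator produced a value

my $it = make_lazy_iterator(sub {
    return unless @source;
    $pulled++;
    return shift @source;
});

my $pulled_after_new = $pulled;    # nothing has been fetched yet

my @values;
push @values, $it->{value}->() until $it->{is_exhausted}->();

print join(', ', map { defined $_ ? $_ : 'undef' } @values), "\n";
```

With this arrangement undef can still be yielded as data, and the generator's side effects only run ahead of the caller after an explicit is_exhausted check.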
