http://qs321.pair.com?node_id=451278

Purpose:

The purpose of this tutorial is to give a general overview of what an iterator is, why they are useful, how to build one, and things to consider to avoid common pitfalls. It is intended to give the reader enough information to begin using iterators though a certain level of understanding is assumed. The See Also section should be researched if supplemental information is needed.

What Is An Iterator

Iterators come in many forms and you have probably used one without even knowing it. The readline and glob functions as well as the flip-flop operator are all iterators when used in scalar context. User defined iterators are usually in the form of a code reference that when executed will calculate the next item in a list and return it. In the event the list is exhausted, an agreed upon return value is given. While implementations may vary, a subroutine that creates a closure around any necessary state variables and returns the code reference is common. This technique is called a factory and facilitates code reuse.

Why Are Iterators Useful

The most straightforward way to use a list is to define an algorithm to generate the list and store the results in an array. There are several reasons why you might want to consider an iterator instead:
  • The list in its entirety would use too much memory
  • Iterators have tiny memory footprints because only the state information necessary to calculate the next item is stored.
  • The list is infinite
  • Iterators return after each iteration allowing the traversal of an infinite list to stop at any point.
  • The list should be circular
  • Iterators contain state information as well as logic allowing a list of weekdays to wrap-around.
  • The list is large but only a few items may be needed
  • Iterators allow you to stop at any time avoiding the need to calculate any more items then necessary
  • The list needs to be duplicated multiple times
  • Iterators are light-weight and have their own copy of state variables.

How To Build An Iterator

The basic structure of an iterator factory looks like:
sub gen_iterator { my @initial_info = @_; my ($current_state, $done); return sub { # code to calculate $next_state or $done; return undef if $done; return $current_state = $next_state; }; }
To make the factory more flexible, arguments may be passed to the factory to decide how to create the iterator. All state variables that are needed are declared and possibly initialized. A code reference in the same scope as the state variables is returned to the caller completing the transaction. Each time the code reference is executed, the state variables are updated and the next item is returned until the list is exhausted.

The basic usage of an iterator looks like:

my $next = gen_iterator( 42 ); while ( my $item = $next->() ) { print "$item\n"; }

Example: The list in its entirety would use too much memory

You work in genetics and you need every possible sequence of DNA strands in length of 1 to 14. Even if there were no memory overhead using arrays, it would still take nearly 5 gigabytes of memory to accommodate the full list. Iterators to the rescue:
my @DNA = qw/A C T G/; my $seq = gen_permutate(14, @DNA); while ( my $strand = $seq->() ) { print "$strand\n"; } sub gen_permutate { my ($max, @list) = @_; my @curr; return sub { if ( (join '', map { $list[ $_ ] } @curr) eq $list[ -1 ] x @cu +rr ) { @curr = (0) x (@curr + 1); } else { my $pos = @curr; while ( --$pos > -1 ) { ++$curr[ $pos ], last if $curr[ $pos ] < $#list; $curr[ $pos ] = 0; } } return undef if @curr > $max; return join '', map { $list[ $_ ] } @curr; }; }

Example: The list is infinite

You need to assign IDs to all current and future employees and ensure that it is possible to determine if an ID is valid with nothing more than the number itself. You have already taken care of persistency and how to validate the number using the LUHN formula. Iterators take care of the rest.
my $start = $ARGV[0] || 999999; my $next_id = gen_id( $start ); print $next_id->(), "\n" for 1 .. 10; # Next 10 IDs sub gen_id { my $curr = shift; return sub { 0 while ! is_valid( ++$curr ); return $curr; }; } sub is_valid { my ($num, $chk) = (shift, ''); my $tot; for ( 0 .. length($num) - 1 ) { my $dig = substr($num, $_, 1); $_ % 2 ? ($chk .= $dig * 2) : ($tot += $dig); } $tot += $_ for split //, $chk; return $tot % 10 == 0 ? 1 : 0; }

Example: The list should be circular

You need to support legacy apps with hardcoded filenames but want to keep 3 days worth of logs before overwriting. You have everything you need except a way to keep track of which file to write to.
my $next_file = rotate( qw/FileA FileB FileC/ ); print $next_file->(), "\n" for 1 .. 10; sub rotate { my @list = @_; my $index = -1; return sub { $index++; $index = 0 if $index > $#list; return $list[ $index ]; }; }
Adding 1 state variable and an additional check would provide the ability to loop a user defined number of times.

Example: The list is large but only a few items may be needed

You have forgotten the password to your DSL modem and the vendor charges more than the cost of a replacement to unlock it. Fortunately, what you do remember is that it was only 4 characters and all lower case:
while ( my $pass = $next_pw->() ) { if ( unlock( $pass ) ) { print "$pass\n"; last; } } sub fix_size_perm { my ($size, @list) = @_; my @curr = (0) x ($size - 1); push @curr, -1; return sub { if ( (join '', map { $list[ $_ ] } @curr) eq $list[ -1 ] x @cu +rr ) { @curr = (0) x (@curr + 1); } else { my $pos = @curr; while ( --$pos > -1 ) { ++$curr[ $pos ], last if $curr[ $pos ] < $#list; $curr[ $pos ] = 0; } } return undef if @curr > $size; return join '', map { $list[ $_ ] } @curr; }; } sub unlock { $_[0] eq 'john' }

Example: The list needs to be duplicated multiple times

Left as an exercise for the reader.

Things To Consider

  • The iterator's @_ is different than the factory's
  • You should copy @_ to lexicals in the factory as using $_[0] in the iterator may lead to unintentional results.
  • The return value indicating exhaustion is important
  • After choosing a value that is known not to be valid in the list, document it. Allow the user to define their own if it is not possible to know in advance.
  • References to external variables for state may cause problems
  • When making copies is counter-productive, document assumptions.
  • You may need to handle edge cases
  • The first and last items typically involve more logic then the rest. Be sure to verify that your logic is sound by testing edge cases.
  • State variables persist as long as the iterator
  • Reaching the end of the list does not mean that the iterator and state variables are garbage collected. Documentation gives users the choice to destroy the iterator itself once finished, but you may also want to undef the state variables at exhaustion if the memory footprint is large.

See Also

Cheers - L~R

The tutorial has been updated to reflect suggestions from replies as well as the chatterbox. While the examples have been changed as well, I have retained the original code snippets in HTML comments if a reply referenced them.

Replies are listed 'Best First'.
Re: How To: Make An Iterator
by Roy Johnson (Monsignor) on Apr 25, 2005 at 18:36 UTC
    A quibble on terms: what you call a "reusable iterator" is a "factory". A reusable iterator would be one that resets itself after returning the "list is exhausted" value. Array iterator factory is an example of a factory that generates reusable iterators.

    Caution: Contents may have been coded under pressure.
Re: How To: Make An Iterator
by tlm (Prior) on Apr 25, 2005 at 23:20 UTC

    Maybe this is too obvious to be worth mentioning, but under "Why Are Iterators Useful" I would add that they offer a uniform interface for traversal. The underlying data structure may change (and with it the traversal algorithm), but the iterator interface remains constant.

    A corollary benefit is that iterators reify the process of traversal, so that one can begin to think of functions that take among their arguments both a data structure and an iterator. For example, one could implement a generalization of map that took three arguments: a data structure, an iterator for the data structure, and some function to be applied to each element of the data structure; and returned the list resulting from applying the function to each element of the data structure in the order prescribed by the iterator.

    Update: Added second paragraph.

    the lowliest monk

Re: How To: Make An Iterator
by Anonymous Monk on Apr 25, 2005 at 18:01 UTC
Re: How To: Make An Iterator
by kelan (Deacon) on Apr 25, 2005 at 21:13 UTC

    Here's another take on the first example that only returns one line at a time until all filehandles are exhausted and returns undef when that happens.

    sub gen_fh_iterator { my ( $current, @fhs ) = @_; return sub { my $line; while ( ! defined $line && @fhs ) { $line = <$current>; $current = shift @fhs if not defined $line; } return $line; }; } my $file_iter = gen_fh_iterator( @fhs ); while ( my $line = $file_iter->() ) { print "$line\n"; }

Re: How To: Make An Iterator
by eric256 (Parson) on Apr 25, 2005 at 22:35 UTC

    Hardly original but my own fibonacci number generator.

    use strict; use warnings; $| = 1; sub infinite_fib { my ($last,$curr) = @_; return sub { my $temp = $last + $curr; $last = $curr; $curr = $temp; return $curr; }; } my $next_fib = infinite_fib( 1,0 ); print $next_fib->(), "\n" for(1..10);

    Kinda fun just because it reaches infinity in a pretty big hurry if you let it run.


    ___________
    Eric Hodges
Re: How To: Make An Iterator
by ihb (Deacon) on Apr 26, 2005 at 14:17 UTC

    &infinite_evens is unnecessary complicated.

    sub infinite_evens { my ($i) = @_; $i += $i % 2 - 2; return sub { $i += 2 }; }

    ihb

    See perltoc if you don't know which perldoc to read!

Re: How To: Make An Iterator
by Anonymous Monk on Apr 17, 2006 at 19:24 UTC
    Under Things To Consider:
    * The return value indicating exhaustion is important

    After choosing a value that is known not to be valid in the list, document it. Allow the user to define their own if it is not possible to know in advance.
    An even better approach is to use the iterator in list context and just return an empty list when the list is exhausted.
    while (my ($elem) = $iter->()) { ... }
    That way you don't need any assumptions, you don't need to make any limitations to the value space of the list, and it'll be easier for the maintainer that doesn't have to remember which list doesn't allow which element and which list uses what to signal list exhaustion.
      Anonymous Monk,
      No, that isn't a better approach. If your iterator is designed to return a list each iteration and an empty list is valid then you will prematurely terminate.

      I stand by my statements though I will concede that it would be fair to say that an empty list is usually safe.

      Cheers - L~R

        If your iterator is designed to return a list each iteration and an empty list is valid then you will prematurely terminate.
        Uhm, the whole idea with the above is that if you want "a list" from the iterator the iterator returns an array reference, just as you do if you want to return two or more "lists" from a subroutine or have several "lists" in a list or array.

        Why on earth would the iterator want to return a list instead of an array reference? That would just make it uncomfortable to use. If you return array references you have a perfectly general, maintainable and safe way to signal exhaustion.