How To: Make An Iterator

Purpose:

The purpose of this tutorial is to give a general overview of what an iterator is, why they are useful, how to build one, and things to consider to avoid common pitfalls. It is intended to give the reader enough information to begin using iterators though a certain level of understanding is assumed. The See Also section should be researched if supplemental information is needed.

What Is An Iterator

Iterators come in many forms and you have probably used one without even knowing it. The readline and glob functions as well as the flip-flop operator are all iterators when used in scalar context. User defined iterators are usually in the form of a code reference that when executed will calculate the next item in a list and return it. In the event the list is exhausted, an agreed upon return value is given. While implementations may vary, a subroutine that creates a closure around any necessary state variables and returns the code reference is common. This technique is called a factory and facilitates code reuse.

Why Are Iterators Useful

The most straightforward way to use a list is to define an algorithm to generate the list and store the results in an array. There are several reasons why you might want to consider an iterator instead:

The list in its entirety would use too much memory
The list is infinite
The list should be circular
The list is large but only a few items may be needed
The list needs to be duplicated multiple times

How To Build An Iterator

The basic structure of an iterator factory looks like:

sub gen_iterator {
    my @initial_info = @_;
    my ($current_state, $done);
    return sub {
        # code to calculate $next_state or $done;
        return undef if $done;
        return $current_state = $next_state;   
    };
}
[download]

To make the factory more flexible, arguments may be passed to the factory to decide how to create the iterator. All state variables that are needed are declared and possibly initialized. A code reference in the same scope as the state variables is returned to the caller completing the transaction. Each time the code reference is executed, the state variables are updated and the next item is returned until the list is exhausted.

The basic usage of an iterator looks like:

my $next = gen_iterator( 42 );
while ( my $item = $next->() ) {
    print "$item\n";
}
[download]

Example: The list in its entirety would use too much memory

You work in genetics and you need every possible sequence of DNA strands in length of 1 to 14. Even if there were no memory overhead using arrays, it would still take nearly 5 gigabytes of memory to accommodate the full list. Iterators to the rescue:

my @DNA = qw/A C T G/;
my $seq = gen_permutate(14, @DNA);

while ( my $strand = $seq->() ) {
    print "$strand\n";
}
sub gen_permutate {
    my ($max, @list) = @_;
    my @curr;
    return sub {
        if ( (join '', map { $list[ $_ ] } @curr) eq $list[ -1 ] x @cu
+rr ) {
            @curr = (0) x (@curr + 1);
        }
        else {
            my $pos = @curr;
            while ( --$pos > -1 ) {
                ++$curr[ $pos ], last if $curr[ $pos ] < $#list;
                $curr[ $pos ] = 0;
            }
        }
        return undef if @curr > $max;
        return join '', map { $list[ $_ ] } @curr;
    };
}
[download]

Example: The list is infinite

You need to assign IDs to all current and future employees and ensure that it is possible to determine if an ID is valid with nothing more than the number itself. You have already taken care of persistency and how to validate the number using the LUHN formula. Iterators take care of the rest.

my $start = $ARGV[0] || 999999;
my $next_id = gen_id( $start );
print $next_id->(), "\n" for 1 .. 10;  # Next 10 IDs

sub gen_id {
    my $curr = shift;
    return sub {
        0 while ! is_valid( ++$curr );
        return $curr;
    };
}
sub is_valid {
    my ($num, $chk) = (shift, '');
    my $tot;
    for ( 0 .. length($num) - 1 ) {
        my $dig = substr($num, $_, 1);
        $_ % 2 ? ($chk .= $dig * 2) : ($tot += $dig);
    }
    $tot += $_ for split //, $chk;
    return $tot % 10 == 0 ? 1 : 0;
}
[download]

Example: The list should be circular

You need to support legacy apps with hardcoded filenames but want to keep 3 days worth of logs before overwriting. You have everything you need except a way to keep track of which file to write to.

my $next_file = rotate( qw/FileA FileB FileC/ );
print $next_file->(), "\n" for 1 .. 10;

sub rotate {
    my @list = @_;
    my $index = -1;
    return sub {
        $index++;
        $index = 0 if $index > $#list;
        return $list[ $index ];
    };
}
[download]

Adding 1 state variable and an additional check would provide the ability to loop a user defined number of times.

Example: The list is large but only a few items may be needed

You have forgotten the password to your DSL modem and the vendor charges more than the cost of a replacement to unlock it. Fortunately, what you do remember is that it was only 4 characters and all lower case:

while ( my $pass = $next_pw->() ) {
    if ( unlock( $pass ) ) {
        print "$pass\n";
        last;
    }
}
sub fix_size_perm {
    my ($size, @list) = @_;
    my @curr = (0) x ($size - 1);
    push @curr, -1;
    return sub {
        if ( (join '', map { $list[ $_ ] } @curr) eq $list[ -1 ] x @cu
+rr ) {
            @curr = (0) x (@curr + 1);
        }
        else {
            my $pos = @curr;
            while ( --$pos > -1 ) {
                ++$curr[ $pos ], last if $curr[ $pos ] < $#list;
                $curr[ $pos ] = 0;
            }
        }
        return undef if @curr > $size;
        return join '', map { $list[ $_ ] } @curr;
    };
}
sub unlock { $_[0] eq 'john' }
[download]

Example: The list needs to be duplicated multiple times

Left as an exercise for the reader.

Things To Consider

The iterator's @_ is different than the factory's

$_[0]

The return value indicating exhaustion is important
References to external variables for state may cause problems
You may need to handle edge cases
State variables persist as long as the iterator

Replies are listed 'Best First'.
Re: How To: Make An Iterator by Roy Johnson (Monsignor) on Apr 25, 2005 at 18:36 UTC
A quibble on terms: what you call a "reusable iterator" is a "factory". A reusable iterator would be one that resets itself after returning the "list is exhausted" value. Array iterator factory is an example of a factory that generates reusable iterators. Caution: Contents may have been coded under pressure.	[reply]
Re: How To: Make An Iterator by tlm (Prior) on Apr 25, 2005 at 23:20 UTC
Maybe this is too obvious to be worth mentioning, but under "Why Are Iterators Useful" I would add that they offer a uniform interface for traversal. The underlying data structure may change (and with it the traversal algorithm), but the iterator interface remains constant. A corollary benefit is that iterators reify the process of traversal, so that one can begin to think of functions that take among their arguments both a data structure and an iterator. For example, one could implement a generalization of `map` that took three arguments: a data structure, an iterator for the data structure, and some function to be applied to each element of the data structure; and returned the list resulting from applying the function to each element of the data structure in the order prescribed by the iterator. Update: Added second paragraph. the lowliest monk	[reply]
Re: How To: Make An Iterator by Anonymous Monk on Apr 25, 2005 at 18:01 UTC
Those interested in iterators might also find Infinitely Lazy and Hamming Sequences and Lazy Lists useful	[reply]
Re^2: How To: Make An Iterator by Limbic~Region (Chancellor) on Apr 25, 2005 at 18:16 UTC
Anonymous Monk, Also see: Challenge: Another Infinite Lazy List Generating powerset with progressive ordering Re^2: Finding all Combinations Re: a close prime number I used trivial examples in the tutorial because explanations of difficult specific implementations would have detracted from the general concept. Cheers - L~R	[reply]
Re: How To: Make An Iterator by kelan (Deacon) on Apr 25, 2005 at 21:13 UTC
Here's another take on the first example that only returns one line at a time until all filehandles are exhausted and returns undef when that happens. `sub gen_fh_iterator { my ( $current, @fhs ) = @_; return sub { my $line; while ( ! defined $line && @fhs ) { $line = <$current>; $current = shift @fhs if not defined $line; } return $line; }; } my $file_iter = gen_fh_iterator( @fhs ); while ( my $line = $file_iter->() ) { print "$line\n"; }` [download]	[reply] [d/l]
Re: How To: Make An Iterator by eric256 (Parson) on Apr 25, 2005 at 22:35 UTC
Hardly original but my own fibonacci number generator. `use strict; use warnings; $\| = 1; sub infinite_fib { my ($last,$curr) = @_; return sub { my $temp = $last + $curr; $last = $curr; $curr = $temp; return $curr; }; } my $next_fib = infinite_fib( 1,0 ); print $next_fib->(), "\n" for(1..10);` [download] Kinda fun just because it reaches infinity in a pretty big hurry if you let it run. ___________ Eric Hodges	[reply] [d/l]
Re: How To: Make An Iterator by ihb (Deacon) on Apr 26, 2005 at 14:17 UTC
`&infinite_evens` is unnecessary complicated. `sub infinite_evens { my ($i) = @_; $i += $i % 2 - 2; return sub { $i += 2 }; }` [download] `ihb` See perltoc if you don't know which perldoc to read!	[reply] [d/l] [select]
Re: How To: Make An Iterator by Anonymous Monk on Apr 17, 2006 at 19:24 UTC
Under Things To Consider: * The return value indicating exhaustion is important After choosing a value that is known not to be valid in the list, document it. Allow the user to define their own if it is not possible to know in advance. An even better approach is to use the iterator in list context and just return an empty list when the list is exhausted. `while (my ($elem) = $iter->()) { ... }` [download] That way you don't need any assumptions, you don't need to make any limitations to the value space of the list, and it'll be easier for the maintainer that doesn't have to remember which list doesn't allow which element and which list uses what to signal list exhaustion.	[reply] [d/l]
Re^2: How To: Make An Iterator by Limbic~Region (Chancellor) on Apr 17, 2006 at 22:42 UTC
Anonymous Monk, No, that isn't a better approach. If your iterator is designed to return a list each iteration and an empty list is valid then you will prematurely terminate. I stand by my statements though I will concede that it would be fair to say that an empty list is usually safe. Cheers - L~R	[reply]
Re^3: How To: Make An Iterator by Anonymous Monk on Apr 20, 2006 at 00:09 UTC
If your iterator is designed to return a list each iteration and an empty list is valid then you will prematurely terminate. Uhm, the whole idea with the above is that if you want "a list" from the iterator the iterator returns an array reference, just as you do if you want to return two or more "lists" from a subroutine or have several "lists" in a list or array. Why on earth would the iterator want to return a list instead of an array reference? That would just make it uncomfortable to use. If you return array references you have a perfectly general, maintainable and safe way to signal exhaustion.	[reply]
Re^4: How To: Make An Iterator by Limbic~Region (Chancellor) on Apr 20, 2006 at 12:24 UTC

Back to Tutorials