Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??


The purpose of this tutorial is to give a general overview of what an iterator is, why they are useful, how to build one, and things to consider to avoid common pitfalls. It is intended to give the reader enough information to begin using iterators though a certain level of understanding is assumed. The See Also section should be researched if supplemental information is needed.

What Is An Iterator

Iterators come in many forms and you have probably used one without even knowing it. The readline and glob functions as well as the flip-flop operator are all iterators when used in scalar context. User defined iterators are usually in the form of a code reference that when executed will calculate the next item in a list and return it. In the event the list is exhausted, an agreed upon return value is given. While implementations may vary, a subroutine that creates a closure around any necessary state variables and returns the code reference is common. This technique is called a factory and facilitates code reuse.

Why Are Iterators Useful

The most straightforward way to use a list is to define an algorithm to generate the list and store the results in an array. There are several reasons why you might want to consider an iterator instead:
  • The list in its entirety would use too much memory
  • Iterators have tiny memory footprints because only the state information necessary to calculate the next item is stored.
  • The list is infinite
  • Iterators return after each iteration allowing the traversal of an infinite list to stop at any point.
  • The list should be circular
  • Iterators contain state information as well as logic allowing a list of weekdays to wrap-around.
  • The list is large but only a few items may be needed
  • Iterators allow you to stop at any time avoiding the need to calculate any more items then necessary
  • The list needs to be duplicated multiple times
  • Iterators are light-weight and have their own copy of state variables.

How To Build An Iterator

The basic structure of an iterator factory looks like:
sub gen_iterator { my @initial_info = @_; my ($current_state, $done); return sub { # code to calculate $next_state or $done; return undef if $done; return $current_state = $next_state; }; }
To make the factory more flexible, arguments may be passed to the factory to decide how to create the iterator. All state variables that are needed are declared and possibly initialized. A code reference in the same scope as the state variables is returned to the caller completing the transaction. Each time the code reference is executed, the state variables are updated and the next item is returned until the list is exhausted.

The basic usage of an iterator looks like:

my $next = gen_iterator( 42 ); while ( my $item = $next->() ) { print "$item\n"; }

Example: The list in its entirety would use too much memory

You work in genetics and you need every possible sequence of DNA strands in length of 1 to 14. Even if there were no memory overhead using arrays, it would still take nearly 5 gigabytes of memory to accommodate the full list. Iterators to the rescue:
my @DNA = qw/A C T G/; my $seq = gen_permutate(14, @DNA); while ( my $strand = $seq->() ) { print "$strand\n"; } sub gen_permutate { my ($max, @list) = @_; my @curr; return sub { if ( (join '', map { $list[ $_ ] } @curr) eq $list[ -1 ] x @cu +rr ) { @curr = (0) x (@curr + 1); } else { my $pos = @curr; while ( --$pos > -1 ) { ++$curr[ $pos ], last if $curr[ $pos ] < $#list; $curr[ $pos ] = 0; } } return undef if @curr > $max; return join '', map { $list[ $_ ] } @curr; }; }

Example: The list is infinite

You need to assign IDs to all current and future employees and ensure that it is possible to determine if an ID is valid with nothing more than the number itself. You have already taken care of persistency and how to validate the number using the LUHN formula. Iterators take care of the rest.
my $start = $ARGV[0] || 999999; my $next_id = gen_id( $start ); print $next_id->(), "\n" for 1 .. 10; # Next 10 IDs sub gen_id { my $curr = shift; return sub { 0 while ! is_valid( ++$curr ); return $curr; }; } sub is_valid { my ($num, $chk) = (shift, ''); my $tot; for ( 0 .. length($num) - 1 ) { my $dig = substr($num, $_, 1); $_ % 2 ? ($chk .= $dig * 2) : ($tot += $dig); } $tot += $_ for split //, $chk; return $tot % 10 == 0 ? 1 : 0; }

Example: The list should be circular

You need to support legacy apps with hardcoded filenames but want to keep 3 days worth of logs before overwriting. You have everything you need except a way to keep track of which file to write to.
my $next_file = rotate( qw/FileA FileB FileC/ ); print $next_file->(), "\n" for 1 .. 10; sub rotate { my @list = @_; my $index = -1; return sub { $index++; $index = 0 if $index > $#list; return $list[ $index ]; }; }
Adding 1 state variable and an additional check would provide the ability to loop a user defined number of times.

Example: The list is large but only a few items may be needed

You have forgotten the password to your DSL modem and the vendor charges more than the cost of a replacement to unlock it. Fortunately, what you do remember is that it was only 4 characters and all lower case:
while ( my $pass = $next_pw->() ) { if ( unlock( $pass ) ) { print "$pass\n"; last; } } sub fix_size_perm { my ($size, @list) = @_; my @curr = (0) x ($size - 1); push @curr, -1; return sub { if ( (join '', map { $list[ $_ ] } @curr) eq $list[ -1 ] x @cu +rr ) { @curr = (0) x (@curr + 1); } else { my $pos = @curr; while ( --$pos > -1 ) { ++$curr[ $pos ], last if $curr[ $pos ] < $#list; $curr[ $pos ] = 0; } } return undef if @curr > $size; return join '', map { $list[ $_ ] } @curr; }; } sub unlock { $_[0] eq 'john' }

Example: The list needs to be duplicated multiple times

Left as an exercise for the reader.

Things To Consider

  • The iterator's @_ is different than the factory's
  • You should copy @_ to lexicals in the factory as using $_[0] in the iterator may lead to unintentional results.
  • The return value indicating exhaustion is important
  • After choosing a value that is known not to be valid in the list, document it. Allow the user to define their own if it is not possible to know in advance.
  • References to external variables for state may cause problems
  • When making copies is counter-productive, document assumptions.
  • You may need to handle edge cases
  • The first and last items typically involve more logic then the rest. Be sure to verify that your logic is sound by testing edge cases.
  • State variables persist as long as the iterator
  • Reaching the end of the list does not mean that the iterator and state variables are garbage collected. Documentation gives users the choice to destroy the iterator itself once finished, but you may also want to undef the state variables at exhaustion if the memory footprint is large.

See Also

Cheers - L~R

The tutorial has been updated to reflect suggestions from replies as well as the chatterbox. While the examples have been changed as well, I have retained the original code snippets in HTML comments if a reply referenced them.

In reply to How To: Make An Iterator by Limbic~Region

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others avoiding work at the Monastery: (7)
    As of 2020-09-23 12:44 GMT
    Find Nodes?
      Voting Booth?
      If at first I don’t succeed, I …

      Results (131 votes). Check out past polls.