Recursively-generated Iterators

Recursive algorithms are often simple and intuitive. Unfortunately, they are also often explosive in terms of memory and execution time required. Take, for example, the N-choose-M algorithm:

# Given a list of M items and a number N,
# generate all size-N subsets of M
sub choose_n {
    my $n = pop;
    # Base cases
    return []   if $n == 0  or $n > @_;
    return [@_] if $n == @_;
    # otherwise..
    my ($first, @rest) = @_;
    # combine $first with all N-1 combinations of @rest,
    # and generate all N-sized combinations of @rest
    my @include_combos = choose_n(@rest, $n-1);
    my @exclude_combos = choose_n(@rest, $n);
    return ( (map {[$first, @$_]} @include_combos)
         , @exclude_combos );
}
[download]

Great, as long as you don't want to generate all 10-element subsets of a 20-item list. Or 45-choose-20. In those cases, you will need an iterator. Unfortunately, iteration algorithms are generally completely unlike the recursive ones they mimic. They tend to be a lot trickier.

But they don't have to be. You can often write iterators that look like their recursive counterparts — they even include recursive calls — but they don't suffer from explosive growth. That is, they'll still take a long time to get through a billion combinations, but they'll start returning them to you right away, and they won't eat up all your memory.

The trick is to create iterators to use in place of your recursive calls, then do a little just-in-time placement of those iterator creations.

So let's take a first stab at choose_n. First, our base cases are going to be subs that return whatever they were returning before, but after returning those values once, they don't return anything anymore:

sub iter_choose_n {
    my $n = pop;
    # Base cases
    my $once = 0;
    return sub {$once++ ? () : []} if $n == 0  or $n > @_;
    my ($first, @rest) = @_;
    return sub {$once++ ? () : [$first, @rest]} if $n == @_;
[download]

Apart from the iterator trappings, we've got essentially what we had before. Converting the map into an iterator involves some similar work, but the parallels are still pretty obvious. We exhaust the first iterator before turning to the second:

    # otherwise..
    my $include_iter = iter_choose_n(@rest, $n-1);
    my $exclude_iter = iter_choose_n(@rest, $n);
    return sub {
        if (my $set = $include_iter->()) {
            return [$first, @$set];
        }
        else {
            return $exclude_iter->();
        }
    }
[download]

We now have a recursively-defined iterator that wasn't a heck of a lot more complex than our original algorithm. That's the good news. The bad news is: it's still doubly recursive, O(2^N) in space and time, and so will take a long time to start generating data. Time for a little trick. Because we don't use $exclude_iter until we've exhausted $include_iter, we can delay defining it:

    # otherwise..
    my $include_iter = iter_choose_n(@rest, $n-1);
    my $exclude_iter;
    return sub {
        if (my $set = $include_iter->()) {
            return [$first, @$set];
        }
        else {
            $exclude_iter ||= iter_choose_n(@rest, $n);
            return $exclude_iter->();
        }
    }
}
[download]

Now our code is singly recursive, O(N) in space and time to generate an iterator, and that makes a big difference. Big enough that you probably won't need to go to the trouble of coming up with an O(1) truly iterative solution.

Of course, if you complete the iterations, eventually you will have generated those 2^N subs, and they'll clog up your memory. You may not be concerned about that (you may not be expecting to perform all that many iterations), but if you are, you can put a little code in to free up exhausted iterators:

    # otherwise..
    my $include_iter = iter_choose_n(@rest, $n-1);
    my $exclude_iter;
    return sub {
        if ($include_iter and my $set = $include_iter->()) {
            return [$first, @$set];
        }
        else {
            if ($include_iter) {
                undef $include_iter;
                $exclude_iter = iter_choose_n(@rest, $n);
            }
            return $exclude_iter->();
        }
    }
}
[download]

Update: See my response to Kelan for a recipe for conversion from recursion to iterator.

Caution: Contents may have been coded under pressure.

Comment on Recursively-generated Iterators Select or Download Code

Replies are listed 'Best First'.
Re: Recursively-generated Iterators by blokhead (Monsignor) on May 19, 2005 at 13:09 UTC
Quite an interesting technique, and as a disclaimer I really do like it. That being said, this technique will not always get you the fastest iterator. The holy grail of iterators for combinatorial structures is a constant amortized time (CAT) algorithm, meaning that any N successive calls to the iterator takes O(N) time. This is as good as it gets. Your technique is great as the iterator algorithm mirrors the easy-to-understand recursive algorithm. And kudos for noting (and fixing) the double-recursion. However, your algorithm's running time for iterating through all (N choose K) combinations is proportional to (N+2 choose K+1), so it is not CAT. For a detailed analysis of this and many other interesting iteration algorithms, see section 4.3 of the book Combinatorial Generation by Frank Ruskey. The book does give a CAT iterator for generating combinations, and I believe your general technique of mirroring the recursive algorithm inherently cannot achieve CAT running time (unless you memoize, which defeats the whole purpose of iterating to save memory). I guess it's a matter of what you want to trade -- (sometimes small) factors of running time, or readability of the algorithm. Although for iteration problems like this, I usually just refer to Ruskey's book and find a pre-packaged CAT algorithm ;) blokhead	[reply]
Re^2: Recursively-generated Iterators by Roy Johnson (Monsignor) on May 19, 2005 at 17:37 UTC
You can go a little further and say that this technique will never get you the fastest iterator, but it will likely get you one that is fast and memory-efficient enough, while remaining simple: a sweet spot. Caution: Contents may have been coded under pressure.	[reply]
Re: Recursively-generated Iterators by spurperl (Priest) on May 19, 2005 at 05:18 UTC
Nice ++ I wonder how general your technique is - I mean is it applicable only the to computation you present, or is it generally good for all memory-eating recursive computations ? Can one formulate a general algorithm for this transformation ? By the way, another technique to ease these computations is Memoization. It is almost always very helpful for recursive algorithms like the one you present.	[reply]
Re^2: Recursively-generated Iterators by Roy Johnson (Monsignor) on May 19, 2005 at 17:27 UTC
It should be applicable to any algorithm that generates a set where different portions of the set are generated by different recursive calls. That's mostly combinatorial things like this, but Towers of Hanoi would also be a good example. If each item in the returned set required some value from more than one recursive call, you couldn't do it this way. Caution: Contents may have been coded under pressure.	[reply]
Re: Recursively-generated Iterators by kelan (Deacon) on May 19, 2005 at 21:01 UTC
Here's another iterator solution that avoids the recursion altogether. There are three differences: It will return undef instead of an empty array ref if the number of items is less than the amount to choose (ie 3 choose 5). The amount to choose is passed before the list of items instead of afterwards because I thought it was weird, although your way does make sense if one thinks of it like "Here are 50 things, choose 5". The output order is a bit different. All three of these can certainly be fixed, but I wanted to keep the code clear. `sub choose_n_iter { my @todo = [ shift, [ @_ ], [] ]; return sub { while ( @todo ) { my ( $n, $pool, $tally ) = @{ shift @todo }; return $tally if $n == 0; next if @$pool == 0; my ( $first, @rest ) = @$pool; push @todo, [ $n - 1, \@rest, [ @$tally, $first ]], [ $n , \@rest, $tally ]; } return; }; }` [download] And here is a more effecient version: `sub choose_n_iter { my @todo = [ shift, [ @_ ], [] ]; return sub { while ( @todo ) { my ( $n, $pool, $tally ) = @{ shift @todo }; return $tally if $n == 0; return [ @$tally, @$pool ] if $n == @$pool; next if @$pool == 0; my ( $first, @rest ) = @$pool; push @todo, [ $n - 1, \@rest, [ @$tally, $first ]]; push @todo, [ $n , \@rest, $tally ] if @res +t; } return; }; }` [download] Thanks goes out to HOP for the recursion-to-iterator help.	[reply] [d/l] [select]
Re^2: Recursively-generated Iterators by Roy Johnson (Monsignor) on May 23, 2005 at 15:57 UTC
Your solution is more traditional. It does essentially the same thing as mine — delaying evaluation of later elements until they're needed — by keeping an explicit stack (or what should be a stack, but you seem to be using it as a queue, which explains the different order of return values). You note that it "avoids recursion altogether", but that's not really a legitimate goal. Recursion is not a bad thing in and of itself. It is often a very helpful thing. The reason I think my solution is an improvement is that it saves you the trouble of manipulating a stack, and allows the iterative solution to more closely resemble the recursive one. In a word: simplicity. You can follow a foolproof recipe, and in five or ten minutes have an iterator that returns exactly the same set as the recursive one. Same order, same structure, same arguments. Here's the recipe (this is new since my original post): Empty-set return values are replaced by empty subs Explicit (non-recursing) return values are assigned to an array; the iterator will be a sub that shifts that array A list of return values that includes recursive calls is replaced by a list of sub-wrapped iterators (wrapping the iterator calls in subs delays their evaluation), the iterator will be a sub that shifts its way through that list, unwrapping iterators until it finds one that returns something. See the rewrite of iter_choose_n (at the end of this post) for what to do when you have to accumulate data with a recursive call, as choose_n does (but Hanoi doesn't). That last is a little complicated to describe in words, so I'll demonstrate with Towers of Hanoi. Read more... (2 kB) Rewriting iter_choose_n to work with the boilerplate: Read more... (2 kB) Caution: Contents may have been coded under pressure.	[reply] [d/l] [select]
Re: Recursively-generated Iterators by runrig (Abbot) on May 20, 2005 at 04:39 UTC
I had my own idea of how to do it, and I came up with restartable iterators: Read more... (986 Bytes) In the append iterator, when the second iterator is exhausted, it causes the first iterator to iterate, and then restarts the second iterator at the appropriate starting point. Update:Unlike the other solutions, the solution above returns an array of indices instead of a set of elements from a list, but that's easy to adjust: `sub choose_n { my $n = shift; my @set = @_; my $iter = choose_m_of_n_iter($n, scalar(@set)); sub { @set[$iter->()]; } }` [download] Update: Simplified code. Which may or may not be a good thing :-)	[reply] [d/l] [select]
Re^2: Recursively-generated Iterators by Roy Johnson (Monsignor) on May 23, 2005 at 17:11 UTC
I was not able to figure out what your strategy is. Could you describe what a restartable iterator is, and what problem it addresses? Caution: Contents may have been coded under pressure.	[reply]
Re^3: Recursively-generated Iterators by runrig (Abbot) on May 23, 2005 at 18:13 UTC
Could you describe what a restartable iterator is, and what problem it addresses? Well, I made up the name (and so maybe it's a badly chosen name), but what I mean is, an iterator whose behaviour you can modify after it has already started (in this case, it's a simple iterator, but I can restart it at any point each time it is exhausted). In iterating over the result set, I noticed that in choosing M out of a set of N objects (in a normal nested for loop way), the inner most loop would iterate from M to N, then from M+1 to N, then M+2 to N, etc (down to N to N), while the next outer loop would iterate from M-1 to N-1, then from M to N-1, then M+1 to N-1, etc. The outer most loop just iterates from 1 to N-M+1 (assuming the array index starts at 1, though in above program it starts at zero). That was the basic pattern at first, but it was more complicated eventually, because the first time that the innermost iterator got down to the "N to N" iteration, it would have to restart at M+1 at the next iteration. So I thought it might be easier to construct an iterator that could tell the next "inner" iterator where to start (the ending point is always the same for each iterator in this case). The iterators I've seen so far return a function that accepts no arguments, but in my "restartable_iter()", it returns a function that can accept a new starting point. This, of course, is not a generic solution for all iteration problems, because one iterator must be aware of how to tell the next iterator in line what to do. But it was fun to come up with.	[reply]
Re: Recursively-generated Iterators by Roy Johnson (Monsignor) on Jun 02, 2005 at 01:53 UTC
Just another example of a conversion (using the boilerplate recipe). I'm posting mostly so it will be here to refer back to if I revisit the topic. It's interestingly different from the other examples, and the recipe still works well (although I get a curious `Use of uninitialized value in array element` warning if I don't turn those warnings off (which I do, below)). What I'm generating is permutations, like Algorithm::Loops NextPermute. #!perl use strict; use warnings; sub no_repeat_combos { return [@_] unless @_ > 1; # Find the first occurrence of each unique element my %seen; defined($seen{$_[$_]}) or $seen{$_[$_]} = $_ for 0..$#_; # For each unique element, stick it on the front of each # of the no-repeat-combos of the rest map { my $first_pos = $_; my @rest = @_[ grep {$first_pos != $_} 0..$#_ ]; map [$_[$first_pos], @$_], no_repeat_combos(@rest); } (sort {$a <=> $b} values %seen); } sub nrc_iter { # Base cases get assigned to an array, which the iterator shifts t +hrough my @base_case = ([@_]); return sub{ shift @base_case } unless @_ > 1; # Find the first occurrence of each unique element my %seen; defined($seen{$_[$_]}) or $seen{$_[$_]} = $_ for 0..$#_; # For each unique element, stick it on the front of each # of the no-repeat-combos of the rest my @arg_list = @_; my @sub_iter = map { my $first_pos = $_; my @rest = @_[ grep {$first_pos != $_} 0..$#_ ]; sub { my $recurse = nrc_iter(@rest); my $set; no warnings 'uninitialized'; sub { ($set = $recurse->()) ? [$arg_list[$first_pos], @$set] + : () } } } sort {$a <=> $b} values %seen; # Below here is boilerplate: if you've done the above steps right, + just plug # this in, and it works. It returns the first iterator from the li +st that # returns anything. # Grab and unwrap an iterator from the list my $iter = (shift @sub_iter)->(); return sub { my $rval; $iter = (shift @sub_iter)->() until ($rval = $iter->() or @sub_iter == 0); return $rval; } } for ([1], [1,1], [1,2], [qw(a b a)], [qw(a b b a)]) { print "=== @$_ ===\n"; my $i = nrc_iter(@$_); print " @$_\n" while $_ = $i->(); } [download] Caution: Contents may have been coded under pressure.	[reply] [d/l] [select]


Your skill will accomplish what the force of many cannot
	PerlMonks