A mini-language for sequences (part 1)

Working with collections of things is one of the most common chores in programming. In this meditation, we'll create a functional mini-language for sequences – more than iterators and yet not quite streams – and use it as a simple, abstract tool for working with collections. To gain familiarity with our tool, we'll answer some recent questions that were asked by Seekers of Perl Wisdom.

Since we haven't defined what a sequence is yet, let's do that now:

A sequence is a function that represents a finite series of values. When called, the function returns the next value in the series. The first call to the function after the series has been exhausted returns an empty array and resets the sequence back to its beginning.

A couple of points. First, sequences are cyclical. After you exhaust them, they start over again. Second, the values extracted from a sequence are arrays, not scalars. This allows sequences to return multiple values at a time. As we'll see later, these properties allow us to combine sequences in interesting ways.

Fundamental sequences

With our definition in mind, let's begin our implementation. Some preliminaries:

    #!/usr/bin/perl

    use warnings;
    use strict;
[download]

Although most of the code we'll write is functional in nature, Perl's object-oriented syntax is often a convenient way to call functions. To make this syntax available, let's set up a package and write a couple of helper functions to promote functions into Sequences:

    package Sequences;

    sub new {
        my ($proto, $seq) = @_;
        bless $seq, $proto;
    }

    sub seqsub(&) {
        Sequences->new(@_);
    }
[download]

The package declaration and the new subroutine are standard Perl OO fare. The seqsub function, however, is somewhat odd. We'll use it as an alternative syntax to Perl's normal sub when we want to to create functions (anonymous subroutines) that represent sequences.

To see how the set-up works, let's create our first sequence-making function, seq. It creates a sequence out of the series formed by its arguments:

    sub seq {
        my ($i, $elems) = (0, \@_);
        seqsub {
            $i < @$elems
                ? ( $elems->[ $i++ ] )
                : do { $i = 0; () };
        }
    }
[download]

To see how it works, let's create simple, 3-element sequence and extract its values:

    my $abcees = seq("a", "b", "c");

    $abcees->();  # ("a")
    $abcees->();  # ("b")
    $abcees->();  # ("c")
    $abcees->();  # (   )
    # ... the cycle repeats ...
[download]

Because we will often want to see what's "in" a sequence, let's create a function to enumerate a sequence's values:

    use Data::Dumper;

    sub enumerate {
        local $Data::Dumper::Terse = 1;
        local $Data::Dumper::Indent = 0;
        my ($i, $seq) = (0, $_[0]);
        while (my @val = $seq->()) {
            @val = map { ref ($_) ? Dumper($_) : $_ } @val;
            printf "%2d => %s\n", $i++, "@val";
        }
        $seq;
    }
[download]

The while loop within the function shows us the idiom for iterating over a sequence's values: We extract values until we get an empty array. We print each value we get. When we're done, we return the sequence itself to facilitate function chaining.

As an example, let's enumerate our earlier sequence:

    enumerate( $abcees );

    #  0 => a
    #  1 => b
    #  2 => c
[download]

Let's try the alternative OO syntax:

    $abcees->enumerate;

    #  0 => a
    #  1 => b
    #  2 => c
[download]

Not bad. Now let's move on to something a little more weighty.

Combinations of sequences

One of the things that makes sequences interesting is that we can combine them in ways that let us eliminate a lot of boilerplate code. For example, we often want to do things for each element of one set and for each element of another. We typically use nested foreach loops for this cause, but sequences give us another option.

The following function takes two separate sequences s and t and combines them into a sequence whose values are the Cartesian product of the values drawn from s and t, in effect nesting t inside of s:

    sub seq_prod2 {
        my ($s, $t) = @_;
        my @sval;
        seqsub {
            my @tval;
            while ( !@sval || !(@tval = $t->()) ) {
                return () unless @sval = $s->();
            }
            ( @sval, @tval );
        }
    };
[download]

An example:

    my $one_two_threes = seq( 1 .. 3 );

    enumerate( seq_prod2( $abcees, $one_two_threes ) );

    #  0 => a 1
    #  1 => a 2
    #  2 => a 3
    #  3 => b 1
    #  4 => b 2
    #  5 => b 3
    #  6 => c 1
    #  7 => c 2
    #  8 => c 3
[download]

One of the idioms that appears frequently in functional programming is to take a binary function and generalize it into an n-ary function by "folding" it over a list of arguments. We can use List::Util's reduce function for this purpose. Let's use it generalize seq_prod2 into the n-ary seq_prod:

    use List::Util qw( reduce );

    sub seq_prod {
        reduce { seq_prod2($a,$b) } @_ ;
    }
[download]

Now we can compute the products of any number of sequences:

    my $you_and_mees = seq( "you", "me" );

    seq_prod( $abcees, $one_two_threes, $you_and_mees )
        ->enumerate;

    #  0 => a 1 you
    #  1 => a 1 me
    #  2 => a 2 you
    #  3 => a 2 me
    #  4 => a 3 you
    #  5 => a 3 me
    #  6 => b 1 you
    #  7 => b 1 me
    #  8 => b 2 you
    #  9 => b 2 me
    # 10 => b 3 you
    # 11 => b 3 me
    # 12 => c 1 you
    # 13 => c 1 me
    # 14 => c 2 you
    # 15 => c 2 me
    # 16 => c 3 you
    # 17 => c 3 me
[download]

The power of our abstraction is starting to become apparent. Normally we would need to nest three foreach loops (or use a module such as Algorithm::Loops) to do what we just did in one line of code. In the next section, we'll take this idea further to see how much boilerplate code we can factor out.

Abstractions that replace nested foreach loops

It's fairly common in day-to-day programming that we must process all of the combinations of elements that can be created by drawing one element from each of several sets (arrays). In the 2-array case, for example, we might need to test every element in one array with respect to every element in another. In optimization or search problems, we might need to examine each of the possibilities within an n-dimensional search space to determine which have the least cost or satisfy the given criteria.

One common way to approach such a problem is with nested loops. Consider the case involving three arrays:

    my (@alist, @blist, @clist);
    # ... initialize arrays with values ...

    foreach my $a (@alist) {
        foreach my $b (@blist) {
            foreach my $c (@clist) {
                # do something with ($a, $b, $c) ...
            }
        }
    }
[download]

Let's try to do the same thing with sequences.

First, we'll need to convert each array into a sequence. That's easy; we can just use seq. Then, we must combine the sequences; for this we'll use seq_prod, like before. Finally, we'll extract the values from the combined sequence and process them in turn. Using this recipe, we get the following:

    my $combined_sequence =
        seq_prod( seq(@alist), seq(@blist), seq(@clist) );

    while ( my ($a, $b, $c) = $combined_sequence->() ) {
        # do something with ($a, $b, $c)
    }
[download]

That works, but it's clunky.

Let's see if we can refine the approach. As the first refinement, let's create helper functions to perform the first two steps, in effect creating a combined sequence from a specification of its subsequences:

    sub seqs {
        map seq(@$_), @_;
    }

    sub seq_from_spec {
        seq_prod( seqs(@_) );
    }
[download]

Both helpers are tiny but handy and have many uses beyond the one we're aiming for now. For example, we can use seq_from_spec to extract the digits of n-ary numbers:

    sub nary_digits {
        my ($base, $digits) = @_;
        seq_from_spec( ([0..$base-1]) x $digits );
    }

    enumerate( nary_digits(2, 3) );  # 3-digit binary numbers

    #  0 => 0 0 0
    #  1 => 0 0 1
    #  2 => 0 1 0
    #  3 => 0 1 1
    #  4 => 1 0 0
    #  5 => 1 0 1
    #  6 => 1 1 0
    #  7 => 1 1 1
[download]

On a naming note, I called the function seq_from_spec instead of something like seq_prod_from_spec because I consider the Cartesian product to be the most fundamental and useful way of combining sets. So the idea of a "sequence specification" that describes a product of sequences naturally follows:

    # seq_from_spec([1..3])       === seq(1..3)
    # seq_from_spec([1..3],[4,5]) === seq(1..3) x seq(4,5)
    # seq_from_spec(\(@a,@b,...)) === seq(@a) x seq(@b) x ...
[download]

Back to our task, another helper factors out the looping boilerplate:

    sub seq_foreach {
        my ($seq, $fn) = @_;
        while (my @val = $seq->()) {
            $fn->(@val);
        }
        $seq;
    }
[download]

We can use it like so:

    $abcees->seq_foreach( sub { print "@_\n"; } );
    # a
    # b
    # c
[download]

Finally, a higher-level helper ties the solution together:

    sub seq_foreach_from_spec {
        my ($spec, $fn) = @_;
        seq_foreach( seq_from_spec( @$spec ), $fn );
    }
[download]

Now we can replace our original, 3-foreach loop with the following:

    seq_foreach_from_spec( [\(@alist, @blist, @clist)], sub {
        my ($a, $b, $c) = @_;
        # do something with $a, $b, $c, ...
    });
[download]

In reflection, that may seem like a long way to have gone just to replace a 3-foreach loop, and it was. But our travels covered more ground than might at first be obvious:

The exact same helper can be used to factor out nested loops of any fixed depth, not just 3.
The same helper can also handle cases when the depth is varying and unknown in advance. In the same way that our nary_digits function handles arbitrary digit lengths at run time, so does our helper function handle arbitrary numbers of input arrays.
We have captured useful idioms. Our helper functions add richness to our our programming vocabularies and let us think in higher-level terms such as Cartesian products instead of lower-level terms such as loops.
We ain't done yet. In the same way that we combined tiny functions to get us this far, we can combine these functions with others to yield even more powerful tools.

Transforming sequences: filters and maps

Sometimes the best approach to solving a problem is to solve a simpler problem and then transform the simple solution into one that works for the original problem. Let's say that we want to create a series of odd integers. We might approach it like so:

Generate a series of all integers. (Solve a simpler problem.)
Filter the series so that only odd integers remain. (Transform the solution.)

To implement this strategy, let's first create a helper function that applies a filtering transformation to a sequence to yield a new, filtered sequence:

    sub seq_filter {
        my ($seq, $filter_fn) = @_;
        seqsub {
            my @val;
            1 while @val = $seq->() and !$filter_fn->(@val);
            return @val;
        }
    }
[download]

The function is grep for sequences. It takes a sequence and a filtering function and returns a new sequence that passes through the values of the original sequence for which the filtering function returns true; all other values are filtered out. Using it, we can construct our odd-integers solution:

    sub odds_up_to {
        my $maximum = shift;
        seq( 1 .. $maximum )
            ->seq_filter( sub { $_[0] % 2 } )
    }

    enumerate( odds_up_to(10) );

    #  0 => 1
    #  1 => 3
    #  2 => 5
    #  3 => 7
    #  4 => 9
[download]

Now let's say that we want to generate a similar sequence but for even integers. Again, we can use a transformational strategy. This time, however, we'll transform the odd-integer series into an even-integer series by subtracting one from each value. In effect, we're mapping one series to another. Our helper is named accordingly:

    sub seq_map {
        my ($seq, $fn) = @_;
        seqsub {
            my @val = $seq->();
            @val ? $fn->(@val) : ();
        }
    }
[download]

And our even-integers solution:

    sub evens_up_to {
        odds_up_to( $_[0] + 1 )
            ->seq_map( sub { $_[0] - 1 } );
    }

    enumerate( evens_up_to(10) );

    #  0 => 0
    #  1 => 2
    #  2 => 4
    #  3 => 6
    #  4 => 8
    #  5 => 10
[download]

With these simple extensions to our vocabulary, we have greatly expanded the usefulness of our mini-language for sequences. Now let's try it out on a real-world problem.

SoPW Example: "Combinations of an array of arrays...?"

Take a moment to look at the problem posed in Combinations of an array of arrays...? and the solutions offered by our good fellow monks. To summarize, the seeker writes the following:

What I have is 7 arrays with arbitrary number of elements in each. I would like to create a new array that contains combinations of the other seven in this manner: (1) Only one element from each of the 7 arrays. (2) Minimum of 4 elements per element in final array.

With our existing sequence language, the solution for part (1) of the seeker's request is straightforward: Just pass the 7 arrays to seq_from_spec and the resulting sequence will yield all of the combinations.

Part (2) adds a wrinkle, however. It requires that the output combinations each have at least 4 elements. This suggests that part (1) really means, zero or one element from each array. (Otherwise, the at-least-4 constraint is meaningless because all of the combinations will have exactly 7 elements.)

To effect the zero-or-one behavior, we can transform each input array like [1,2,3] into [[],[1],[2],[3]]. Each element in the transformed array is a zero- or one-element array. (Note the insertion of an an "empty" element at the head of the array.) Next, we can pass these transformed arrays to seq_from_spec, as usual. On the backside, we can map the combinations back into the desired form by merging the zero- or one-element arrays. This we can do with a map.

Finally, we'll filter the combinations so that only those of the desired minimum length are kept.

Putting it all together in the form of a generalized solution:

    sub min_length_combinations {
        my ($min_length, @inputs) = @_;
        my @input_spec = map [ [], (map [$_], @$_) ], @inputs;
        seq_from_spec( @input_spec )
            ->seq_map( sub { [ map @$_, @_ ] } )
            ->seq_filter( sub { @{$_[0]} >= $min_length } )
    }
[download]

(BTW, this code is the Perl equivalent of the Haskell solution that I posted in the thread.)

Now, we can solve the example problem from the seeker's original post:

    min_length_combinations(
        4, map [split//], qw( abc de fgh i jk l m )
    )->enumerate;

    #   0 => ['i','j','l','m']
    #   1 => ['i','k','l','m']
    #   2 => ['f','j','l','m']
    #   3 => ['f','k','l','m']
    #
    #    ...
    #
    # 862 => ['c','e','h','i','k']
    # 863 => ['c','e','h','i','k','m']
    # 864 => ['c','e','h','i','k','l']
    # 865 => ['c','e','h','i','k','l','m']
[download]

More ways to combine sequences: series and parallel

In addition to combining sequences by way of Cartesian products, we have other options. We can join them in series or merge them in parallel. The following function performs the former:

    sub seq_series {
        my $seqs = seq( @_ );  # seq of seqs (!)
        my $seq;
        seqsub {
            my @val;
            do {
                ($seq) = $seqs->() unless $seq;
                @val = $seq->() if $seq;                
            } while !@val && ($seq = $seqs->());
            @val;
        }
    }

    seq_series( $abcees, $one_two_threes )->enumerate;

    #  0 => a
    #  1 => b
    #  2 => c
    #  3 => 1
    #  4 => 2
    #  5 => 3
[download]

Merging sequences in parallel is like joining the teeth of a coat's zipper. As input we have two (or more) separate sequences and as output we have one zipped-together sequence. In keeping with Haskell tradition, our zipper function will let us merge sequences of unequal lengths. In that case, the zipped sequence will be as long as the shortest input sequence. Here's the code:

    sub seq_reset {
        my $seq = shift;
        if ($seq) { 
            1 while $seq->();
        }
        $seq;
    }

    sub seq_zip {
        my $seqs = seq( @_ ); # seq of seqs (!)
        my $seq_count = @_;
        seqsub {
            my @outvals;
            while (my $seq = $seqs->()) {
                if (my @val = $seq->()) {
                    push @outvals, @val;
                }
                else {
                    seq_reset( $seqs->() ) for 1 .. $seq_count;
                    seq_reset( $seqs );
                    return ();
                }
            }
            return @outvals;
        }
    }
[download]

The need to handle unequal lengths adds complexity to our code. In particular, the else clause handles this case and resets any partially-read sequences before returning. This ensures that our sequences' next customers don't get short changed by inheriting half-read sequences.

Some examples:

    seq_zip( $abcees, $one_two_threes )->enumerate;

    #  0 => a 1
    #  1 => b 2
    #  2 => c 3

    seq_zip( $abcees, $one_two_threes, $you_and_mees )
        ->enumerate;

    #  0 => a 1 you
    #  1 => b 2 me
[download]

To generalize our zipping options, we can zip sequences with a given "zipper" function:

    sub seq_zip_with {
        my $zipper_fn = shift;
        seq_map( seq_zip(@_), $zipper_fn );
    }
[download]

Some examples:

    # some math helpers

    sub sum     { reduce { $a + $b } @_ }
    sub product { reduce { $a * $b } @_ }

    seq_zip_with( \&sum, seq(1..3), seq(0..10) )->enumerate;

    #  0 => 1
    #  1 => 3
    #  2 => 5

    seq_zip_with( \&product, seq(1..5), seq(0..10), seq(2..8) )
        ->enumerate;

    #  0 => 0
    #  1 => 6
    #  2 => 24
    #  3 => 60
    #  4 => 120
[download]

With these new extensions to our mini-language for sequences, let's tackle another real Perl Monks problem.

SoPW Example: "Comparing two arrays and counting pairwise comparisons"

Examine the thread Comparing two arrays and counting pairwise comparisons.

The seeker has two arrays of strings and wants to operate on each combination of elements from the two. The operation to be performed is a pairwise "comparison" which amounts to counting the character pairs that are formed when each element is zipped with the other.

Since combinations and zips are part of our sequence mini-language, our implementation is straightforward:

    my @site1 = qw( AATKKM aatkkm );
    my @site2 = qw( GGGGGG gggggg );

    my %counts;

    seq_foreach_from_spec( [ \(@site1, @site2) ],
        sub {
            seq_foreach(
                seq_zip( ( map seq(split//), @_ ) ),
                sub { $counts{"@_"}++ }
            ) 
        }
    );

    print Dumper(\%counts), "\n";

    # { 'K G' => 2, 'A G' => 2, 'm g' => 1, 'a g' => 2,
    #   'A g' => 2, 'M G' => 1, 'k g' => 2, 'k G' => 2,
    #   'T G' => 1, 'a G' => 2, 'm G' => 1, 't G' => 1,
    #   'K g' => 2, 'M g' => 1, 't g' => 1, 'T g' => 1 }
[download]

Extracting values from sequences

Sometimes sequences are merely a means to an end. Sometimes we don't care about the sequences themselves but only the values they produce. So it makes sense to extend our language with functions that extract values from sequences.

We'll create two value extractors. The first is a general-purpose extractor that will capture each array value in a sequence as an arrayref. The second is for when we expect the sequence's values to be scalars. In this case, we don't need to wrap each value within an array and can just return it straight.

    sub seq_values {
        my $seq = shift;
        seq_values_scalar( seq_map( $seq, sub { [@_] } ) );
    }

    sub seq_values_scalar {
        my $seq = shift;
        my @values;
        seq_foreach( $seq, sub { push @values, @_ } );
        return @values;
    }
[download]

Some examples:

    print Dumper( [ seq(1..3)->seq_values ] ), "\n";
    # [[1],[2],[3]]

    print Dumper( [ seq(1..3)->seq_values_scalar ] ), "\n";
    # [1,2,3]
[download]

As a more realistic example, let's write a function to transpose a matrix, that is, exchange its rows and columns:

    sub matrix_transpose {
        my $rows  = shift;
        [ seq_values( seq_zip( seqs(@$rows) ) ) ];
    }

    my $matrix = [ [ 0, 1 ]
                 , [ 2, 3 ]
                 , [ 4, 5 ] ];

    print Dumper( matrix_transpose( $matrix ) ), "\n";
    # [ [0,2,4]
    # , [1,3,5] ]
[download]

Another way to extract values is to collapse a sequence into a single value by "folding" an accumulating function across the sequence's values. This is similar to what List::Util's reduce function does for arrays. Here's our implementation for sequences:

    sub seq_fold {
        my ($seq, $fn) = @_;
        my @accum = $seq->();
        while (@accum && (my @val = $seq->())) {
            @accum = $fn->(@accum, @val);
        }
        wantarray ? @accum : $accum[0];
    }
[download]

Continuing with our matrix theme, let's write a function that computes the dot product of two vectors. We multiply the vectors' elements pairwise, and then add the resulting products:

    sub dot_product {
        seq_zip_with( \&product, seqs(@_) )
            ->seq_fold( \&sum );
    }

    print dot_product( [1,1,1], [1,2,3] ), "\n";
    # 6
[download]

So ends Part 1

In this first meditation on sequences, we began with a simple, foundational definition and built upon it a small library of functions. These tiny functions form the core of a mini-language that moves us closer to our collections, where we can work more naturally. Instead of thinking in terms of loops and iteration variables, we can think in terms of products and zips and folds. Our language provides direct support for combining, transforming, iterating over, and extracting values from collections, which we represent as sequences.

So far, the ways in which we have combined sequences have been static. In the next part, we'll look at how we can place tiny functions in between the parts of our sequences to introduce dynamism. This simple extension brings many complex manipulations within reach. We'll look at some interesting applications and solve a few more Perl Monks problems. We might even referee a game of hyperdimensional tic-tac-toe.

Thanks for taking the time to read this meditation. If you have any criticisms or can think of any way to make my writing better, please let me know.

Cheers,
Tom

Tom Moertel : Blog / Talks / CPAN / LectroTest / PXSL / Coffee / Movie Rating Decoder

Back to Meditations