http://qs321.pair.com?node_id=405570

Working with collections of things is one of the most common chores in programming. In this meditation, we'll create a functional mini-language for sequences – more than iterators and yet not quite streams – and use it as a simple, abstract tool for working with collections. To gain familiarity with our tool, we'll answer some recent questions that were asked by Seekers of Perl Wisdom.

Since we haven't defined what a sequence is yet, let's do that now:

A sequence is a function that represents a finite series of values. When called, the function returns the next value in the series. The first call to the function after the series has been exhausted returns an empty array and resets the sequence back to its beginning.

A couple of points. First, sequences are cyclical. After you exhaust them, they start over again. Second, the values extracted from a sequence are arrays, not scalars. This allows sequences to return multiple values at a time. As we'll see later, these properties allow us to combine sequences in interesting ways.

Fundamental sequences

With our definition in mind, let's begin our implementation. Some preliminaries:

#!/usr/bin/perl use warnings; use strict;
Although most of the code we'll write is functional in nature, Perl's object-oriented syntax is often a convenient way to call functions. To make this syntax available, let's set up a package and write a couple of helper functions to promote functions into Sequences:
package Sequences; sub new { my ($proto, $seq) = @_; bless $seq, $proto; } sub seqsub(&) { Sequences->new(@_); }
The package declaration and the new subroutine are standard Perl OO fare. The seqsub function, however, is somewhat odd. We'll use it as an alternative syntax to Perl's normal sub when we want to to create functions (anonymous subroutines) that represent sequences.

To see how the set-up works, let's create our first sequence-making function, seq. It creates a sequence out of the series formed by its arguments:

sub seq { my ($i, $elems) = (0, \@_); seqsub { $i < @$elems ? ( $elems->[ $i++ ] ) : do { $i = 0; () }; } }

To see how it works, let's create simple, 3-element sequence and extract its values:

my $abcees = seq("a", "b", "c"); $abcees->(); # ("a") $abcees->(); # ("b") $abcees->(); # ("c") $abcees->(); # ( ) # ... the cycle repeats ...
Because we will often want to see what's "in" a sequence, let's create a function to enumerate a sequence's values:
use Data::Dumper; sub enumerate { local $Data::Dumper::Terse = 1; local $Data::Dumper::Indent = 0; my ($i, $seq) = (0, $_[0]); while (my @val = $seq->()) { @val = map { ref ($_) ? Dumper($_) : $_ } @val; printf "%2d => %s\n", $i++, "@val"; } $seq; }
The while loop within the function shows us the idiom for iterating over a sequence's values: We extract values until we get an empty array. We print each value we get. When we're done, we return the sequence itself to facilitate function chaining.

As an example, let's enumerate our earlier sequence:

enumerate( $abcees ); # 0 => a # 1 => b # 2 => c
Let's try the alternative OO syntax:
$abcees->enumerate; # 0 => a # 1 => b # 2 => c
Not bad. Now let's move on to something a little more weighty.

Combinations of sequences

One of the things that makes sequences interesting is that we can combine them in ways that let us eliminate a lot of boilerplate code. For example, we often want to do things for each element of one set and for each element of another. We typically use nested foreach loops for this cause, but sequences give us another option.

The following function takes two separate sequences s and t and combines them into a sequence whose values are the Cartesian product of the values drawn from s and t, in effect nesting t inside of s:

sub seq_prod2 { my ($s, $t) = @_; my @sval; seqsub { my @tval; while ( !@sval || !(@tval = $t->()) ) { return () unless @sval = $s->(); } ( @sval, @tval ); } };
An example:
my $one_two_threes = seq( 1 .. 3 ); enumerate( seq_prod2( $abcees, $one_two_threes ) ); # 0 => a 1 # 1 => a 2 # 2 => a 3 # 3 => b 1 # 4 => b 2 # 5 => b 3 # 6 => c 1 # 7 => c 2 # 8 => c 3
One of the idioms that appears frequently in functional programming is to take a binary function and generalize it into an n-ary function by "folding" it over a list of arguments. We can use List::Util's reduce function for this purpose. Let's use it generalize seq_prod2 into the n-ary seq_prod:
use List::Util qw( reduce ); sub seq_prod { reduce { seq_prod2($a,$b) } @_ ; }
Now we can compute the products of any number of sequences:
my $you_and_mees = seq( "you", "me" ); seq_prod( $abcees, $one_two_threes, $you_and_mees ) ->enumerate; # 0 => a 1 you # 1 => a 1 me # 2 => a 2 you # 3 => a 2 me # 4 => a 3 you # 5 => a 3 me # 6 => b 1 you # 7 => b 1 me # 8 => b 2 you # 9 => b 2 me # 10 => b 3 you # 11 => b 3 me # 12 => c 1 you # 13 => c 1 me # 14 => c 2 you # 15 => c 2 me # 16 => c 3 you # 17 => c 3 me
The power of our abstraction is starting to become apparent. Normally we would need to nest three foreach loops (or use a module such as Algorithm::Loops) to do what we just did in one line of code. In the next section, we'll take this idea further to see how much boilerplate code we can factor out.

Abstractions that replace nested foreach loops

It's fairly common in day-to-day programming that we must process all of the combinations of elements that can be created by drawing one element from each of several sets (arrays). In the 2-array case, for example, we might need to test every element in one array with respect to every element in another. In optimization or search problems, we might need to examine each of the possibilities within an n-dimensional search space to determine which have the least cost or satisfy the given criteria.

One common way to approach such a problem is with nested loops. Consider the case involving three arrays:

my (@alist, @blist, @clist); # ... initialize arrays with values ... foreach my $a (@alist) { foreach my $b (@blist) { foreach my $c (@clist) { # do something with ($a, $b, $c) ... } } }
Let's try to do the same thing with sequences.

First, we'll need to convert each array into a sequence. That's easy; we can just use seq. Then, we must combine the sequences; for this we'll use seq_prod, like before. Finally, we'll extract the values from the combined sequence and process them in turn. Using this recipe, we get the following:

my $combined_sequence = seq_prod( seq(@alist), seq(@blist), seq(@clist) ); while ( my ($a, $b, $c) = $combined_sequence->() ) { # do something with ($a, $b, $c) }
That works, but it's clunky.

Let's see if we can refine the approach. As the first refinement, let's create helper functions to perform the first two steps, in effect creating a combined sequence from a specification of its subsequences:

sub seqs { map seq(@$_), @_; } sub seq_from_spec { seq_prod( seqs(@_) ); }
Both helpers are tiny but handy and have many uses beyond the one we're aiming for now. For example, we can use seq_from_spec to extract the digits of n-ary numbers:
sub nary_digits { my ($base, $digits) = @_; seq_from_spec( ([0..$base-1]) x $digits ); } enumerate( nary_digits(2, 3) ); # 3-digit binary numbers # 0 => 0 0 0 # 1 => 0 0 1 # 2 => 0 1 0 # 3 => 0 1 1 # 4 => 1 0 0 # 5 => 1 0 1 # 6 => 1 1 0 # 7 => 1 1 1

On a naming note, I called the function seq_from_spec instead of something like seq_prod_from_spec because I consider the Cartesian product to be the most fundamental and useful way of combining sets. So the idea of a "sequence specification" that describes a product of sequences naturally follows:

# seq_from_spec([1..3]) === seq(1..3) # seq_from_spec([1..3],[4,5]) === seq(1..3) x seq(4,5) # seq_from_spec(\(@a,@b,...)) === seq(@a) x seq(@b) x ...

Back to our task, another helper factors out the looping boilerplate:

sub seq_foreach { my ($seq, $fn) = @_; while (my @val = $seq->()) { $fn->(@val); } $seq; }
We can use it like so:
$abcees->seq_foreach( sub { print "@_\n"; } ); # a # b # c
Finally, a higher-level helper ties the solution together:
sub seq_foreach_from_spec { my ($spec, $fn) = @_; seq_foreach( seq_from_spec( @$spec ), $fn ); }
Now we can replace our original, 3-foreach loop with the following:
seq_foreach_from_spec( [\(@alist, @blist, @clist)], sub { my ($a, $b, $c) = @_; # do something with $a, $b, $c, ... });
In reflection, that may seem like a long way to have gone just to replace a 3-foreach loop, and it was. But our travels covered more ground than might at first be obvious:
  1. The exact same helper can be used to factor out nested loops of any fixed depth, not just 3.
  2. The same helper can also handle cases when the depth is varying and unknown in advance. In the same way that our nary_digits function handles arbitrary digit lengths at run time, so does our helper function handle arbitrary numbers of input arrays.
  3. We have captured useful idioms. Our helper functions add richness to our our programming vocabularies and let us think in higher-level terms such as Cartesian products instead of lower-level terms such as loops.
  4. We ain't done yet. In the same way that we combined tiny functions to get us this far, we can combine these functions with others to yield even more powerful tools.

Transforming sequences: filters and maps

Sometimes the best approach to solving a problem is to solve a simpler problem and then transform the simple solution into one that works for the original problem. Let's say that we want to create a series of odd integers. We might approach it like so:
  1. Generate a series of all integers. (Solve a simpler problem.)
  2. Filter the series so that only odd integers remain. (Transform the solution.)
To implement this strategy, let's first create a helper function that applies a filtering transformation to a sequence to yield a new, filtered sequence:
sub seq_filter { my ($seq, $filter_fn) = @_; seqsub { my @val; 1 while @val = $seq->() and !$filter_fn->(@val); return @val; } }
The function is grep for sequences. It takes a sequence and a filtering function and returns a new sequence that passes through the values of the original sequence for which the filtering function returns true; all other values are filtered out. Using it, we can construct our odd-integers solution:
sub odds_up_to { my $maximum = shift; seq( 1 .. $maximum ) ->seq_filter( sub { $_[0] % 2 } ) } enumerate( odds_up_to(10) ); # 0 => 1 # 1 => 3 # 2 => 5 # 3 => 7 # 4 => 9
Now let's say that we want to generate a similar sequence but for even integers. Again, we can use a transformational strategy. This time, however, we'll transform the odd-integer series into an even-integer series by subtracting one from each value. In effect, we're mapping one series to another. Our helper is named accordingly:
sub seq_map { my ($seq, $fn) = @_; seqsub { my @val = $seq->(); @val ? $fn->(@val) : (); } }
And our even-integers solution:
sub evens_up_to { odds_up_to( $_[0] + 1 ) ->seq_map( sub { $_[0] - 1 } ); } enumerate( evens_up_to(10) ); # 0 => 0 # 1 => 2 # 2 => 4 # 3 => 6 # 4 => 8 # 5 => 10
With these simple extensions to our vocabulary, we have greatly expanded the usefulness of our mini-language for sequences. Now let's try it out on a real-world problem.

SoPW Example: "Combinations of an array of arrays...?"

Take a moment to look at the problem posed in Combinations of an array of arrays...? and the solutions offered by our good fellow monks. To summarize, the seeker writes the following:
What I have is 7 arrays with arbitrary number of elements in each. I would like to create a new array that contains combinations of the other seven in this manner: (1) Only one element from each of the 7 arrays. (2) Minimum of 4 elements per element in final array.
With our existing sequence language, the solution for part (1) of the seeker's request is straightforward: Just pass the 7 arrays to seq_from_spec and the resulting sequence will yield all of the combinations.

Part (2) adds a wrinkle, however. It requires that the output combinations each have at least 4 elements. This suggests that part (1) really means, zero or one element from each array. (Otherwise, the at-least-4 constraint is meaningless because all of the combinations will have exactly 7 elements.)

To effect the zero-or-one behavior, we can transform each input array like [1,2,3] into [[],[1],[2],[3]]. Each element in the transformed array is a zero- or one-element array. (Note the insertion of an an "empty" element at the head of the array.) Next, we can pass these transformed arrays to seq_from_spec, as usual. On the backside, we can map the combinations back into the desired form by merging the zero- or one-element arrays. This we can do with a map.

Finally, we'll filter the combinations so that only those of the desired minimum length are kept.

Putting it all together in the form of a generalized solution:

sub min_length_combinations { my ($min_length, @inputs) = @_; my @input_spec = map [ [], (map [$_], @$_) ], @inputs; seq_from_spec( @input_spec ) ->seq_map( sub { [ map @$_, @_ ] } ) ->seq_filter( sub { @{$_[0]} >= $min_length } ) }
(BTW, this code is the Perl equivalent of the Haskell solution that I posted in the thread.)

Now, we can solve the example problem from the seeker's original post:

min_length_combinations( 4, map [split//], qw( abc de fgh i jk l m ) )->enumerate; # 0 => ['i','j','l','m'] # 1 => ['i','k','l','m'] # 2 => ['f','j','l','m'] # 3 => ['f','k','l','m'] # # ... # # 862 => ['c','e','h','i','k'] # 863 => ['c','e','h','i','k','m'] # 864 => ['c','e','h','i','k','l'] # 865 => ['c','e','h','i','k','l','m']

More ways to combine sequences: series and parallel

In addition to combining sequences by way of Cartesian products, we have other options. We can join them in series or merge them in parallel. The following function performs the former:
sub seq_series { my $seqs = seq( @_ ); # seq of seqs (!) my $seq; seqsub { my @val; do { ($seq) = $seqs->() unless $seq; @val = $seq->() if $seq; } while !@val && ($seq = $seqs->()); @val; } } seq_series( $abcees, $one_two_threes )->enumerate; # 0 => a # 1 => b # 2 => c # 3 => 1 # 4 => 2 # 5 => 3
Merging sequences in parallel is like joining the teeth of a coat's zipper. As input we have two (or more) separate sequences and as output we have one zipped-together sequence. In keeping with Haskell tradition, our zipper function will let us merge sequences of unequal lengths. In that case, the zipped sequence will be as long as the shortest input sequence. Here's the code:
sub seq_reset { my $seq = shift; if ($seq) { 1 while $seq->(); } $seq; } sub seq_zip { my $seqs = seq( @_ ); # seq of seqs (!) my $seq_count = @_; seqsub { my @outvals; while (my $seq = $seqs->()) { if (my @val = $seq->()) { push @outvals, @val; } else { seq_reset( $seqs->() ) for 1 .. $seq_count; seq_reset( $seqs ); return (); } } return @outvals; } }
The need to handle unequal lengths adds complexity to our code. In particular, the else clause handles this case and resets any partially-read sequences before returning. This ensures that our sequences' next customers don't get short changed by inheriting half-read sequences.

Some examples:

seq_zip( $abcees, $one_two_threes )->enumerate; # 0 => a 1 # 1 => b 2 # 2 => c 3 seq_zip( $abcees, $one_two_threes, $you_and_mees ) ->enumerate; # 0 => a 1 you # 1 => b 2 me
To generalize our zipping options, we can zip sequences with a given "zipper" function:
sub seq_zip_with { my $zipper_fn = shift; seq_map( seq_zip(@_), $zipper_fn ); }
Some examples:
# some math helpers sub sum { reduce { $a + $b } @_ } sub product { reduce { $a * $b } @_ } seq_zip_with( \&sum, seq(1..3), seq(0..10) )->enumerate; # 0 => 1 # 1 => 3 # 2 => 5 seq_zip_with( \&product, seq(1..5), seq(0..10), seq(2..8) ) ->enumerate; # 0 => 0 # 1 => 6 # 2 => 24 # 3 => 60 # 4 => 120
With these new extensions to our mini-language for sequences, let's tackle another real Perl Monks problem.

SoPW Example: "Comparing two arrays and counting pairwise comparisons"

Examine the thread Comparing two arrays and counting pairwise comparisons.

The seeker has two arrays of strings and wants to operate on each combination of elements from the two. The operation to be performed is a pairwise "comparison" which amounts to counting the character pairs that are formed when each element is zipped with the other.

Since combinations and zips are part of our sequence mini-language, our implementation is straightforward:

my @site1 = qw( AATKKM aatkkm ); my @site2 = qw( GGGGGG gggggg ); my %counts; seq_foreach_from_spec( [ \(@site1, @site2) ], sub { seq_foreach( seq_zip( ( map seq(split//), @_ ) ), sub { $counts{"@_"}++ } ) } ); print Dumper(\%counts), "\n"; # { 'K G' => 2, 'A G' => 2, 'm g' => 1, 'a g' => 2, # 'A g' => 2, 'M G' => 1, 'k g' => 2, 'k G' => 2, # 'T G' => 1, 'a G' => 2, 'm G' => 1, 't G' => 1, # 'K g' => 2, 'M g' => 1, 't g' => 1, 'T g' => 1 }

Extracting values from sequences

Sometimes sequences are merely a means to an end. Sometimes we don't care about the sequences themselves but only the values they produce. So it makes sense to extend our language with functions that extract values from sequences.

We'll create two value extractors. The first is a general-purpose extractor that will capture each array value in a sequence as an arrayref. The second is for when we expect the sequence's values to be scalars. In this case, we don't need to wrap each value within an array and can just return it straight.

sub seq_values { my $seq = shift; seq_values_scalar( seq_map( $seq, sub { [@_] } ) ); } sub seq_values_scalar { my $seq = shift; my @values; seq_foreach( $seq, sub { push @values, @_ } ); return @values; }
Some examples:
print Dumper( [ seq(1..3)->seq_values ] ), "\n"; # [[1],[2],[3]] print Dumper( [ seq(1..3)->seq_values_scalar ] ), "\n"; # [1,2,3]
As a more realistic example, let's write a function to transpose a matrix, that is, exchange its rows and columns:
sub matrix_transpose { my $rows = shift; [ seq_values( seq_zip( seqs(@$rows) ) ) ]; } my $matrix = [ [ 0, 1 ] , [ 2, 3 ] , [ 4, 5 ] ]; print Dumper( matrix_transpose( $matrix ) ), "\n"; # [ [0,2,4] # , [1,3,5] ]

Another way to extract values is to collapse a sequence into a single value by "folding" an accumulating function across the sequence's values. This is similar to what List::Util's reduce function does for arrays. Here's our implementation for sequences:

sub seq_fold { my ($seq, $fn) = @_; my @accum = $seq->(); while (@accum && (my @val = $seq->())) { @accum = $fn->(@accum, @val); } wantarray ? @accum : $accum[0]; }
Continuing with our matrix theme, let's write a function that computes the dot product of two vectors. We multiply the vectors' elements pairwise, and then add the resulting products:
sub dot_product { seq_zip_with( \&product, seqs(@_) ) ->seq_fold( \&sum ); } print dot_product( [1,1,1], [1,2,3] ), "\n"; # 6

So ends Part 1

In this first meditation on sequences, we began with a simple, foundational definition and built upon it a small library of functions. These tiny functions form the core of a mini-language that moves us closer to our collections, where we can work more naturally. Instead of thinking in terms of loops and iteration variables, we can think in terms of products and zips and folds. Our language provides direct support for combining, transforming, iterating over, and extracting values from collections, which we represent as sequences.

So far, the ways in which we have combined sequences have been static. In the next part, we'll look at how we can place tiny functions in between the parts of our sequences to introduce dynamism. This simple extension brings many complex manipulations within reach. We'll look at some interesting applications and solve a few more Perl Monks problems. We might even referee a game of hyperdimensional tic-tac-toe.

Thanks for taking the time to read this meditation. If you have any criticisms or can think of any way to make my writing better, please let me know.

Cheers,
Tom

Replies are listed 'Best First'.
Re: A mini-language for sequences (part 1)
by BrowserUk (Patriarch) on Nov 05, 2004 at 19:21 UTC

    Tom, this isn't critism (yet:). Your's is a long article that will take several passes to fully appreciate. But.. :)

    I recognised this problem. Now comparing your code with my own solution which both arrive at the same results:

    my @site1 = qw( AATKKM aatkkm ); my @site2 = qw( GGGGGG gggggg ); my %counts; seq_foreach_from_spec( [ \(@site1, @site2) ], sub { seq_foreach( seq_zip( ( map seq(split//), @_ ) ), sub { $counts{"@_"}++ } ) } ); print Dumper(\%counts), "\n";

    And

    my @site1 = qw( AATKKM aatkkm ); my @site2 = qw( GGGGGG gggggg ); my %counts; for my $site1 ( @site1 ){ for my $site2 ( @site2 ) { $counts{ substr( $site1, $_, 1 ) . substr( $site2, $_, 1 ) + }++ for 0 .. min( length( $site1 )-1, length( $site2 )-1 ) +; } } print Dumper \%counts;

    The thing that struck me was that my perl version would be reasonably clear to most people with more than a passing familiarity with perl. Whereas your version...?

    Apart from the need to go off looking up the purpose, and parameters of seq_foreach_from_spec(), seq_foreach() and seq_zip(). (Oh! and seq(). Nearly missed that tucked away in there). That "one line" construction? Phew!

    I'm not adverse to terse or complicated code, but anonymous subs and user subs nested inside user subs; nested inside anonymous subs, inside a user sub along with an anon. array of array references.

    When I encountered that line, the only way I could begin to makes sense of it was to deconstruct the one line into 3 for 4 using temporaries and it still left me cold? Maybe if I work through the stuff again it'll make it easier.

    But given that it almost exactly the same number of lines/bytes of code; I doubt it is any quicker; certainly isn't clearer; what, besides novelty value and a cool implementation of FP, is the point?

    Please don't misread that. It is a question, not a statement. I keep reading about FP, and have now installed 3 different FP langauges and downloaded 4 FP/langauge books trying to get what people are raving about.

    So far, FP is interesting, and in some ways quite elegant, but it is also way over sold in some aspects. And I have to say, beyond encouraging me to continue to make full use of map, grep and List::Util where they fit, which I've taken some critisism for in the past, I don't quite get the idea of going further with FP in Perl?


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      First, thanks for your comments and your questions.

      Regarding the general thrust of your questioning, which I believe can be summarized, Why do all this crazy FP stuff when the equivalent "normal" Perl code is perfectly reasonable and probably easier for most Perl programmers to understand? I have a couple of answers.

      First, don't take my examples as the poster children for functional programming. This is a meditation, and a tutorial-style one at that. The examples are slanted towards meditation – mind stretching calisthenics and all that monastic stuff – not practicality. The point of the examples is not to say, This is the way to do it, but rather, Meditate on this code. The journey is the reward. (Further, my code isn't truly functional. It's a Perl-functional hybrid that nobody ought to mistake for real functional programming. Consider it "functionally inspired.")

      Second, after studying the code and understanding the thinking behind it, that's when you can see the benefits. The benefits aren't so much visible in my meditation's examples as they are upon reflection on the spectrum of problems that can be solved using the exact same techniques. Even though the shape of the problems may change, the solutions stay the same. The traditional approach, in contrast, is effectively hard-coded for one shape.

      Consider the code snippets that you compared in your post. Both operate on two input arrays @site1 and @site2. That's the shape of this particular problem. But what if the problem isn't static? What if we're given a list of arrays to process? The shape changes. The code you wrote embodies the static structure of the 2-array problem. Add an array, and it breaks. The outer pair of for loops is a hard-coded solution for the 2-array case. The fp-inspired approach that I show is independent of the shape of the problem. Instead of passing in \(@site1,@site2) you can pass in any number of input arrays. And the same code works, as is.

      Now, in real life, we usually know the shapes of our problems ahead of time. So what does the shape-flexing capability buy us? It buys the ability to code at a higher level and, more importantly, to think at a higher level, which gets to the heart of your question below:

      The thing that struck me was that my perl version would be reasonably clear to most people with more than a passing familiarity with perl. Whereas your version...?

      The reason you perceive the fp-inspired code as hard to follow is because your brain hasn't "internalized" the meaning of products and zips and the rest of our new vocabulary. To you, these words probably seem like gratuitous functional jargon.

      But the words have meaning. When I saw the example problem as first posted by the Seeker of Perl Wisdom, I immediately thought, "Ah, he wants the count of each value from the zipped products." And the code immediately followed.

      That's the reward. To have the right words to describe the problem. To have more ways of stringing together ideas. To reduce the gulfs between problem and understanding, understanding and code.

      If this sounds like worthless babble, that's OK. I didn't appreciate functional programming until I had used it to solve real-world sized problems for over a year. Only then could I "feel" fp in my bones and appreciate its potential.

      So far, FP is interesting, and in some ways quite elegant, but it is also way over sold in some aspects.
      Have you read Why Functional Programming Matters? If not, read it.

      I love Perl Monks, but don't look at the fp examples on Perl Monks as your window into the fp world. Take a look at the stuff going on in the Haskell Communities Report. Read some of the papers from the ICFP proceedings. That's where you'll find the good stuff.

      ... I don't quite get the idea of going further with FP in Perl?
      Perl leaves much to be desired as a functional programming language. Not having currying, pattern matching, proper tail recursion, and real garbage collection are limiting, painful even. But Perl is much better for fp than most other mainstream languages. For that, I'm thankful.

      Is it worth the effort to use functional-programming ideas with Perl? Part of my motivation for this meditation was to explore that question. I wanted to take some common fp ideas and run with them in Perl. How far can we go? What's painful? What's beautiful?

      My belief is that fp is a useful and even powerful way to program in Perl. But Perl isn't Haskell or O'Caml, and some things are better off left to the old ways. I think the most practical approach, then, is a functionally inspired style for Perl.

      Thanks again for your thoughtful response to my meditation.

      Cheers,
      Tom

        Okay, the title was meant to be inflammatory. Don't read too much into it.

        Perl leaves much to be desired as a functional programming language. Not having currying, pattern matching, proper tail recursion, and real garbage collection are limiting, painful even.

        What frustrates me most about Perl (from a FP perspective, at least) is the distinction between builtins and user functions. If anyone can tell me how to get around that without writing sub {$_[0] + $_[1]} everywhere, I'd be most grateful.

        Update: I realize that this complaint is a bit vague. What I really want to do is take coderefs to builtins and operators and pass them to higher-order functions. To the best of my knowledge, this is not an easy thing to do.

        --
        Yours in pedantry,
        F o x t r o t U n i f o r m

        "Anything you put in comments is not tested and easily goes out of date." -- tye

        The following are some (lightly edited :) initial reactions to your follow post.

        Note: Initial.

        There are some positive reactions lower down, but I thought that I would get these out of the way first in the hope that the later reactions would erase any bitter taste these might leave.

        1. I never said "this crazy FP stuff".
        2. I get the FP -v- FP-inspired difference.
        3. Phrases like "spectrum of problems", "shape of the problems", "shape-flexing capability", "code at a higher level" & "think at a higher level" all leave me...um...suspicious.

          What is he trying to conceal with flowery language?

          Put that down to having suffered the hyperbole of too many "new programming paradyms". (I realise FP isn't new.)

        4. "The reason you perceive the fp-inspired code as hard to follow is because your brain hasn't "internalized" the meaning of products and zips and the rest of our new vocabulary. To you, these words probably seem like gratuitous functional jargon."

          My question is: "Do I need this new vocabulary?".

        5. "But the words have meaning. When I saw the example problem as first posted by the Seeker of Perl Wisdom, I immediately thought, "Ah, he wants the count of each value from the zipped products." And the code immediately followed."

          Fair enough, but then you had to construct a fairly complicated library of intrastructure in order to be able to code the solution in terms of your vocabulary.

          When I saw the example problem, I thought: "He want's to iterate over the two arrays of strings and count the unique pairs of aligned characters".

          1. "...iterate over the two arrays..."

            Nested for loops.

          2. "...count the unique pairs...".

            Increment a hash.

          3. "...pairs of aligned characters."

            Two substrs and another for loop.

          The code I posted fell straight out of that. No need for any auxillary vocabulary. No need to code a bunch of extra infrastructure. Perl already has all the vocabulary and in-built infrastructure required.

        6. "That's the reward. To have the right words to describe the problem. To have more ways of stringing together ideas. To reduce the gulfs between problem and understanding, understanding and code." & "If this sounds like worthless babble, that's OK."

          {nods} Yes. It does a bit :)

        7. I didn't appreciate functional programming until I had used it to solve real-world sized problems for over a year. Only then could I "feel" fp in my bones and appreciate its potential.

          I've been trying to get into FP since I first starting reading articles on Lisp in Byte magazine circa. 1979/80. I currently have Common Lisp, Scheme, and most recently Haskell.

        8. Have you read Why Functional Programming Matters? If not, read it. Now."

          I pulled the PDF and tried to read it. After a little while it began to seem familiar. Maybe it appeared on Byte, or CoACM or something else I've read a long time ago? Maybe it was referenced from such an article and I tracked it down and tried to read it? Maybe it's just the code examples that seem familiar from some other FP tutorial I've read down the years? Whatever, the PDF sits open on my desktop and I will endevour to try and read it to the end, but I doubt that I made it that far last time either.

          The problem is that in common with many other FP (and, notoriously OO) tutorials, it selects very specifically limited and condusive (to their aims) examples. It then spends an inordinate amount of time concentrating on how to construct the solution--in it's own, very esoteric, terms. Your real world examples are infinitely better (IMO).

          But the single line that sums up my feelings of what I've seen of FP tutorials and advocacy comes from the Abstract on the page you linked to above.

          Since modularity is the key to successful programming, functional languages are vitally important to the real world.

          The problem with that statement is that it implies that the only way to achieve modularity, is through FP--which is clearly not the case.

        9. I love Perl Monks, but don't look at the fp examples on Perl Monks as your window into the fp world. Take a look at the stuff going on in the Haskell Communities Report. Read some of the papers from the ICFP proceedings. That's where you'll find the good stuff.

          As I mentioned, my FP exploration goes back a long way, and my research is definitly not confined to what I see on PM. My exploration of Haskell is very new (and did come about because of articles mentioning it here at PM), so thankyou for those links--they will give me many hours of good reading.

        I hope that doesn't all seem negative? It isn't meant to be. I'm continuing to enjoy reading your article and pursue various angles that lead from it. Thankyou for writing it and I look forward to Part 2.

        For me, the most significant thing that came out of your response to my initial post is:

        Instead of passing in \(@site1,@site2) you can pass in any number of input arrays. And the same code works, as is.

        Now that is a benefit that I can understand and internalise :).

        I did try to download your code and try this out, but there seem to be some bits missing to allow the examples to work? Perhaps in part 2. This is what I extracted from your OP: {**Code moved to bottom of post**} but I couldn't work out how to fix it up so that I could try out the example with 3 or more lists?

        Anyway, the concept of writing a generic solution to the nested loops problem is one that I've been trying to get to grips with for some time. The upshot of your article (for me) is that I've finally gotten around to putting some concerted effort into it, and I came up with a solution. It doesn't look much like FP in it's construction, but it does rely heavily of constructing anonymous subroutine iterators, and constructing lists of these. And combining these iterator functions to contruct a higher order iterator. So, having (probably) butchered the FP vocabulary to death :), this is what my solution looks like:

        #! perl -slw use strict; sub loops { my @iters = map{ my @list = ( @$_, undef ); sub { $list[ @list ] = shift @list || () }; } @_; my @rv = map{ $iters[ $_ ]() } 0 .. $#iters; return sub { my $rv = [ @rv ]; for my $i ( reverse 0 .. $#iters ) { $rv[ $i ] = $iters[ $i ]() and return $rv; $rv[ $i ] = $iters[ $i ](); } return; }; } my $iter = loops [ 'a' .. 'd' ], [ 1 .. 4 ], [ 'me', 'you' ]; print "@$_" while $_ = $iter->(); __END__ [13:51:21.14] P:\test>loops a 1 me a 1 you a 2 me a 2 you a 3 me a 3 you b 1 me b 1 you b 2 me b 2 you b 3 me b 3 you c 1 me c 1 you c 2 me c 2 you c 3 me

        I'm pretty pleased with that. Your article has been invaluable to me because it has prompted me to look at this problem in a completely different way to that in which I have been looking at it in the past. And it has triggered a lot of thoughts about using similar techniques to generalise several other pieces of code I have kicking around.

        So thankyou again for posting your very thought provoking article. I really look forward to part 2.

        I think the conclusion I'm drawing is that it's possible to learn and adopt techniques from other languages and styles of coding, without having to adopt the vocabularies and working practices wholesale. Once you understand (if I have?), the underlying techniques, it becomes possible to utilise them whilst retaining the flavour of the language in which your writing.


        The code I extracted from your OP. Moved to the bottom to avoid interupting the main post. (Wouldn;t it be nice if readmore tags only expanded when asked? Even if viewing the containing post directly).


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      I think the gain comes from two things:
      1. You can take an intermediate form of seq_foreach_from_spec() and pass it around, similar to lexical filehandles. (Remember how cool that was when 5.6 added it?)
      2. You can extend it a lot more easily than for-loops. In the trivial case, which is what you're describing, it's very easy to see how the for-for version is easier to handle than the FP version. What if you're working 5-10 nested loops? What if you're working a loop with a bunch of double-nested pairs, all of which are similar?

      Let's say that you can create an intermediate representation that allows you to pass in N lists and do XYZ to it. What XYZ does is irrelevant, except to say that you need it done to more than one group of lists. Think about it as templates for algorithms and I think you'll see the gain.

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      One thing I forgot to point out in my previous response is that the example you cite, in particular, is crippled by the small size of our vocabulary (to date). Had we the space to discuss concatmap, the following implementation would have been available:
      seq_from_spec( \(@site1, @site2) ) ->seq_concatmap( sub { seq_zip(map seq(split//), @_) } ) ->seq_foreach( sub { $counts{"@_"}++ } );
      This implementation is much clearer (assuming you know concatmap).

      Concatmap combines concat and map; it maps a sequence to a sequence of sequences, whose values are then concatenated to yield the output sequence. For example:

      sub upto { 1 .. $_[0] } sub supto { seq(upto(@_)); } seq(1..3) -> seq_map(\&upto) -> enumerate; # 0 => 1 # 1 => 1 2 # 2 => 1 2 3 seq(1..3) -> seq_map(\&supto) -> seq_concat -> enumerate; # 0 => 1 # 1 => 1 # 2 => 2 # 3 => 1 # 4 => 2 # 5 => 3 seq(1..3) -> seq_concatmap(\&supto) -> enumerate; # 0 => 1 # 1 => 1 # 2 => 2 # 3 => 1 # 4 => 2 # 5 => 3

      Cheers,
      Tom

Re: A mini-language for sequences (part 1)
by sleepingsquirrel (Chaplain) on Nov 05, 2004 at 19:18 UTC
    On a somewhat related subject, I found Towards the best collection traversal interface an interesting read...
    Most programming languages support collections, represented by an in-memory data structure, a file, a database, or a generating function. A programming language system gives us typically one of the two interfaces to systematically access elements of a collection. One traversal API is based on enumerators -- e.g., for-each, map, filter higher-order procedures -- of which the most general is fold. The second approach relies on streams, a.k.a. cursors, lazy lists.


    -- All code is 100% tested and functional unless otherwise noted.
•Re: A mini-language for sequences (part 1)
by merlyn (Sage) on Nov 05, 2004 at 18:26 UTC
Re: A mini-language for sequences (part 1)
by itub (Priest) on Nov 05, 2004 at 20:48 UTC
    I feel an almost irresistible urge to overload the x operator to the seq_prod2 function:

    use overload x => 'seq_prod2'; $s1 = seq(1,2,3); $s2 = seq(qw(a b c)); $p = $s1 x $s2; # same as seq_prod2($s1, $s2);

      Can anyone say "Cartesian Product"?

Re: A mini-language for sequences (part 1)
by dragonchild (Archbishop) on Nov 05, 2004 at 18:41 UTC
    If something like this isn't on CPAN, you should seriously consider putting it on there, along with this meditation as the POD. Excellent work! ++

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      If something like this isn't on CPAN, you should seriously consider putting it on there

      I second that, although I would suggest better naming conventions. I am not a fan of abbreviations, and especially ones that are so similar and potentially confusing to the eye (i.e. - seq, spec).

      -stvn

      CPAN already has Set::CrossProduct, Object::Iterate, and Tie::Cycle. I think those three modules cover just about everything discussed in the node, and anything they don't I can add.

      --
      brian d foy <bdfoy@cpan.org>
        brian, thanks for the pointers!
        CPAN already has Set::CrossProduct, Object::Iterate, and Tie::Cycle. I think those three modules cover just about everything discussed in the node, and anything they don't I can add.
        While the overlap of my meditation's sequences and your modules is large, I would like to point out some philosophical differences between our approaches. (These aren't pros or cons – just differences.)
        1. The sets that sequences represent are implicit. This lets them be defined on the fly so that, for example, you can efficiently take the Cartesian products of arbitrary sequences (that themselves may the products of sequences). This also lets you work (space efficiently) with sequences whose lengths cannot be known until they are exhausted. However, the lengths of sequences cannot be known until they are exhausted.
        2. You can observe the cyclical wrap-around point of a sequence. This lets the wrap-around be used as a trigger for other events, such as terminating a parallel merge operation or advancing the next most-significant sequence in a series of sequences.
        3. I think that it's easier to build up a chain of operations with sequences than with the collection of CPAN modules. For example, it seems like you must forego the space-saving benefits of iterators to take the product of an earlier product, which discourages the use of products in computational chains.
        4. Nevertheless, I think that the CPAN modules are probably more practical for the functionality where our libraries overlap. My sequences are first and foremost meditative material. In their present form, they aren't suited for production use.
        If you're looking to add features, it would be nice if Set::CrossProduct could take the product of arbitrary iterators.   :)

        Cheers,
        Tom

Re: A mini-language for sequences (part 1)
by trammell (Priest) on Mar 10, 2005 at 18:52 UTC
    My to-do list has included going through the examples in this meditation for some time; the attached code is a uuencoded .tgz file containing a more streamlined version of Sequences.pm, plus files containing the various bits of example code.

    Is there an easier way to post multi-file code than this? I'm not in the mood to cut-n-paste 20-odd files...

    begin-base64 644 amlfsp1.tgz H4sIAPKVMEIAA+1cbW/bRhL2Z/2KiaIk5IWWRIqSetbZVYHmiqJX3OEuBxSQ DYKS1jZrilT4Usdn6L939oXkkpQUX22LabIPYIvizs7uzu4z+y535V/Ga7N3 9IzoI8bDIf00x0NT/sxwZPYt0zZNazyyjvrmYGSOj2D4nJnKkMaJGwEcrcLA S8Jop9ynwrOCZJ9/Erii/v9DPqQkWJC4u149dRrUHiPb3lX/o7E1yuu/b6Gc OcKnIziIEb/y+n/5opfGUW/uBb01ifxWa+0ubtwrAnmDmLRaaUwgTiJvkUzY 860bBV5wFfNvcxf/vXn3cR1GCYneoHyYRjB998u//vnv93AKH257EJMPQIJ0 RSI3IfSbs47CpZU/0YeYfbuMwpUTr8miBQj2JoyIu7iWnwsp/tbzMWX2uHLX 7DMmkUe4xv9561wXPju3XsKV/eb6qZDhj068cH03Ein5LFfOIgwWbgI9LFec ziEgt3DP9K3uQOtg5pPQgA5K6ljWqTNhYXOfxDF7i2FMZtLatFovYYH5Rwtc psEi8cIgBs0NwuBuFaaYkXQehWniBSTWIbnGRCOyjkhMgoTmhFcHzwU1VzrX XusiL3ltHZ9hBrWpo7P0hKicYQ8zRHyyiml2tb4B50xYGIhG4MIUHQ/+BlMu nr+k+BY0oeX4bIZib9/CBeglkRNYhnBPVZxCfwKaDhueyoZmjLab793EPTn5 Pl1hwxPGLZoIz4QfYoVAR5Y8OXlPIox9CuZkt8yPwZJajSZdLrqoJ1rwjjPr X4ii3157PgENxabYFFCACh6faboumUME0SZ2j1VzqXUcHU3B02RfTlApbJjg JI+2jrwguYT2K2sJp2fwKj4P2gYzmgFtKtrODMNsjgnLdSeIItVgjJETubHR XMd5krVaZGGsSLRAkAasbUoxMh2JkEuoXBHE39MqZ3EM8WJrdZfTmpREirxI unhiQE3Hm0i5jfzDi5OTk/8mng9vIrJMF+TNpGwaUVAeiDnITaZ1XKMz12l1 OLJB48yW3FNoU6w4oyIj+Zf73HdQrRrTQPkFejmCcFJyPTHyXwZyVT2onWEc fEUF9L0to+4M5dTxey15KZZWLieagX7oPEo5Ge5d6yVj751aCuXWJ4paNAVT GKFiAHCDJbwolJYtwKs4SaNA0raRc8lYud/4u/Im5WJSJvu3cl2wJrolZdHV 5Il3WCM7ZY0L0wZ9Ioc81EyUTiX6aJnvYvplKhdqS5mXbOtdVoQ2ohJeMMnX r7l2WXnFFFvKTXumpFxsWupr7zLh0piqyPR9rfbLBt/RwLG3fqhZsZtOmcPf W9nYu2JZ4qJoBR9r5a+YnxZmP20zrNP4Ok/KqFRrUdoMxI/JFi25hTWpwnVA 9qINu12p0HUvW40LWzyxYFPVSRd5K54y4pWst6lUFB9WFbWFr9aMyOU2Iaiq ZdGoKzWA1ZQUh1LOoa677InKo7Q9LU9UVEriLX6P+waeJK8rLko7AZ6kXOhM TS0b+9KvZVYrCi4nP5s6FzTJWmfiLz/lzGgR3cUiXdW8l2jTIpRye0u7LY1r cj3c8uwrb7glft66Ac6VIvcO/aKIgwMe9oAjqVIJxKD5HuAl/P3HX35+d4ID WhKRNzEOh8K5O/fvwIXY9xY32LXcosrwEl0eTipQzou73e7+yl1gTZJalyo8 xDZyZv0z04jGoaJV8pY75zgfxuwmOcsGt1S8n0ebrF1oPJK+xVwP6MJ40bK2 RAP5N66A6cRhcevh879s/k8+9s3u2n+WOeYn5v/9gcXXfwajAZXE+b89pus/ av7//KjM/+HYb22Z4suTf3lhgLLJnS8IyTrmtts22nP8W7SxhbOZTybBHdQz v/o/Wr4ChcR/qyn+I/Ml/luc/0PF/0PgWfnfytdztFwOX+eMLYIVdRuCxP9B Y/wfSfw3bcZ/a6T4fwg8K/9pcBgQJ7kNneQ6KsTYDHZQ8RDyyh3TaVRjs3mS 8hNPCYn/dmPjf3zO+W8NOP9V/38QNMx/KnEXpo4bLJ2VpAbftQ1orwgbRuSr 77scg1HWwrcmpPGF8hq7IPF/2Bj/+3bB/0Gf81/t/x8Ej+M/2xB3oztn6V15 SSyvXtFTAUhLHlDbg8q3nUCbQZ+tZ9MIxyZc6PAxiyaWRaUxgpSYZhkDNSB4 LCT+j5qb/1sS/4dq/H9A1M7/PN3kP5/ml3ed+b4LW7prT53zoM33XBSLG4HE /3Fj/b89Lvhvm4z/AzX+PwgewX/K5HC5jJ107SShtB26cj96K7ahWN4S1cTO dRZenCASPoKd/chcBDueBa/Aov6hVRkHFOlqZp+NAnYEmmqIsBcS/79piv/D vtT/23z/b2Ar/h8Cfyb+sxOiv5GgnKLEdhHnLaZT1c3PnEiKj1FoU59gSPq3 eJZS6BfhWiT+/7Up/o8GI8F/ev2Dr/8PVf9/EDyS/ysvcHwSXCXXziJcoRaX n2qXFgIKEQOmXrBOy8sB9LQNe8tPsPLj1TOYXRigsceOg4/0kC5c5Aq2riTI avbQf8ZSQIXsvBc7frXXD03vmcPYwNkpSIVhzgOo99hhA41ptQ1RoHjtewn0 erQU9EYETo5gSeDy6ho8+PUGfFhBr6UfetGy4L/Zb27/r1j/t0di/U+d/zkI tvJ/912fp938K05P5+cDakv7cHBKfFWQ+N/c+T97mPHftAdjxn+7r/h/CDTJ /x2bf6x7xPewItCDzE/Qs+I7nURl/0/5jAdD4n9j5//s4Sjn/9gU63+q/z8I /jj/P3E9D+f3FPfFzbyOizPzzpxdyIMNk6Kb+ukiqUj9RZLKuM+ul2hw/jqm VxOYG+l2Bzp/7He7bK6+Zdu/Gl2kmKsYllTwZ6vb/Wa7tqZr6+kh8b+x83/D YvxvDYei/1f7fwfBrv3/B7uAyi1udg3HS4gpLv5/9937n376GVw3ublZ0Uv0 mYAlBH5ggCsGdsseJV6xi22x4G/tniufWs/gnCdliAcL59Z8YaB08U/efizd 0cmHFdlV4GyOzu4Dg26UpfniIc/ZfXvqtDdv30r3fHRxz4cOWPj2prgYDuei OJ/jcqHE/+bO/9n5+Z/R2LYU/w+Ip+b/7t9q2PMLDRW+zIr+nS/IiQuXF5Bf +vmEaHZF9OJzpNxnBYn/jZ3/s8dZ/2/bpmWp8f8B8cT8/2P0ZzsJLqb40Uki N4jXYX4lnq4RROFtXN5MnEk3rIur5OK3OZg8vSuvA7+UzPckqXrUMoNZ3zAv DJhZxoB+2MYQPy5qXqiaIS1T8iVs++WQ+N/Y+T97MMz5Pxrx8/9Ddf73IHg0 /w+xBIBiyzBxctGWNHrfOrMXv9CTHT30l2LlQOz3c6JLKnEgYRomdwumQR3D l8TxfZD439j5P7uf3f8fDgdi/0/1/4fBo/jPmJmu6VEccbSHHa3ZCPrzADpA p4+UkrpY0ZNH7HRr/vw1ldBrS3d1uVgIFj96UY70shIr/2UNKe6XvaKnoKCg oKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKDwNeF3lkgKxQB4AAA= ====
      First, thanks for taking the time to do this! The way you broke out the examples is great. It's a much better format for meditation than my original linear flow.

      Second, here's a patch that adds seq_concat, which is almost identical to seq_series. (In fact, the patch redefines the latter in terms of the former.)

      diff -urN amlfsp1/ex17.pl amlfsp1.new/ex17.pl --- amlfsp1/ex17.pl 2005-03-09 15:22:08.000000000 -0500 +++ amlfsp1.new/ex17.pl 2005-03-10 20:53:15.660873854 -0500 @@ -11,5 +11,5 @@ seq(1..3)->seq_map(\&supto)->seq_concat->enumerate; -#seq(1..3)->seq_concatmap(\&supto)->enumerate; +seq(1..3)->seq_concatmap(\&supto)->enumerate; diff -urN amlfsp1/Sequences.pm amlfsp1.new/Sequences.pm --- amlfsp1/Sequences.pm 2005-03-10 13:38:12.000000000 -0500 +++ amlfsp1.new/Sequences.pm 2005-03-10 20:56:21.166003398 -0500 @@ -101,7 +101,11 @@ } sub seq_series { - my $seqs = seq( @_ ); + seq( @_ )->seq_concat; +} + +sub seq_concat { + my $seqs = shift; my $seq; seqsub { my @val; @@ -166,19 +170,6 @@ wantarray ? @accum : $accum[0]; } -sub seq_concat { # FIXME: there's probably a slicker way of doing th +is... - my $seq = shift; - my @cache; - while (my @seqs = $seq->()) { - foreach my $s (@seqs) { - while (my @vals = $s->()) { - push @cache, @vals; - } - } - } - seq(@cache); -} - sub seq_concatmap { my ($seq, $fn) = @_; $seq->seq_map($fn)->seq_concat;

      To apply the patch (on a Unix-esque system), save it into a file, "cd" into the directory containing Sequences.pm, and then run this command: "patch -p1 < patchfile".

      Cheers,
      Tom

        Excellent! I struggled a bit with seq_concat and seq_concatmap; I'm glad to have more idiomatic versions.