http://qs321.pair.com?node_id=162711

In All roads lead to Rome, I pointed out that different languages lead to different ways to looking at problems. One thing I didn't mention was that different types of languages can lead to different solutions. Ben Tilly's phenomenal post Why I like functional programming was a classic example of how a different type of programming background can lead to a radically different solution. In that node, he attached actions to different sections of text. That is certainly not the way I looked at the problem. Interestingly, I would no longer consider such a horrible solution because continued exposure to different ways of looking at problems leads me to learn better ways of dealing with them. Thus, this post...

Getting back to Rome, the example that I put forward was using nested loops to find all elements in one array that existed in another array. The following code snippets all assume that the two arrays being compared have unique elements.

#!/usr/bin/perl -w use strict; my @alphas = qw/ 9 4 3 2 22 13 7 140 95 278/; my @betas = qw/ 8 3 4 1 278 94 15 7 19 200/; my @results; foreach my $alpha ( @alphas ) { foreach my $beta ( @betas ) { if ( $alpha == $beta ) { push @results, $beta; } } } @results = sort { $a <=> $b } @results; print "Linear search: @results\n";

For small arrays, nested loops are fine, but they don't scale well. Two 10 element arrays require 100 iterations. Two 100 element arrays leads to 10,000 iterations. So, how do you solve the scalability issue? More directly, I am wondering how different types of languages would solve the problem of identifying elements in one array that exist in another. A natural solution in Perl might be to use a hash.

my @alphas = qw/ 9 4 3 2 22 13 7 140 95 278/; my @betas = qw/ 8 3 4 1 278 94 15 7 19 200/; my @results; my %alpha; @alpha{ @alphas } = undef; foreach my $beta ( @betas ) { if ( exists $alpha{ $beta } ) { push @results, $beta; } } @results = sort { $a <=> $b } @results; print "Hash lookup: @results\n";

A C programmer, however, might sort one list and perform a binary search. Here's how such a thing might appear if translated to Perl.

my @alphas = qw/ 9 4 3 2 22 13 7 140 95 278/; my @betas = qw/ 8 3 4 1 278 94 15 7 19 200/; my @results; my @sorted = sort { $a <=> $b } @alphas; foreach my $beta ( @betas ) { my $index = binary_search( \@sorted, $beta ); if ( defined $index ) { push @results, $beta; } } @results = sort { $a <=> $b } @results; print "Binary search: @results\n"; sub binary_search { my ( $array, $target ) = @_; my ( $low, $high ) = ( 0, scalar @$array ); while ( $low < $high ) { use integer; my $current = ( $low + $high ) / 2; if ( $array->[$current] < $target ) { $low = $current + 1; } else { $high = $current; } } if ( $low < @$array and $array->[$low] == $target ) { return $low; } else { return undef; } }

Any monks familiar with other languages and styles of programming care to offer other examples? I'd be interested in seeing those in their native language and, if possible, how you would translate those solutions to Perl.

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Replies are listed 'Best First'.
Re: TIMTOWTDI and other languages
by broquaint (Abbot) on Apr 28, 2002 at 22:45 UTC
    Here's a ruby example
    alphas = [9, 4, 3, 2, 22, 13, 7, 140, 95, 278] betas = [8, 3, 4, 1, 278, 94, 15, 7, 19, 200] result = (alphas & betas).sort
    There's a quite a few other ways to do it too, but I liked that one the most.

    Update: Noticing a somewhat lack of variety in the languages used here's a python example too

    alphas = [9, 4, 3, 2, 22, 13, 7, 140, 95, 278] betas = [8, 3, 4, 1, 278, 94, 15, 7, 19, 200] result = [] for x in alphas: if x in betas: result.append(x) result.sort()
    This is probably the more obvious path, but if you like your tools functional ...
    result = filter(lambda y: y != 0, [x in betas and x for x in alphas]) result.sort()
    The closest I can get in perl is this (map == list comprehension?)
    @result = sort { $a <=> $b } map { my $x = $_; grep $x == $_, @betas } @alphas;

    HTH

    _________
    broquaint

    update: added python examples
    update2: added perl version of last python example

Re: TIMTOWTDI and other languages
by jynx (Priest) on Apr 28, 2002 at 21:42 UTC
    Warning: untested code ahead!
    Quantum::Superpositions :)
    use Quantum::Superpositions; my @alphas = qw/ 9 4 3 2 22 13 7 140 95 278/; my @betas = qw/ 8 3 4 1 278 94 15 7 19 200/; my @results; for my $element (@alphas) { if ($element == any(@betas)) { push @results, $element; } } print "Results: ", (sort{ $a <=> $b }@results), "\n";
    jynx
      ...or, even easier:
      use Quantum::Superpositions; my @alphas = qw/ 9 4 3 2 22 13 7 140 95 278/; my @betas = qw/ 8 3 4 1 278 94 15 7 19 200/; my @results = eigenstates( any(@alphas) == any(@betas) );
Re: TIMTOWTDI and other languages
by samtregar (Abbot) on Apr 28, 2002 at 21:52 UTC
    If I were doing it in C I think I would probably sort both arrays and then walk them in tandem knowing that since they were sorted I would never have to look back into either. Reaching the end of either array would mean completion since no more matches could exist. I'm too lazy to do up the example but I bet it wouldn't be more than 5 or 6 lines of code.

    -sam

Re: TIMTOWTDI and other languages
by ariels (Curate) on Apr 29, 2002 at 12:09 UTC

    Doing it in C and with moderate performance requirements (e.g. ~100_000 elements in each list), I'd go samtregar's way and compare both lists sorted (this is probably faster than sorting the one and binary-searching the other through it when both lists have lengths of the same order).

    Doing it in C, but in a high performance situation, I'd try to change the data representation. If we can recast the problem such that the lists can be stored as bit strings, nothing can beat an AND. On a modern processor I can compare 64 elements at a time; unroll the loop 4--8 times and you'll be running about as fast as is possible.

    If elements of the lists are integers in a known range, this method is obviously suitable. In other cases, it might be possible to use a hash table (probably called a "symbol table") to translate list elements into indices, and then represent each list as a bitmap.

    Note that this approach might or might not be suitable, depending on the precise character of the code. Unfortunately, high-performance solutions often suffer from this problem.

    samtregar's solution is also useful in another advanced programming environment. In a shell script, to get common elements of files <samp>AAA</samp> and <samp>BBB</samp> efficiently and in sorted order, try

    sort -u AAA > AAA.sorted sort -u BBB > BBB.sorted comm -12 AAA.sorted BBB.sorted > common
    This technique works amazingly well, in fact.

Re: TIMTOWTDI and other languages
by mdillon (Priest) on Apr 28, 2002 at 22:12 UTC
    How about Perl 6 (I think): @intersection = grep $_ =~ @betas, @alphas;
      Looks like a double loop to me. The grep loop and the =~ loop unless perl will replace =~ LIST with hash lookups.
        It is a double loop; what's your point? This isn't "come up with the most efficient way to do X", or "do X without using a nested loop", it is (at least so far as I am interested) "show me idiomatic ways to do X". To quote Ovid, "I am wondering how different types of languages would solve the problem of identifying elements in one array that exist in another". I suppose Perl 6 isn't a different type of language than (recent) previous versions of Perl, but I thought the DWIM aspect of the Perl 6 code made the example worth posting.
      Actually, I was thinking about this some and was wondering whether the following would be valid code in Perl 6: @intersection = @alphas =~ @betas; Does anyone know if this will actually work? I looked through the Apocalypses and Exegeses, but I couldn't find any hints of how Perl 6's "smart match" (=~) operator will behave in list context. Unfortunately, searching for "=~" isn't something that search engines make straightforward, so I didn't know how to go about looking elsewhere.

      With this more neutral syntax, given the right properties on the arrays involved (e.g. predeclared as "int", and with an is presorted or some such), it may even be possible at compile time to translate some uses of this idiom into efficient, optimized searches instead of the naive, linear default.

        @intersection = @alphas =~ @betas;
        Does anyone know if this will actually work?

        I seriously doubt it. Larry has never even hinted that =~ will operate like that in a list context.

        Of course, if superpositions do get in then this:

        @intersection = eigenstates(any @alphas =~ any @betas);

        would work.

        And it's possible that the eigenstates operator is superfluous; that a superposition in a list context just yields its eigenstates automatically. In which case you only need this:

        @intersection = any @alphas =~ any @betas;
Re: TIMTOWTDI and other languages
by hding (Chaplain) on Apr 29, 2002 at 14:12 UTC

    In either Smalltalk or Common Lisp I'd probably just use the built in intersection facilities. (Respectively):

    a intersecton: b. (intersection a b)

    Of course the efficiency would depend on how these things are implemented. For the Smalltalk I use (Dolphin) it seems that it just iterates over the elements of a and b. One could coerce one of the collections to a Set first, hoping that the inclusion test would be better (which seems to be the case in Dolphin, which implements Set via a hash.

    c := b asSet. a select: [:element | b includes: element].

    Similarly for Lisp, if the built-in didn't seem to do the job fast enough, I'd be tempted either to use a hash or sort the lists and walk them (as others have suggested). It may well be that the built-in already uses one of these techniques though, so it's hard to say whether to expect an improvement

Re: TIMTOWTDI and other languages
by HollyKing (Pilgrim) on Apr 29, 2002 at 22:04 UTC

    Since I spend most of my day thinking in C++ I thought I would inject a little of that perspective. So here is a quick version that uses standard C++ only.

    #include <iostream> #include <set> #include <algorithm> #include <iterator> int the_alphas[] = { 9, 4, 3, 2, 22, 13, 7, 140, 95, 278 }; int the_betas[] = { 8, 3, 4, 1, 278, 94, 15, 7, 19, 200 }; const int the_alphas_size = sizeof(the_alphas) / sizeof(the_alphas[0]) +; const int the_betas_size = sizeof(the_betas) / sizeof(the_betas[0]); int main(int argc, char** argv) { std::set<int> alphas(the_alphas, the_alphas + the_alphas_size); std::set<int> betas(the_betas, the_betas + the_betas_size); std::set<int> results; std::set_intersection(alphas.begin(), alphas.end(), betas.begin(), betas.end(), std::inserter(results, results.begin())); std::cout << "set_intersection: "; std::copy(results.begin(), results.end(), std::ostream_iterator<int>(std::cout, " ")); std::cout << std::endl; return 0; }

    Owl looked at him, and wondered whether to push him off the tree; but, feeling that he could always do it afterwards, he tried once more to find out what they were talking about.