Re: Challenge: Simple algorithm for continuing series of integers

BTW, this test case is wrong:

    [[0, 1, 1, 2, 3],       8],
[download]

Here is my approach, which will at least identify the homogenous linear recurrences (basically everything here except the factorial example). The main work is done by candidate_recurrence. A degree-D linear recurrence is something like this:

a(n) = c(1)a(n-1) + ... + c(D)a(n-D)

where the c(i) values are constant coefficients. Any 2D-1 values uniquely define a degree-D linear recurrence, and that recurrence is what candidate_recurrence computes. It also takes another parameter which says whether to consider a recurrence with a constant additive term as well, like

a(n) = c(1)a(n-1) + ... + c(D)a(n-D) + c(0)

In either case, it's just some basic linear algebra, for which the code currently uses PDL to solve.

So given this candidate_recurrence, the sequence-identification algorithm is pretty simple. Just start with D very low and keep increasing it until you reach one that works with the entire sequence (low D = low complexity, so this is a kind of Occam's razor simplest (linear) explanation for the sequence).

This works for all of your examples except for the factorials (not a homogenous linear recurrence). For test cases (1) and (1,2), it gives reasonable answers right now (all-ones sequence, and powers of two respectively), where your test wants it to fail. And for test case (1,2,3,1), there are precision issues with PDL's linear algebra, which is unfortunate. Still, all we need is to take the inverse of an integer matrix. There are no inherent issues with such a computation in general, I just didn't want to do it by hand, and didn't realize that PDL had precision issues!

#!/usr/bin/perl

use PDL;
use List::Util;

sub candidate_recurrence {
    my ($d, $lin, @values) = @_;
    my $N = $d+$lin;
    my $samp = 2*$N -$lin; ## number of samples we'll need
    
    ## some base cases
    return if $N == 0;
    return Recurrence->new( 0, 1, $values[0] ) if $d == 0 and $lin == 
+1;
    return if @values < $samp; # need enough points
    
    if ($d == 1 and $lin == 0) {
      return $values[0] ? Recurrence->new( 1, 0, $values[1] / $values[
+0] ) : undef;
    }

    my @samples = @values[ 0 .. $samp-1 ];

    my @matrix;
    for (0 .. $N-1) {
        push @matrix, [ @samples[ $_ .. $_+$d-1 ] ];
    }
    if ($lin) {
      push @$_, 1 for @matrix;
    }

    my $M = pdl @matrix;
    my $v = pdl( [ @samples[ $d .. $samp-1 ] ] )->transpose;
    
    return if $M->det == 0;
    return Recurrence->new( $d, $lin, list( $M->inv() x $v ) );
}

sub check_recurrence {
    my $R = shift;
    for my $n ($R->d .. $#_-1) {
        return if $R->next( @_[ 0 .. $n ] ) != $_[$n+1];
    }
    return 1;
}

sub identify {
    for my $d (0 .. 2) {
        for my $lin (0 .. 1) {
            my $R = candidate_recurrence($d, $lin, @_);
            return $R if $R and check_recurrence($R, @_);
        }
    }
}


sub series {
    my $R = identify(@_) or return;
    return $R->next(@_);
}

############
## cute recurrence class to make things simpler

{
 package Recurrence;
 sub new {
    my ($pkg, $d, $lin, @coeff) = @_;
    my $const = $lin ? pop @coeff : 0;
    return bless { d=>$d, lin=>$lin, coeff=>\@coeff, const=>$const }, 
+$pkg;
 }
  
 sub next {
    my $self = shift;
    return if @_ < $self->{d};
    my @window =  @_[ -$self->{d} .. -1 ];
    
    $self->{const}
      + List::Util::sum map { $self->{coeff}[$_] * $window[$_] } 0 .. 
+$#window;    
 }
  
 sub print {
    my $self = shift;
    my $d = $self->d;
    my @terms = grep { !/^0\*/ }
                map { $self->{coeff}[$d-$_] . "*a(n-$_)" } 1 .. $self-
+>{d};
    push @terms, $self->{const} if $self->{const} || @terms == 0;
    my $str = "a(n) = " . join " + ", @terms;
    $str =~ s/\b 1\*//gx;
    return $str; }
  
 sub d { return $_[0]->{d}; }
}
[download]

As for extending this to work with more general sequences, it seems like it would be quite difficult to get something that works with too many more classes of sequences. Fortunately linear recurrences encompass many common use cases. When your "magic star" recognizes 1, 1, 2, 5, 14, 42, 132 as the first few Catalan numbers, and 2, 6, 20, 70, 252 as the first few central binomial coefficients, you'll be onto something special ;)

Update: Replacing your main loop with the following

for my $t (@tests) {
    my @list = @{$t->[0]};
    print "@list : ";
    my $R   = identify(@list);
    if ($R) {
        print "identified as ", $R->print, " : next = ", $R->next(@lis
+t), $/;
    } else {
        print "??\n";
    }
}
[download]

.. gives some more verbose output:

1 : identified as a(n) = 1 : next = 1
1 1 : identified as a(n) = 1 : next = 1
0 0 : identified as a(n) = 0 : next = 0
1 2 : identified as a(n) = 2*a(n-1) : next = 4
0 1 2 : identified as a(n) = a(n-1) + 1 : next = 3
1 0 -1 : identified as a(n) = a(n-1) + -1 : next = -2
1 2 3 : identified as a(n) = a(n-1) + 1 : next = 4
1 2 4 : identified as a(n) = 2*a(n-1) : next = 8
2 4 8 : identified as a(n) = 2*a(n-1) : next = 16
1 3 9 : identified as a(n) = 3*a(n-1) : next = 27
1 -1 1 -1 : identified as a(n) = -a(n-1) : next = 1
-1 1 -1 1 : identified as a(n) = -a(n-1) : next = -1
1 0 1 0 : identified as a(n) = -a(n-1) + 1 : next = 1
0 1 0 1 : identified as a(n) = -a(n-1) + 1 : next = 0
1 1 2 3 5 : identified as a(n) = a(n-1) + a(n-2) : next = 8
0 1 1 2 3 : identified as a(n) = a(n-1) + a(n-2) : next = 5
1 2 3 5 8 : identified as a(n) = a(n-1) + a(n-2) : next = 13
1 2 6 24 120 : ??
1 0 0 1 : ??
1 2 3 1 : identified as a(n) = 5*a(n-1) + -7*a(n-2) : next = -16
1 3 5 : identified as a(n) = a(n-1) + 2 : next = 7
2 4 6 : identified as a(n) = a(n-1) + 2 : next = 8
[download]

blokhead

Comment on Re: Challenge: Simple algorithm for continuing series of integers Select or Download Code

Replies are listed 'Best First'.
Re^2: Challenge: Simple algorithm for continuing series of integers (confidence) by tye (Sage) on Oct 20, 2008 at 04:54 UTC
Your reply is pretty similar to the reply I had in mind and started to write and then quickly abandoned. Thanks. :) Some minor differences: I planned to strictly limit the complexity of generator that would be considered, nothing beyond `$s[$n]= $a + $b$s[$n-1] + $c$s[$n-2];`. I find that the next order of generator is too opaque for the reader of the code and so isn't something I'd have automated (the expense also climbs). So I had no intention of using PDL. But I also wanted to request that guessing be made much safer than proposed. I don't find "1,1" compellingly unique nor "1,3,5". I'd want at least "1,1,1" and "1,3,5,7" to be required. Actually, I'd take the last two items off the end of the presented start of the sequence and use that to compute the simplest matching sequence generator from some simple class(es) of generators. Then I'd only use it if the next two numbers matched the guessed-at sequence. If the numbers don't match, then I'd reject the code, not try to guess again. For something this magical, it is good to have a couple of "check digits" so that it stays DWIM not DoSomethingSurprising. Making a single typo in one number should very much not just happily pick a sequence that I didn't intend to describe. And people are very prone to expect other people to see things as they themselves currently see things and so will type "1,2,4,7" and fully expect that the next item is "obvious" (and then will come back a few weeks later and have no idea what they had in mind when they wrote that). For this type of magic, it is incumbent upon Perl to require the author to be more explicit than they naturally will think that they need to be. So, if you have three numbers in your example, you have to predict the sequence from the first number alone. The simplest prediction is a constant list, so that is the only type of list that can be generalized from just three items. So fewer than 3 items or a sequence of 3 items not all the same would always refuse to be generalized. If you have 4 items, then you must predict based on only the first two. The simplest prediction would be a linear progression. So "1,2,3,4 ... " would work as would "1,3,5,7 ... ". If you have 5 items, then you can use the first three to predict. So we'd solve the following two simultaneous equations: `$s[1]= $a + $b$s[0] $s[2]= $a + $b$s[1]` [download] Or just: `$b= ($s[2]-$s[1]) / ($s[1]-$s[0]); $a= $s[1] - $b$s[0];` [download] (If the first two items are the same, then set $b to 0 which gives us a constant sequence again and generalizing fails unless the third item is also* the same.) So for `1,2,4,$x,$y ... ` Perl guesses $b= 2 so $a= 0 and fails unless $x is 8 and $y is 16. When we get to `1,4,9,16,$x,$y ... `, the human thinks "$n*2" and Perl 6 computes $a= 2, $b= 2, $c= -1, that is `$s[$n]= 2 + 2$s[$n-1] - $s[$n-2];` since (9) == 2 + 2(4) - (1) and (16) == 2 + 2(9) - (4). Did Perl DWIM? I leave that as an exercise for the reader. Yes, $n*2 == 2 + 2($n-1)2 - ($n-2)2, actually. - tye	[reply] [d/l] [select]
Re^3: Challenge: Simple algorithm for continuing series of integers (confidence) by blokhead (Monsignor) on Oct 20, 2008 at 14:10 UTC
I agree, the magic-star should generally only identify a sequence that is nicely overspecified. But in the interest of Huffman coding the language I think that if anything warrants an exception, it should be arithmetic sequences (including constant sequences), since these would be the most common use cases (including 1,2,3,4...). Many times I've wanted to iterate over the multiples of N. I think "1,1,.." or "2,4,6,.." is just fine, and overspecifying the sequences by one more seems overkill because the rule is so simple/common. blokhead	[reply]
Re^4: Challenge: Simple algorithm for continuing series of integers (confidence) by tye (Sage) on Oct 20, 2008 at 15:53 UTC
If I meant to write 1,2,4 but typo'd the "4" and wrote 1,2,3, then I would like to know about the typo. Having 1,2,4...* not work will help with that, but my experience is that people will expect 1,2,4...* to work because it will seem obvious while they are writing it (and I just typed "1,2,3" and had to delete the 3 and type 4). I figured there would be an exception for simple counting lists so that just "2 ... " would be sufficient for "2,3,4,5 ... ". So 1,1...* vs 1.1...* would be another ugly typo that I would like detected. 100,101...* vs 100_101...*? So I respectfully disagree. But I figured it would be an up-hill fight against the common Perl mindset of making things rediculously terse. - tye	[reply]


laziness, impatience, and hubris
	PerlMonks