Recamán's sequence and memory usage

Esteemed Monks,

I was looking at The On-Line Encyclopedia of Integer Sequences (OEIS) (you know, as one does), and on the Welcome page I found a list of Some Famous Sequences, of which the first is Recamán’s sequence, defined as follows:

R(0)  = 0;
for n > 0,
    R(n) = R(n-1) - n if positive and not already in the sequence, oth
+erwise
    R(n) = R(n-1) + n.
[download]

What makes this sequence interesting is N. J. A. Sloane’s conjecture that every number eventually appears.

Coding the sequence is simplicity itself; the challenge is to test Sloane’s conjecture by keeping track of the numbers that have not yet appeared in the series. My initial, naïve approach was to use a sieve, à la Eratosthenes:

use strict;
use warnings;
use constant MAX => 1e6;

$| = 1;

my %seq = (0 => 0);
my $r0  =  0;

for my $n (1 .. MAX)
{
    my $r = $r0 - $n;
       $r = $r0 + $n if $r < 0 || exists $seq{$r};

    $seq{$r} = $n;
    $r0      = $r;
}

for (0 .. MAX)
{
    unless (exists $seq{$_})
    {
        printf "For MAX = %.1e, the first missing number is %d\n", MAX
+, $_;
        last;
    }
}
[download]

But this turned out to be far too memory-hungry: for values of MAX of the order of twenty million, RAM usage on my 3GB system approaches 100%, thrashing sets in, and the script (along with the rest of Windows) grinds to a shuddering halt.

Surely, I thought, there must be a memory-efficient way to represent a sieve? And of course there is, and of course it was already implemented on CPAN. A little searching led to the Set::IntSpan module which stores runs of consecutive integers as spans, allowing large (even infinite) collections of integers to be represented very economically.

Calculation of successive terms in the Recamán sequence is noticeably slower using Set::IntSpan for lookup than it is using a hash. But, as the adage says, it’s better to be late than be dead on time. (This was the slogan of an Australian safe driving ad campaign some years ago.) For the record: I also looked at Set::IntSpan::Fast and Set::IntSpan::Fast::XS. The latter failed to install on my system, and the former actually ran slower than Set::IntSpan for this use-case.

Turns out that Set::IntSpan not only solves the memory problem, it also makes it possible to dispense with an upper bound for the sieve. How, then, to display progressive results? Well, the OEIS has a couple of additional series related to Recamán’s:

A064228: values of R(n) that take a record number of steps to appear: 1, 2, 4, 19, ...
A064227: the values of n corresponding to the values in A064228: 1, 4, 131, 99734, ...

So I recast the script to output successive values of these two series:

14:20 >perl recaman.pl
1 <-- 1
2 <-- 4
4 <-- 131
19 <-- 99734
...
[download]

Here is the new script:

use strict;
use warnings;
use sigtrap handler => \&int_handler,   'INT',
            handler => \&break_handler, 'BREAK';
use Set::IntSpan;
use Time::HiRes qw(gettimeofday tv_interval);

   $|       = 1;
my $t0      = [gettimeofday];
my $min0    = 1;
my $n       = 0;
my $r0      = 0;
my $missing = Set::IntSpan->new( '1-)' );

print "$min0 <-- ";

while (++$n)
{
    my $r = $r0 - $n;
       $r = $r0 + $n if $r < 0 || !$missing->member($r);

    $missing->remove($r);

    if ((my $min1 = $missing->min) > $min0)
    {
        print "$n\n$min1 <-- ";
        $min0 = $min1;
    }

    $r0 = $r;
}

sub int_handler
{
    printf "\nn = %d, elapsed time: %.1fs\n", $n, tv_interval($t0);
}

sub break_handler
{
    int_handler();
    exit 0;
}
[download]

This script was developed under Windows 8.1, 64-bit, using Strawberry Perl:

14:20 >perl -v

This is perl 5, version 22, subversion 0 (v5.22.0) built for MSWin32-x
+64-multi-thread
[download]

The two signal handlers allow the script to be interrupted as follows:

Control-C causes the script to display the current value of $n and the total running time of the script so far.
Control-Break causes the script to display the same information and then exit.

My takeaways from this meditation?

First, we all know that micro-optimisation is pointless until you have first selected the best algorithm(s). But optimising an algorithm may actually consist in optimising its underlying data structures. Obvious? Yes, but still worth a reminder now and then.

Second, is awesome! But you knew that already. :-)

Cheers,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Back to Meditations