 Perl: the Markov chain saw PerlMonks

### Recamįn's sequence and memory usage

by Athanasius (Archbishop)
 on Jul 13, 2015 at 08:41 UTC Need Help??

Esteemed Monks,

I was looking at The On-Line Encyclopedia of Integer Sequences (OEIS) (you know, as one does), and on the Welcome page I found a list of Some Famous Sequences, of which the first is Recamįn’s sequence, defined as follows:

```R(0)  = 0;
for n > 0,
R(n) = R(n-1) - n if positive and not already in the sequence, oth
+erwise
R(n) = R(n-1) + n.
[download]```

What makes this sequence interesting is N. J. A. Sloane’s conjecture that every number eventually appears.

Coding the sequence is simplicity itself; the challenge is to test Sloane’s conjecture by keeping track of the numbers that have not yet appeared in the series. My initial, naïve approach was to use a sieve, à la Eratosthenes:

```use strict;
use warnings;
use constant MAX => 1e6;

\$| = 1;

my %seq = (0 => 0);
my \$r0  =  0;

for my \$n (1 .. MAX)
{
my \$r = \$r0 - \$n;
\$r = \$r0 + \$n if \$r < 0 || exists \$seq{\$r};

\$seq{\$r} = \$n;
\$r0      = \$r;
}

for (0 .. MAX)
{
unless (exists \$seq{\$_})
{
printf "For MAX = %.1e, the first missing number is %d\n", MAX
+, \$_;
last;
}
}
[download]```

But this turned out to be far too memory-hungry: for values of MAX of the order of twenty million, RAM usage on my 3GB system approaches 100%, thrashing sets in, and the script (along with the rest of Windows) grinds to a shuddering halt.

Surely, I thought, there must be a memory-efficient way to represent a sieve? And of course there is, and of course it was already implemented on CPAN. A little searching led to the Set::IntSpan module which stores runs of consecutive integers as spans, allowing large (even infinite) collections of integers to be represented very economically.

Calculation of successive terms in the Recamįn sequence is noticeably slower using Set::IntSpan for lookup than it is using a hash. But, as the adage says, it’s better to be late than be dead on time. (This was the slogan of an Australian safe driving ad campaign some years ago.) For the record: I also looked at Set::IntSpan::Fast and Set::IntSpan::Fast::XS. The latter failed to install on my system, and the former actually ran slower than Set::IntSpan for this use-case.

Turns out that Set::IntSpan not only solves the memory problem, it also makes it possible to dispense with an upper bound for the sieve. How, then, to display progressive results? Well, the OEIS has a couple of additional series related to Recamįn’s:

• A064228: values of R(n) that take a record number of steps to appear: 1, 2, 4, 19, ...
• A064227: the values of n corresponding to the values in A064228: 1, 4, 131, 99734, ...

So I recast the script to output successive values of these two series:

```14:20 >perl recaman.pl
1 <-- 1
2 <-- 4
4 <-- 131
19 <-- 99734
...
[download]```

Here is the new script:

```use strict;
use warnings;
use sigtrap handler => \&int_handler,   'INT',
handler => \&break_handler, 'BREAK';
use Set::IntSpan;
use Time::HiRes qw(gettimeofday tv_interval);

\$|       = 1;
my \$t0      = [gettimeofday];
my \$min0    = 1;
my \$n       = 0;
my \$r0      = 0;
my \$missing = Set::IntSpan->new( '1-)' );

print "\$min0 <-- ";

while (++\$n)
{
my \$r = \$r0 - \$n;
\$r = \$r0 + \$n if \$r < 0 || !\$missing->member(\$r);

\$missing->remove(\$r);

if ((my \$min1 = \$missing->min) > \$min0)
{
print "\$n\n\$min1 <-- ";
\$min0 = \$min1;
}

\$r0 = \$r;
}

sub int_handler
{
printf "\nn = %d, elapsed time: %.1fs\n", \$n, tv_interval(\$t0);
}

sub break_handler
{
int_handler();
exit 0;
}
[download]```

This script was developed under Windows 8.1, 64-bit, using Strawberry Perl:

```14:20 >perl -v

This is perl 5, version 22, subversion 0 (v5.22.0) built for MSWin32-x
+64-multi-thread
[download]```

The two signal handlers allow the script to be interrupted as follows:

• Control-C causes the script to display the current value of \$n and the total running time of the script so far.
• Control-Break causes the script to display the same information and then exit.

My takeaways from this meditation?

First, we all know that micro-optimisation is pointless until you have first selected the best algorithm(s). But optimising an algorithm may actually consist in optimising its underlying data structures. Obvious? Yes, but still worth a reminder now and then.

Second, CPAN is awesome! But you knew that already. :-)

Cheers,

 Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Replies are listed 'Best First'.
Re: Recamįn's sequence and memory usage
by hdb (Monsignor) on Jul 13, 2015 at 14:14 UTC

Athanasius, thanks for your interesting write-up. Here is a little version using vec for storing which numbers have been seen already. It runs quite happily until it hits integer overflow... It only prints a line when the maximum observed or the minimum unobserved increases.

```use strict;
use warnings;

my \$min = 2; # smallest unused number
my \$max = 3; # largest number already visited
my \$n   = 2;
my \$r   = 3;
my \$s;
vec(\$s,\$_,1)=1 for (0,1,3);
my \$flag = 1;
while(++\$n){
\$r += (\$r-\$n>0) && not( vec(\$s,\$r-\$n,1) ) ? -\$n : \$n;
vec(\$s,\$r,1)=1;
\$min++, \$flag=1 while vec(\$s,\$min,1);
\$max = \$r, \$flag=1 if \$r > \$max;
print "\$n\t\$max\t\$min\t\$r\t".length(\$s)."\n" if \$flag;
\$flag = 0;
}
[download]```

Hello hdb,

Brilliant use of vec, producing fast and very clever code. Bravo!

But — running your script on my system, I didn’t get anywhere near integer overflow before Windows effectively froze, with Task Manager showing memory at 97% and disk at 100%. (I didn’t time it, but it can’t have taken more than an hour or two to get to this point.) The console read:

```2162984298      13994685375     1355    13994685375     1749335672
[download]```

So then I ran my script to see if it would get that far. 14 hours and 37 minutes later (!) it had reached:

```n = 2177458553, elapsed time: 52649.9s
[download]```

with Task Manager showing memory at 60% and disk at 0%.

:-)

Cheers,

 Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Recamįn's sequence and memory usage
by QM (Parson) on Jul 13, 2015 at 15:58 UTC
Fun!

Just wondering how 1073 could be checked, that seems like an awfully big number?

-QM
--
Quantum Mechanics: The dreams stuff is made of

Indeed!

Date Terms Computed Smallest Missing Number Author
6 Nov, 2001 1 × 1015 852,655 Allan Wilks
13 Jun, 2006 1 × 1025 852,655 Benjamin Chaffin
28 Mar, 2008 7.78 × 1037 852,655 Benjamin Chaffin
22 Mar, 2010 4.28 × 1073 852,655 Benjamin Chaffin

I don’t know how it’s done, but it’s certainly a slow process! :-O

Cheers,

 Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Recamįn's sequence and memory usage
by (anonymized user) (Curate) on Jul 14, 2015 at 18:25 UTC
potential optimisations:

1) remove consecutive entries between a(0)=0 and m-1 where m is the first missing value. This prevents the sieve from growing beyond resources to support it.

2) use an array instead of a hash -- if the index is numeric, it's an array.

3) calculate the rate of growth of the remaining sieve to test for convergence.

One world, one people

From my experiments so far with n up to 2e10:

1. The first missing value is tiny compared to the maximum observed number, so not much can be saved here.
2. The observed numbers only cover 1/8 of the whole range, so we are dealing with a sparsely populated array in your proposal. This might not be the best way to store the sequence.
3. The ratio of the maximum number to the sequence number is around 8:1 so not much to be gained here either.

In that case it looks like the range of the missing values is divergent. So although the conjecture looks false, it also looks tricky to disprove!

One world, one people

Log In?
 Username: Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://1134465]
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (2)
As of 2021-04-15 02:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?

No recent polls found

Notices?