The stupid question is the question not asked PerlMonks

### Re: Optimizing with Caching vs. Parallelizing (MCE::Map)

by Laurent_R (Canon)
 on Apr 14, 2020 at 13:53 UTC Need Help??

Hi dear fellow monks,

as a follow-up to my previous post, I implemented my new caching strategy, consisting in storing in the cache the length of the sequences rather than the full sequences.

On the computer where I'm running my tests right now, my original program has the following timing:

```\$ time perl collatz.pl
837799: 525
626331: 509
[lines omitted for brevity]
906175: 445
922524: 445
922525: 445

real    1m37,551s
user    1m9,375s
sys     0m21,031s
My laptop is obviously much slower than Nick's computer (where this program took 22 seconds).

This is now my first implementation with the new caching strategy:

```#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use constant MAX => 1000000;

my %cache = (2 => 2);

sub collatz_seq {
my \$input = shift;
my \$n = \$input;
my \$result = 0;
while (\$n != 1) {
if (exists \$cache{\$n}) {
\$result += \$cache{\$n};
last;
} else {
my \$new_n = \$n % 2 ? 3 * \$n + 1 : \$n / 2;
\$result++;
\$cache{\$n} = \$cache{\$new_n} + 1
if defined \$cache{\$new_n} and \$n < MAX;
\$n = \$new_n;
}
}
\$cache{\$input} = \$result if \$input < MAX;
return \$result;
}

my @long_seqs;
for my \$num (1..1000000) {
my \$seq_length = collatz_seq \$num;
push @long_seqs, [ \$num, \$seq_length ] if \$seq_length > 400;
}

@long_seqs = sort { \$b->[1] <=> \$a->[1]} @long_seqs;
say  "\$_->[0]: \$_->[1]" for @long_seqs[0..19];
This program produces the same outcome, but is nearly 3 times faster:
```real    0m34,207s
user    0m34,108s
sys     0m0,124s
But we now end up with a cache having essentially one entry per input number in the 1..1_000_000 range. So, I thought, perhaps it might be better to use an array, rather than a hash, for the cache (accessing an array item should be faster than a hash lookup).

This is the code for this new implementation:

```#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use constant MAX => 1000000;

my @cache = (0, 1, 2);

sub collatz_seq {
my \$input = shift;
my \$n = \$input;
my \$result = 0;
while (\$n != 1) {
if (defined \$cache[\$n]) {
\$result += \$cache[\$n];
last;
} else {
my \$new_n = \$n % 2 ? 3 * \$n + 1 : \$n / 2;
\$result++;
\$cache[\$n] = \$cache[\$new_n] + 1
if defined \$cache[\$new_n] and \$n < MAX;
\$n = \$new_n;
}
}
\$cache[\$input] = \$result if \$input < MAX;
return \$result;
}

my @long_seqs;
for my \$num (1..1000000) {
my \$seq_length = collatz_seq \$num;
push @long_seqs, [ \$num, \$seq_length ] if \$seq_length > 400;
}

@long_seqs = sort { \$b->[1] <=> \$a->[1]} @long_seqs;
say  "\$_->[0]: \$_->[1]" for @long_seqs[0..19];
With this new implementation, we still obtain the same result, but the program is now more than 55 times faster than my original one (and almost 20 times faster than the solution using a hash for the cache):
```\$ time perl collatz3.pl
837799: 525
626331: 509
[Lines omitted for brevity]
922524: 445
922525: 445

real    0m1,755s
user    0m1,687s
sys     0m0,061s
I strongly suspected that using an array would be faster, but I frankly did not expect such a huge gain until I tested it.

So, it is true that throwing more CPU cores at the problem makes the solution faster (although to a limited extent with my computer that has only 4 cores). But using a better algorithm can often be a better solution.

Replies are listed 'Best First'.
Re^2: Optimizing with Caching vs. Parallelizing (MCE::Map)
by marioroy (Parson) on Apr 14, 2020 at 23:21 UTC

Hi Laurent_R and fellow monks,

Nice speedup! I applied 3 optimizations in preparation for a follow-up post combining caching with parallelization. This was done because File::Map adds overhead (i.e. it will not be as fast as an array). So, I asked myself can anything be done to further improve the serial implementation. And if yes, is it helpful. I provide the run time for each stage at the end of this post. It turns out that improvements were possible and will be converting collatz3_d.pl to use File::Map for the parallel demonstration.

Update: Further improvements, Step 4.

1. Replaced division by 2.

```\$n >> 1;

2. Removed one level of branching.

```    while (\$n != 1) {
\$result += \$cache[\$n], last
if defined \$cache[\$n];

my \$new_n = \$n % 2 ? 3 * \$n + 1 : \$n >> 1;
\$result++;
\$cache[\$n] = \$cache[\$new_n] + 1
if defined \$cache[\$new_n] and \$n < \$max;

\$n = \$new_n;
}

3. Then reduced the number of loop iterations. Credit for reducing the # of loop iterations was from watching Notation and compressed dynamics, one minute into it (i.e. the T(x) notation).

```    while (\$n != 1) {
\$result += \$cache[\$n], last
if defined \$cache[\$n];

\$n % 2 ? ( \$result += 2, \$new_n = (3 * \$n + 1) >> 1 )
: ( \$result += 1, \$new_n = \$n >> 1 );

\$cache[\$n] = \$cache[\$new_n] + (\$n % 2 ? 2 : 1)
if defined \$cache[\$new_n] and \$n < \$max;

\$n = \$new_n;
}

4. Finally, less caching.

```    while (\$n != 1) {
\$result += \$cache[\$n], last
if defined \$cache[\$n];

\$n % 2 ? ( \$result += 2, \$new_n = (3 * \$n + 1) >> 1 )
: ( \$result += 1, \$new_n = \$n >> 1 );

\$n = \$new_n;
}

The final code optionally takes an argument.

```#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my \$max = shift || 1e6;
\$max = 1e6 if \$max < 1e6;

my @cache = (0, 1, 2);

sub collatz_seq {
my \$input = shift;
my \$n = \$input;
my \$result = 0;
my \$new_n;

while (\$n != 1) {
\$result += \$cache[\$n], last
if defined \$cache[\$n];

\$n % 2 ? ( \$result += 2, \$new_n = (3 * \$n + 1) >> 1 )
: ( \$result += 1, \$new_n = \$n >> 1 );

\$n = \$new_n;
}

\$cache[\$input] = \$result if \$input < \$max;
return \$result;
}

my @long_seqs;
for my \$num (1..\$max) {
my \$seq_length = collatz_seq \$num;
push @long_seqs, [ \$num, \$seq_length ] if \$seq_length > 400;
}

@long_seqs = sort { \$b->[1] <=> \$a->[1]} @long_seqs;
say "\$_->[0]: \$_->[1]" for @long_seqs[0..19];

Results:

```\$ time perl collatz3_a.pl 1e7

# Intel i7 laptop, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7   32.291s  (a) original code, accepts argument
collatz3_b.pl 1e7   30.134s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7   28.503s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7   21.464s  (d) c + reduced loop iterations
collatz3_e.pl 1e7   19.357s  (e) d + less caching

# AMD 3970x, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7   13.130s  (a) original code, accepts argument
collatz3_b.pl 1e7   12.394s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7   12.261s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7    9.170s  (d) c + reduced loop iterations
collatz3_e.pl 1e7    7.681s  (e) d + less caching

8400511: 686
8865705: 668
6649279: 665
9973919: 663
6674175: 621
7332399: 616
7532665: 616
5649499: 613
8474249: 611
6355687: 608
8847225: 606
9533531: 606
6635419: 603
9953129: 601
7464846: 598
7464847: 598
3732423: 597
5598635: 595
8397953: 593
6298465: 590

Regards, Mario

Hi Mario,

I did not expect such micro-optimizations to bring such a significant performance improvement, that's interesting. Especially replacing the division by 2 by a bit shift is the type of thing that I had stopped doing decades ago, when I figured that the C compiler I was using at the time was doing this type of optimization at least as well and often better than I was able to do. Good to be reminded that the optimizations you suggested can also be quite useful. Thank you.

Thank you very much for your challenging ideas.

Hi Laurent,

Thank you, for your caching algorithm. It has kept me busy after hours. I added Step 4 for less caching. Basically, commenting 2 lines.

I updated my posts here and here.

```sub collatz_seq {
my \$input = shift;
my \$n = \$input;
my \$result = 0;
my \$new_n;

while (\$n != 1) {
\$result += \$cache[\$n], last
if defined \$cache[\$n];

\$n % 2 ? ( \$result += 2, \$new_n = (3 * \$n + 1) >> 1 )
: ( \$result += 1, \$new_n = \$n >> 1 );

# \$cache[\$n] = \$cache[\$new_n] + (\$n % 2 ? 2 : 1)
#     if defined \$cache[\$new_n] and \$n < \$max;

\$n = \$new_n;
}

\$cache[\$input] = \$result if \$input < \$max;
return \$result;
}

A new member collatz3_e was added to the list.

```# Intel i7 laptop, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7  32.291s  (a) original, accepts argument
collatz3_b.pl 1e7  30.134s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7  28.503s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7  21.464s  (d) c + reduced loop iterations
collatz3_e.pl 1e7  19.357s  (e) d + caching less

# AMD 3970x, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7  13.130s  (a) original, accepts argument
collatz3_b.pl 1e7  12.394s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7  12.261s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7   9.170s  (d) c + reduced loop iterations
collatz3_e.pl 1e7   7.681s  (e) d + caching less

The 32-core machine reaches below 0.5 seconds for size 1e7. That includes the time to launch Perl, load modules, spin up and reap workers.

The Collatz Conjenture took over me. And finally am able to move on.

Kind regards, Mario

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11115520]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (4)
As of 2022-12-02 13:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
My favourite new Perl feature (in 2022) ...

Results (43 votes). Check out past polls.

Notices?