Re^2: Optimizing with Caching vs. Parallelizing (MCE::Map)

in reply to Re: Optimizing with Caching vs. Parallelizing (MCE::Map)
in thread Optimizing with Caching vs. Parallelizing (MCE::Map)

Hi Laurent_R and fellow monks,

Nice speedup! I applied three optimizations in preparation for a follow-up post that combines caching with parallelization. I did this because File::Map adds overhead (i.e. it will not be as fast as a plain array), so I asked myself whether anything could be done to further improve the serial implementation and, if so, whether it would help. The run time for each stage is given at the end of this post. It turns out that improvements were possible, and I will be converting collatz3_d.pl to use File::Map for the parallel demonstration.

Update: Added a further improvement, step 4.

1. Replaced division by 2 with a right shift.

$n >> 1;    # instead of $n / 2
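A quick sanity check of the substitution (a minimal sketch, not one of the timed scripts): for a non-negative integer, `$n >> 1` drops the lowest bit, so it matches `int($n / 2)`, and for an even `$n` (the only case where the loop halves) it equals `$n / 2` exactly. Perl's `>>` operates on unsigned integers unless `use integer` is in effect, so the swap relies on Collatz values being non-negative integers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Right shift by one vs. division by two for non-negative integers.
for my $n (0 .. 100_000) {
    # Shift truncates, just like int($n / 2).
    die "mismatch at $n" unless ($n >> 1) == int($n / 2);
    # For even values the two are exactly equal.
    die "mismatch at $n" if $n % 2 == 0 and ($n >> 1) != $n / 2;
}
print "n >> 1 == int(n / 2) for 0 .. 100000\n";
```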

2. Removed one level of branching.

    while ($n != 1) {
        $result += $cache[$n], last
            if defined $cache[$n];

        my $new_n = $n % 2 ? 3 * $n + 1 : $n >> 1;
        $result++;
        $cache[$n] = $cache[$new_n] + 1
            if defined $cache[$new_n] and $n < $max;

        $n = $new_n;
    }
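To try step 2 on its own, the loop can be wrapped in a sub. This is a self-contained sketch: the sub name `collatz_len`, the `$max` of 1000, and the trailing `$cache[$input] = $result` store (borrowed from the final code) are assumptions; the loop body is the step 2 loop. Lengths are counted in terms, as in the final program, so 2 -> 1 counts as 2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $max   = 1000;          # hypothetical cache limit for this sketch
my @cache = (0, 1, 2);     # seeded as in the final code (lengths in terms)

# Step 2 loop in a sub (collatz_len is a hypothetical name).
sub collatz_len {
    my $input  = shift;
    my $n      = $input;
    my $result = 0;

    while ($n != 1) {
        # Stop once we reach a number whose length is already known.
        $result += $cache[$n], last
            if defined $cache[$n];

        # One ternary computes the next value (no if/else block).
        my $new_n = $n % 2 ? 3 * $n + 1 : $n >> 1;
        $result++;

        # Cache the current value if its successor is already cached.
        $cache[$n] = $cache[$new_n] + 1
            if defined $cache[$new_n] and $n < $max;

        $n = $new_n;
    }

    $cache[$input] = $result if $input < $max;
    return $result;
}

# prints 3: 8, 7: 17, 27: 112 (one per line)
print "$_: ", collatz_len($_), "\n" for 3, 7, 27;
```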

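3. Then reduced the number of loop iterations. When n is odd, 3n + 1 is always even, so the halving that follows can be folded into the same iteration, which advances the length by 2. Credit for this idea goes to watching "Notation and compressed dynamics", about one minute in (i.e. the T(x) notation).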

    while ($n != 1) {
        $result += $cache[$n], last
            if defined $cache[$n];

        $n % 2 ? ( $result += 2, $new_n = (3 * $n + 1) >> 1 )
               : ( $result += 1, $new_n = $n >> 1 );

        $cache[$n] = $cache[$new_n] + ($n % 2 ? 2 : 1)
            if defined $cache[$new_n] and $n < $max;

        $n = $new_n;
    }
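One way to convince yourself that the compressed step counts the same way is to compare it against the plain map, with no cache involved. A minimal sketch; the sub names `steps_plain` and `steps_compressed` are hypothetical, and both count steps rather than terms.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain Collatz map: one step per iteration.
sub steps_plain {
    my $n     = shift;
    my $steps = 0;
    while ($n != 1) {
        $n = $n % 2 ? 3 * $n + 1 : $n >> 1;
        $steps++;
    }
    return $steps;
}

# Compressed map T(n): an odd n goes straight to (3n + 1) / 2,
# which is two steps of the plain map, so the counter gets +2.
sub steps_compressed {
    my $n     = shift;
    my $steps = 0;
    while ($n != 1) {
        $n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
               : ( $steps += 1, $n = $n >> 1 );
    }
    return $steps;
}

for my $n (1 .. 10_000) {
    die "mismatch at $n" unless steps_plain($n) == steps_compressed($n);
}
print "steps for 27: ", steps_compressed(27), "\n";    # prints 111
```

The compressed loop never skips over 1: the only way to reach 1 is from 2, which is even and handled by the one-step branch.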

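4. Finally, less caching: the cache write inside the loop was dropped, so only the input's own length is stored, after the loop.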

    while ($n != 1) {
        $result += $cache[$n], last
            if defined $cache[$n];

        $n % 2 ? ( $result += 2, $new_n = (3 * $n + 1) >> 1 )
               : ( $result += 1, $new_n = $n >> 1 );

        $n = $new_n;
    }
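One plausible reason the in-loop writes can go, offered as an assumption rather than a measured fact: the main loop visits 1..$max in ascending order, so every value below the current input is already cached, and the walk ends as soon as the sequence drops below its starting value. The in-loop writes would mostly store values above the current input. A minimal sketch that checks this; the sub name `collatz_seq_stop`, its second return value, and `$max = 100_000` are hypothetical additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $max   = 100_000;      # hypothetical limit for this sketch
my @cache = (0, 1, 2);    # lengths in terms, as in the final code

# Step 4 loop; also returns the number where the walk stopped.
sub collatz_seq_stop {
    my $input  = shift;
    my $n      = $input;
    my $result = 0;

    while ($n != 1) {
        $result += $cache[$n], last
            if defined $cache[$n];

        $n % 2 ? ( $result += 2, $n = (3 * $n + 1) >> 1 )
               : ( $result += 1, $n = $n >> 1 );
    }

    # Only the starting value is cached.
    $cache[$input] = $result if $input < $max;
    return ($result, $n);
}

# Inputs in ascending order, as in the final program.
for my $num (3 .. $max) {
    my ($len, $stop) = collatz_seq_stop($num);
    die "walk for $num stopped at $stop" unless $stop < $num;
}
print "every walk ended below its starting value\n";
```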

The final code optionally takes the upper limit as an argument (default and minimum: 1e6).

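#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

# Upper limit from the command line; default and minimum is 1e6.
my $max = shift || 1e6;
$max = 1e6 if $max < 1e6;

# Sequence lengths, counted in terms (2 -> 1 has 2 terms).
my @cache = (0, 1, 2);

sub collatz_seq {
    my $input = shift;
    my $n = $input;
    my $result = 0;
    my $new_n;

    while ($n != 1) {
        # Stop as soon as we reach a number whose length is known.
        $result += $cache[$n], last
            if defined $cache[$n];

        # Compressed step: odd n goes to (3n + 1) / 2, i.e. two steps.
        $n % 2 ? ( $result += 2, $new_n = (3 * $n + 1) >> 1 )
               : ( $result += 1, $new_n = $n >> 1 );

        $n = $new_n;
    }

    # Only the starting value is cached (step 4).
    $cache[$input] = $result if $input < $max;
    return $result;
}

# Keep only the long sequences, then print the 20 longest.
my @long_seqs;
for my $num (1..$max) {
    my $seq_length = collatz_seq $num;
    push @long_seqs, [ $num, $seq_length ] if $seq_length > 400;
}

@long_seqs = sort { $b->[1] <=> $a->[1]} @long_seqs;
say "$_->[0]: $_->[1]" for @long_seqs[0..19];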

Results:

$ time perl collatz3_a.pl 1e7

# Intel i7 laptop, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7   32.291s  (a) original code, accepts argument
collatz3_b.pl 1e7   30.134s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7   28.503s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7   21.464s  (d) c + reduced loop iterations
collatz3_e.pl 1e7   19.357s  (e) d + less caching

# AMD 3970x, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7   13.130s  (a) original code, accepts argument
collatz3_b.pl 1e7   12.394s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7   12.261s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7    9.170s  (d) c + reduced loop iterations
collatz3_e.pl 1e7    7.681s  (e) d + less caching

8400511: 686
8865705: 668
6649279: 665
9973919: 663
6674175: 621
7332399: 616
7532665: 616
5649499: 613
8474249: 611
6355687: 608
8847225: 606
9533531: 606
6635419: 603
9953129: 601
7464846: 598
7464847: 598
3732423: 597
5598635: 595
8397953: 593
6298465: 590
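To read the timings in relative terms, a small Perl snippet can divide them. The numbers are the ones reported in the results above; the script only does the arithmetic (for example, a to e on the Intel laptop is 32.291 / 19.357, roughly 1.67x). The `speedup` helper is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Timings in seconds, copied from the results above (a .. e).
my %timings = (
    'Intel i7'  => [ 32.291, 30.134, 28.503, 21.464, 19.357 ],
    'AMD 3970x' => [ 13.130, 12.394, 12.261,  9.170,  7.681 ],
);
my @labels = qw(a b c d e);

# Ratio of a baseline time to a newer time (> 1 means faster).
sub speedup {
    my ($base, $t) = @_;
    return $base / $t;
}

for my $machine (sort keys %timings) {
    my $t = $timings{$machine};
    for my $i (1 .. $#$t) {
        printf "%-10s %s: %.2fx vs (a), %.2fx vs (%s)\n",
            $machine, $labels[$i],
            speedup($t->[0], $t->[$i]),
            speedup($t->[$i - 1], $t->[$i]),
            $labels[$i - 1];
    }
}
```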

Regards, Mario

In Section Meditations