Re^2: Optimizing with Caching vs. Parallelizing (MCE::Map)

Hi Laurent_R and fellow monks,

Nice speedup! I applied three optimizations in preparation for a follow-up post combining caching with parallelization. I did this because File::Map adds overhead (i.e. it will not be as fast as a plain array), so I asked myself whether anything could be done to further improve the serial implementation and, if so, whether it would help. The run time for each stage is given at the end of this post. It turns out that improvements were possible, and I will be converting collatz3_d.pl to use File::Map for the parallel demonstration.

Update: Further improvements, Step 4.

1. Replaced division by 2 with a right shift.

$n >> 1;
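As a quick sanity check on step 1, here is a minimal sketch showing that the shift gives the same value as integer division by 2 for non-negative integers. It assumes the original code used `$n / 2`; the helper name `halve` is hypothetical and only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: halve a non-negative integer with a right shift.
sub halve {
    my $n = shift;
    return $n >> 1;
}

# For non-negative integers the shift matches int($n / 2).
# Plain $n / 2 is floating-point division in Perl, so 7 / 2 is 3.5.
for my $n (0, 1, 2, 7, 10, 999_999) {
    my $divided = int($n / 2);
    die "mismatch for $n" unless halve($n) == $divided;
    printf "%7d >> 1 = %6d\n", $n, halve($n);
}
```

In the Collatz loop the halving only happens when `$n` is even, so nothing is lost by truncating.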

2. Removed one level of branching.

    while ($n != 1) {
        $result += $cache[$n], last
            if defined $cache[$n];

        my $new_n = $n % 2 ? 3 * $n + 1 : $n >> 1;
        $result++;
        $cache[$n] = $cache[$new_n] + 1
            if defined $cache[$new_n] and $n < $max;

        $n = $new_n;
    }

3. Then reduced the number of loop iterations. Credit for this goes to "Notation and compressed dynamics", about one minute into it (i.e. the T(x) notation): when n is odd, 3n + 1 is always even, so the halving that must follow can be done in the same iteration and counted as two steps.

    while ($n != 1) {
        $result += $cache[$n], last
            if defined $cache[$n];

        $n % 2 ? ( $result += 2, $new_n = (3 * $n + 1) >> 1 )
               : ( $result += 1, $new_n = $n >> 1 );

        $cache[$n] = $cache[$new_n] + ($n % 2 ? 2 : 1)
            if defined $cache[$new_n] and $n < $max;

        $n = $new_n;
    }
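To see that the shortcut in step 3 does not change the counts, here is a minimal standalone sketch without any caching. The sub names `length_plain` and `length_compressed` are hypothetical and not part of the scripts in this post. Both count the terms of the sequence, the final 1 included, which matches the `(0, 1, 2)` seed of `@cache`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the terms one step at a time (no shortcut), 1 itself included.
sub length_plain {
    my $n   = shift;
    my $len = 1;
    while ($n != 1) {
        $n = $n % 2 ? 3 * $n + 1 : $n >> 1;
        $len++;
    }
    return $len;
}

# Same count, but an odd $n takes two steps at once: 3n+1 is always
# even, so the halving that must follow is folded into the same pass.
sub length_compressed {
    my $n   = shift;
    my $len = 1;
    while ($n != 1) {
        if ($n % 2) { $n = (3 * $n + 1) >> 1; $len += 2 }
        else        { $n >>= 1;               $len += 1 }
    }
    return $len;
}

# The two counts must agree for every starting value.
for my $n (1 .. 10_000) {
    die "mismatch at $n" unless length_plain($n) == length_compressed($n);
}
print "27 has ", length_compressed(27), " terms\n";
```

The compressed step can never jump over 1, because 3n + 1 is at least 10 for any odd n greater than 1, so the loop condition stays valid.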

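4. Finally, less caching. The cache is no longer written inside the loop; only the result for the starting number is stored after the loop (see the final code below).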

    while ($n != 1) {
        $result += $cache[$n], last
            if defined $cache[$n];

        $n % 2 ? ( $result += 2, $new_n = (3 * $n + 1) >> 1 )
               : ( $result += 1, $new_n = $n >> 1 );

        $n = $new_n;
    }

The final code optionally takes an argument: the upper limit, which defaults to 1e6 and is raised to 1e6 if a smaller value is given.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my $max = shift || 1e6;
$max = 1e6 if $max < 1e6;

my @cache = (0, 1, 2);

sub collatz_seq {
    my $input = shift;
    my $n = $input;
    my $result = 0;
    my $new_n;

    while ($n != 1) {
        $result += $cache[$n], last
            if defined $cache[$n];

        $n % 2 ? ( $result += 2, $new_n = (3 * $n + 1) >> 1 )
               : ( $result += 1, $new_n = $n >> 1 );

        $n = $new_n;
    }

    $cache[$input] = $result if $input < $max;
    return $result;
}

my @long_seqs;
for my $num (1..$max) {
    my $seq_length = collatz_seq $num;
    push @long_seqs, [ $num, $seq_length ] if $seq_length > 400;
}

@long_seqs = sort { $b->[1] <=> $a->[1]} @long_seqs;
say "$_->[0]: $_->[1]" for @long_seqs[0..19];

Results:

$ time perl collatz3_a.pl 1e7

# Intel i7 laptop, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7   32.291s  (a) original code, accepts argument
collatz3_b.pl 1e7   30.134s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7   28.503s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7   21.464s  (d) c + reduced loop iterations
collatz3_e.pl 1e7   19.357s  (e) d + less caching

# AMD 3970x, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7   13.130s  (a) original code, accepts argument
collatz3_b.pl 1e7   12.394s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7   12.261s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7    9.170s  (d) c + reduced loop iterations
collatz3_e.pl 1e7    7.681s  (e) d + less caching

8400511: 686
8865705: 668
6649279: 665
9973919: 663
6674175: 621
7332399: 616
7532665: 616
5649499: 613
8474249: 611
6355687: 608
8847225: 606
9533531: 606
6635419: 603
9953129: 601
7464846: 598
7464847: 598
3732423: 597
5598635: 595
8397953: 593
6298465: 590
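For a quick read of the table above, the following sketch computes the speedup of each stage relative to (a) on the same machine. The timings are copied from the table; the hash layout and the sub name `speedup` are only illustrative. The last stage comes out at roughly 1.7x on both machines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wall-clock seconds for size 1e7, copied from the table above.
my %secs = (
    'Intel i7'  => { a => 32.291, b => 30.134, c => 28.503, d => 21.464, e => 19.357 },
    'AMD 3970x' => { a => 13.130, b => 12.394, c => 12.261, d =>  9.170, e =>  7.681 },
);

# Speedup of a stage relative to the original (a) on the same machine.
sub speedup {
    my ($machine, $stage) = @_;
    return $secs{$machine}{a} / $secs{$machine}{$stage};
}

for my $machine (sort keys %secs) {
    printf "%-10s", $machine;
    printf "  %s: %.2fx", $_, speedup($machine, $_) for qw(b c d e);
    print "\n";
}
```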

Regards, Mario


Re^3: Optimizing with Caching vs. Parallelizing (MCE::Map) by Laurent_R (Canon) on Apr 16, 2020 at 12:57 UTC
Hi Mario,

Thanks a lot for your comments and suggestions. I did not expect such micro-optimizations to bring such a significant performance improvement; that's interesting. In particular, replacing the division by 2 with a bit shift is the type of thing that I had stopped doing decades ago, when I figured that the C compiler I was using at the time was doing this type of optimization at least as well as, and often better than, I was able to. Good to be reminded that the optimizations you suggested can also be quite useful. Thank you.

Following this very interesting thread and your new thread on MCE::Flow + Caching via File::Map, I made a new blog post on the Collatz sequence, trying to summarize some of the findings: http://blogs.perl.org/users/laurent_r/2020/04/revisiting-the-collatz-sequence-pwc-54.html.

I did not try to explain your demonstration using `File::Map` for caching with parallel processing (your other thread) because I'm not sure I fully understand everything and was afraid of misrepresenting your work. People can follow the link and read your own words.

Thank you very much for your challenging ideas.
Re^4: Optimizing with Caching vs. Parallelizing (MCE::Map) by marioroy (Prior) on Apr 17, 2020 at 00:00 UTC
Hi Laurent,

Thank you for your caching algorithm. It has kept me busy after hours. I added Step 4 for less caching, which basically amounts to commenting out 2 lines. I updated my posts here and here.

    sub collatz_seq {
        my $input = shift;
        my $n = $input;
        my $result = 0;
        my $new_n;

        while ($n != 1) {
            $result += $cache[$n], last
                if defined $cache[$n];

            $n % 2 ? ( $result += 2, $new_n = (3 * $n + 1) >> 1 )
                   : ( $result += 1, $new_n = $n >> 1 );

          # $cache[$n] = $cache[$new_n] + ($n % 2 ? 2 : 1)
          #     if defined $cache[$new_n] and $n < $max;

            $n = $new_n;
        }

        $cache[$input] = $result if $input < $max;
        return $result;
    }

A new member, collatz3_e, was added to the list.

# Intel i7 laptop, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7   32.291s  (a) original, accepts argument
collatz3_b.pl 1e7   30.134s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7   28.503s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7   21.464s  (d) c + reduced loop iterations
collatz3_e.pl 1e7   19.357s  (e) d + caching less

# AMD 3970x, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7   13.130s  (a) original, accepts argument
collatz3_b.pl 1e7   12.394s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7   12.261s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7    9.170s  (d) c + reduced loop iterations
collatz3_e.pl 1e7    7.681s  (e) d + caching less

The 32-core machine reaches below 0.5 seconds for size 1e7. That includes the time to launch Perl, load modules, spin up and reap workers.

The Collatz Conjecture took me over, and I am finally able to move on.

Kind regards, Mario

