comment on

Dearest Monks of the Monastery,

Recently, I tried computing the longest Collatz progression here. I was pleasantly surprised by File::Map's performance. Our fellow monk Laurent_R posted an update to his original code for computing the Collatz sequences. And what a speedup it is.

Here, I want to try Laurent's code and run parallel. Yes, with caching. The first thing I do typically is apply optimizations to the serial implementation. Because you know, just think of any domino impact running parallel might have. See my update to Laurent's code. That went well and so will take collatz3_e.pl there and use File::Map here. This is exciting for me because this is a great use case for File::Map for running the algorithm in parallel. But with all things, a serial version using File::Map is needed for comparison.

Update 1: Map using map_anonymous, previously map_file.
Update 2: Use 16-bit signed integer for pack/unpack.

Note: The OS must have ~ 3.8 GiB of available memory to compute size 1e9.

collatz3_filemap:

#!/usr/bin/env perl
use strict;
use warnings;

use File::Map qw/map_anonymous unmap/;

my $size = shift || 1e6;

$size = 1e6 if $size < 1e6;  # minimum
$size = 1e9 if $size > 1e9;  # maximum

map_anonymous my $cache, $size * 2, 'shared';

# init cache with -1's, then set 0, 1, 2
substr($cache, 0, $size * 2, ( my $neg1 = pack('s', -1) ) x $size);
substr($cache, $_ * 2, 2, pack('s', $_)) for 0..2;

my @seqs;

sub collatz_seq {
    my ( $seq_beg, $seq_end ) = @_;
    my ( $n, $steps, $tmp );
    for my $input ( $seq_beg..$seq_end ) {
        $n = $input, $steps = 0;
        while ( $n != 1 ) {
            $steps += unpack('s', $tmp), last
                if ($n < $size && ($tmp = substr($cache, $n * 2, 2)) n
+e $neg1);

            $n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
                   : ( $steps += 1, $n = $n >> 1 );
        }
        substr($cache, $input * 2, 2, pack('s', $steps)) if $input < $
+size;
        push @seqs, [ $input, $steps ] if $steps > 400;
    }
}

collatz_seq(2, $size); unmap $cache;
@seqs = ( sort { $b->[1] <=> $a->[1]} @seqs )[ 0..19 ];

printf "Collatz(%5d) has sequence length of %3d steps\n", @$_
    for @seqs;
[download]

collatz3_parallel:

This is the serial implementation converted to run parallel. The collatz_seq function is identical, no changes there.

#!/usr/bin/env perl
use strict;
use warnings;

use File::Map qw/map_anonymous unmap/;
use MCE::Flow;
use MCE::Candy;

my $size = shift || 1e6;

$size = 1e6 if $size < 1e6;  # minimum
$size = 1e9 if $size > 1e9;  # maximum

map_anonymous my $cache, $size * 2, 'shared';

# init cache with -1's, then set 0, 1, 2
substr($cache, 0, $size * 2, ( my $neg1 = pack('s', -1) ) x $size);
substr($cache, $_ * 2, 2, pack('s', $_)) for 0..2;

# local to workers and the manager process
my @seqs;

sub collatz_seq {
    my ( $seq_beg, $seq_end ) = @_;
    my ( $n, $steps, $tmp );
    for my $input ( $seq_beg..$seq_end ) {
        $n = $input, $steps = 0;
        while ( $n != 1 ) {
            $steps += unpack('s', $tmp), last
                if ($n < $size && ($tmp = substr($cache, $n * 2, 2)) n
+e $neg1);

            $n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
                   : ( $steps += 1, $n = $n >> 1 );
        }
        substr($cache, $input * 2, 2, pack('s', $steps)) if $input < $
+size;
        push @seqs, [ $input, $steps ] if $steps > 400;
    }
}

my $chunk_size;

$chunk_size  = int( $size / MCE::Util::get_ncpu() / 80 + 1 );
$chunk_size += 1 if $chunk_size % 2;

mce_flow_s {
    max_workers => MCE::Util::get_ncpu(),
    chunk_size  => $chunk_size,
    bounds_only => 1,
    gather      => MCE::Candy::out_iter_array(\@seqs),
}, sub {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    collatz_seq(@{ $chunk_ref });
    @seqs > 20
        ? MCE->gather($chunk_id, ( sort { $b->[1] <=> $a->[1] } @seqs 
+)[ 0..19 ])
        : MCE->gather($chunk_id, @seqs);
    @seqs = ();
}, 2, $size;

MCE::Flow->finish; unmap $cache;
@seqs = ( sort { $b->[1] <=> $a->[1]} @seqs )[ 0..19 ];

printf "Collatz(%5d) has sequence length of %3d steps\n", @$_
    for @seqs;
[download]

Results:

Caching using File::Map obviously will have overhead plus having to serialize/unserialize using pack/unpack.

$ time perl collatz3_a.pl 1e7

AMD 3970x, Docker Container, Ubuntu + Perlbrew Perl 5.30.1

collatz3_a.pl 1e7  13.130s  (a) original, accepts argument
collatz3_b.pl 1e7  12.394s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7  12.261s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7   9.170s  (d) c + reduced loop iterations
collatz3_e.pl 1e7   7.681s  (e) d + less caching

collatz3_filemap    8.889s    1 core

collatz3_parallel   8.982s    1 core
collatz3_parallel   4.548s    2 cores
collatz3_parallel   2.378s    4 cores
collatz3_parallel   1.233s    8 cores
collatz3_parallel   0.661s   16 cores
collatz3_parallel   0.408s   32 cores

Collatz(8400511) has sequence length of 686 steps
Collatz(8865705) has sequence length of 668 steps
Collatz(6649279) has sequence length of 665 steps
Collatz(9973919) has sequence length of 663 steps
Collatz(6674175) has sequence length of 621 steps
Collatz(7332399) has sequence length of 616 steps
Collatz(7532665) has sequence length of 616 steps
Collatz(5649499) has sequence length of 613 steps
Collatz(8474249) has sequence length of 611 steps
Collatz(6355687) has sequence length of 608 steps
Collatz(8847225) has sequence length of 606 steps
Collatz(9533531) has sequence length of 606 steps
Collatz(6635419) has sequence length of 603 steps
Collatz(9953129) has sequence length of 601 steps
Collatz(7464846) has sequence length of 598 steps
Collatz(7464847) has sequence length of 598 steps
Collatz(3732423) has sequence length of 597 steps
Collatz(5598635) has sequence length of 595 steps
Collatz(8397953) has sequence length of 593 steps
Collatz(6298465) has sequence length of 590 steps
[download]

Some will say, let's add cores. Some will say, let's improve the algorithm. Few might say, let's try both. It turns out that caching and parallel are possible. It's unbelievable, TBH. Processors are equipped with many CPU cores. I made the time to try and retry. Mainly, for future Monks to the Monastery, way after I'm gone. Years ago the saying was, "IO and Parallel" isn't possible. Input IO in MCE is sequential, not random.

What I have witnessed tonight is that Meta::Cpan is a treasure box. In other words, a big gigantic box of Legos. I opened the box and picked out File::Map, MCE::Flow and then went over to this wonderful Monastery. There I looked for Laurent_R's code.

I tried this not knowing what to expect. This is my first time using File::Map with many workers.

Regards, Mario

In reply to MCE::Flow + Caching via File::Map by marioroy

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Clear questions and runnable code get the best and fastest answer
	PerlMonks