Re^4: Optimizing with Caching vs. Parallelizing (MCE::Map) (Updated)

Hi choroba,

The three functions exported by MCE::Map, mce_map, mce_map_f and mce_map_s, all return ordered output (since they mimic, well, map).
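For comparison, here is a minimal sketch of the MCE::Map approach. It assumes MCE::Map is installed and is not necessarily identical to the ch-map.pl timed below. It only illustrates that mce_map_s returns results in input order. The range is reduced to 1e5 (a choice made here, not in the thread) so that it runs quickly.

```perl
use warnings; use strict; use feature 'say';
use MCE::Map;

# Returns the whole Collatz sequence; in scalar context that is its length.
sub collatz {
    my ($start) = @_;
    my @seq = $start;
    push @seq, ($seq[-1] / 2, 3 * $seq[-1] + 1)[$seq[-1] % 2]
        while $seq[-1] != 1;
    return @seq;
}

my $max = 1e5;    # the thread uses 1e6; reduced here for a quick run

# mce_map_s sets $_ to each number in the sequence 1 .. $max. Results
# come back in input order, just as they would from a plain map.
my @sizes = mce_map_s { [ $_, scalar collatz($_) ] } 1, $max;

MCE::Map->finish;

say "@$_" for reverse +(sort { $b->[1] <=> $a->[1] } @sizes)[0 .. 19];
```

The single mce_map_s line replaces the explicit chunk handling and gathering that MCE::Flow needs.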

(You can also get ordered output from a non-map function by using an ordering gather iterator, such as those provided by MCE::Candy.)
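A sketch of ordered gathering with MCE::Flow, assuming MCE::Candy's out_iter_array iterator. That iterator expects the chunk_id as the first argument to MCE->gather and holds back early chunks until their turn. The worker count and the 1e5 range are illustrative choices, not values from the thread.

```perl
use warnings; use strict; use feature 'say';
use MCE::Flow;
use MCE::Candy;

# Returns the whole Collatz sequence; in scalar context that is its length.
sub collatz {
    my ($n) = @_;
    my @seq = $n;
    while ( $n != 1 ) {
        $n = $n % 2 ? 3 * $n + 1 : $n / 2;
        push @seq, $n;
    }
    return @seq;
}

my @sizes;

mce_flow_s {
    max_workers => 4,
    bounds_only => 1,
    # out_iter_array buffers chunks that arrive early and appends them
    # to @sizes strictly in chunk_id order
    gather      => MCE::Candy::out_iter_array(\@sizes),
}, sub {
    my ($mce, $chunk, $chunk_id) = @_;
    my ($start, $end) = @$chunk;
    my @chunk_sizes = map { [ $_, scalar collatz($_) ] } $start .. $end;
    # chunk_id must come first so the iterator can reorder the chunks
    MCE->gather($chunk_id, @chunk_sizes);
}, 1, 1e5;

MCE::Flow->finish;

say "@{ $sizes[$_] }" for 0 .. 4;    # first five pairs, in input order
```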

You could implement a parallelized version of your code without ordered output using just MCE::Flow:

use warnings; use strict; use feature 'say';
use MCE::Util;
use MCE::Flow;

sub collatz {
    my ($start) = @_;
    my @seq = $start;
    push @seq, ($seq[-1] / 2, 3 * $seq[-1] + 1)[$seq[-1] % 2]
        while $seq[-1] != 1;
    return @seq;
}

my @sizes;

mce_flow_s {
    max_workers => MCE::Util::get_ncpu(),
    bounds_only => 1,
    gather      => \@sizes,
}, sub {
    my ($mce, $chunk, $chunk_id) = @_;
    my ($start, $end) = @$chunk;
    my @chunk_sizes;

    push @chunk_sizes, [$_, scalar collatz($_)] for $start .. $end;

    MCE->gather( @chunk_sizes );
}, 1, 1e6;

say "@$_" for reverse +(sort { $b->[1] <=> $a->[1] } @sizes)[0..19];

This runs slightly faster on my system (4.661 s vs. 4.843 s real time in the runs below), as you might expect. But, as you can see, it's more code than using MCE::Map.

Update: With a non-ordered input sequence, such as keys %hash, or where the processing time per element is likely to vary, the cost of producing ordered output may be a more significant factor. Also note that you can call gather() multiple times from within the user sub being processed by MCE, without returning, so you can view or handle the output as it is produced (by specifying a callback as your gatherer). And then there's MCE::Stream ... ;-)

$ time perl ch-map.pl
922525 445
922524 445
906175 445
886953 445
615017 447
410011 449
820023 450
820022 450
818943 450
546681 452
970599 458
796095 468
767903 468
511935 470
927003 476
910107 476
704623 504
939497 507
626331 509
837799 525

real    0m4.843s
user    0m47.423s
sys    0m0.265s

$ time perl ch-flow.pl
922525 445
922524 445
906175 445
886953 445
615017 447
410011 449
820023 450
820022 450
818943 450
546681 452
970599 458
796095 468
767903 468
511935 470
927003 476
910107 476
704623 504
939497 507
626331 509
837799 525

real    0m4.661s
user    0m47.057s
sys    0m0.255s
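The gather-callback idea from the update above might look like the following sketch. Here gather is a code ref, which MCE runs in the manager process for every MCE->gather call. The worker calls gather once per number rather than once per chunk, which costs more IPC but lets the manager track the longest chain as results stream in. The worker count and the 1e4 range are illustrative, and the variable names are made up for this example.

```perl
use warnings; use strict; use feature 'say';
use MCE::Flow;

# Returns the whole Collatz sequence; in scalar context that is its length.
sub collatz {
    my ($n) = @_;
    my @seq = $n;
    while ( $n != 1 ) {
        $n = $n % 2 ? 3 * $n + 1 : $n / 2;
        push @seq, $n;
    }
    return @seq;
}

my ($best_n, $best_len, $seen) = (0, 0, 0);

mce_flow_s {
    max_workers => 4,
    bounds_only => 1,
    # Runs in the manager each time a worker calls MCE->gather,
    # so results are handled as they arrive rather than at the end.
    gather      => sub {
        my ($n, $len) = @_;
        $seen++;
        ($best_n, $best_len) = ($n, $len) if $len > $best_len;
    },
}, sub {
    my ($start, $end) = @{ $_[1] };
    # gather() is called once per number, without returning from the chunk
    MCE->gather($_, scalar collatz($_)) for $start .. $end;
}, 1, 1e4;

MCE::Flow->finish;

say "longest: $best_n ($best_len terms) after $seen results";
```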

Hope this is of interest!

The way forward always starts with a minimal test.


Re^5: Optimizing with Caching vs. Parallelizing (MCE::Map) (Updated)
by marioroy (Prior) on Apr 07, 2020 at 00:48 UTC

Hi choroba and 1nickt,

I wondered whether the repeated array fetches ($seq[-1]) inside the loop of choroba's example cost much. The following attempts to find out.

Diff output

$ diff choroba.pl 1nickt.pl
8,9c8,11
<     push @seq, ( $seq[-1] / 2, 3 * $seq[-1] + 1 )[ $seq[-1] % 2 ]
<         while $seq[-1] != 1;
---
>     while ( $n != 1 ) {
>         $n = $n % 2 ? 3 * $n + 1 : $n / 2;
>         push @seq, $n;
>     }
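To look at the loop body alone, without MCE or IPC involved, a single-process comparison can be sketched with Perl's core Benchmark module. The sub names collatz_choroba and collatz_1nickt are made up here; their bodies are the two versions from the diff. Timings will vary by machine, so no numbers are claimed.

```perl
use warnings; use strict;
use Benchmark qw(cmpthese);

# choroba's step: reads $seq[-1] four times per iteration and
# computes both candidate values before picking one
sub collatz_choroba {
    my ($n) = @_;
    my @seq = $n;
    push @seq, ( $seq[-1] / 2, 3 * $seq[-1] + 1 )[ $seq[-1] % 2 ]
        while $seq[-1] != 1;
    return @seq;
}

# 1nickt's step: keeps the current value in a scalar
sub collatz_1nickt {
    my ($n) = @_;
    my @seq = $n;
    while ( $n != 1 ) {
        $n = $n % 2 ? 3 * $n + 1 : $n / 2;
        push @seq, $n;
    }
    return @seq;
}

# Sanity check: both versions must produce identical sequences.
for my $n (1 .. 10_000) {
    my @x = collatz_choroba($n);
    my @y = collatz_1nickt($n);
    die "mismatch at $n\n" unless "@x" eq "@y";
}

# Run each for at least 1 CPU second; += calls the sub in scalar context.
cmpthese( -1, {
    choroba  => sub { my $t = 0; $t += collatz_choroba($_) for 1 .. 1000 },
    '1nickt' => sub { my $t = 0; $t += collatz_1nickt($_)  for 1 .. 1000 },
} );
```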

Demo choroba.pl

use warnings; use strict;
use feature 'say';
use MCE::Flow;

sub collatz {
    my ($n) = @_;
    my @seq = $n;
    push @seq, ( $seq[-1] / 2, 3 * $seq[-1] + 1 )[ $seq[-1] % 2 ]
        while $seq[-1] != 1;
    return @seq;
}

my @sizes;

mce_flow_s {
    max_workers => MCE::Util::get_ncpu(),
    bounds_only => 1,
    gather      => \@sizes,
}, sub {
    my ($start, $end) = @{ $_[1] };
    my @chunk_sizes;
    push @chunk_sizes, [ $_, scalar collatz($_) ] for $start .. $end;
    MCE->gather( @chunk_sizes );
}, 1, 1e6;

MCE::Flow->finish;

say "@$_" for reverse +(sort { $b->[1] <=> $a->[1] } @sizes)[0..19];

Demo 1nickt.pl

use warnings; use strict;
use feature 'say';
use MCE::Flow;

sub collatz {
    my ($n) = @_;
    my @seq = $n;
    while ( $n != 1 ) {
        $n = $n % 2 ? 3 * $n + 1 : $n / 2;
        push @seq, $n;
    }
    return @seq;
}

my @sizes;

mce_flow_s {
    max_workers => MCE::Util::get_ncpu(),
    bounds_only => 1,
    gather      => \@sizes,
}, sub {
    my ($start, $end) = @{ $_[1] };
    my @chunk_sizes;
    push @chunk_sizes, [ $_, scalar collatz($_) ] for $start .. $end;
    MCE->gather( @chunk_sizes );
}, 1, 1e6;

MCE::Flow->finish;

say "@$_" for reverse +(sort { $b->[1] <=> $a->[1] } @sizes)[0..19];

Run times on an AMD Ryzen Threadripper 3970X (3rd Gen) box, SMT disabled:

max_workers   choroba (s)   1nickt (s)
          1        21.432       18.644
          2        10.808        9.348
          4         5.836        4.992
          8         3.163        2.731
         16         1.835        1.623
         32         1.218        1.105

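Interestingly, on this hardware the difference tends to narrow as workers are added: choroba's version is about 15% slower with one worker but only about 10% slower with 32.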
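The relative gap can be recomputed from the timings above with a few lines of Perl. Here the gap is taken as (choroba - 1nickt) / 1nickt, which is a choice made for this sketch.

```perl
use warnings; use strict;

# Timings in seconds, copied from the runs above.
my @runs = (
    # workers, choroba, 1nickt
    [  1, 21.432, 18.644 ],
    [  2, 10.808,  9.348 ],
    [  4,  5.836,  4.992 ],
    [  8,  3.163,  2.731 ],
    [ 16,  1.835,  1.623 ],
    [ 32,  1.218,  1.105 ],
);

my %gap;
for my $r (@runs) {
    my ($w, $choroba, $nickt) = @$r;
    # percentage by which choroba's version is slower than 1nickt's
    $gap{$w} = sprintf '%.1f', 100 * ($choroba - $nickt) / $nickt;
    printf "%2d workers: choroba %s%% slower\n", $w, $gap{$w};
}
```

By this measure the gap is largest at 4 workers (about 17%) and narrows to about 10% at 32 workers.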

Regards, Mario
