http://qs321.pair.com?node_id=11115148


in reply to Re^4: Optimizing with Caching vs. Parallelizing (MCE::Map) (Updated)
in thread Optimizing with Caching vs. Parallelizing (MCE::Map)

Hi choroba and 1nickt,

I wondered how much choroba's example pays for fetching $seq[-1] several times inside the loop. The following comparison attempts to find out.

Diff output

    $ diff choroba.pl 1nickt.pl
    8,9c8,11
    <     push @seq, ( $seq[-1] / 2, 3 * $seq[-1] + 1 )[ $seq[-1] % 2 ]
    <         while $seq[-1] != 1;
    ---
    >     while ( $n != 1 ) {
    >         $n = $n % 2 ? 3 * $n + 1 : $n / 2;
    >         push @seq, $n;
    >     }

Demo choroba.pl

    use warnings;
    use strict;
    use feature 'say';
    use MCE::Flow;
    sub collatz {
        my ($n) = @_;
        my @seq = $n;
        push @seq, ( $seq[-1] / 2, 3 * $seq[-1] + 1 )[ $seq[-1] % 2 ]
            while $seq[-1] != 1;
        return @seq;
    }

    my @sizes;

    mce_flow_s {
        max_workers => MCE::Util::get_ncpu(),
        bounds_only => 1,
        gather => \@sizes,
    }, sub {
        my ($start, $end) = @{ $_[1] };
        my @chunk_sizes;
        push @chunk_sizes, [ $_, scalar collatz($_) ] for $start .. $end;
        MCE->gather( @chunk_sizes );
    }, 1, 1e6;

    MCE::Flow->finish;

    say "@$_" for reverse +(sort { $b->[1] <=> $a->[1] } @sizes)[0..19];

Demo 1nickt.pl

    use warnings;
    use strict;
    use feature 'say';
    use MCE::Flow;
    sub collatz {
        my ($n) = @_;
        my @seq = $n;
        while ( $n != 1 ) {
            $n = $n % 2 ? 3 * $n + 1 : $n / 2;
            push @seq, $n;
        }
        return @seq;
    }

    my @sizes;

    mce_flow_s {
        max_workers => MCE::Util::get_ncpu(),
        bounds_only => 1,
        gather => \@sizes,
    }, sub {
        my ($start, $end) = @{ $_[1] };
        my @chunk_sizes;
        push @chunk_sizes, [ $_, scalar collatz($_) ] for $start .. $end;
        MCE->gather( @chunk_sizes );
    }, 1, 1e6;

    MCE::Flow->finish;

    say "@$_" for reverse +(sort { $b->[1] <=> $a->[1] } @sizes)[0..19];

Run time on an AMD 3rd Gen Ryzen Threadripper 3970X box (SMT disabled)

    max_workers    choroba           1nickt
              1    21.432 seconds    18.644 seconds
              2    10.808 seconds     9.348 seconds
              4     5.836 seconds     4.992 seconds
              8     3.163 seconds     2.731 seconds
             16     1.835 seconds     1.623 seconds
             32     1.218 seconds     1.105 seconds

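Interestingly, the more cores in use, the smaller the difference on this hardware: the gap shrinks from about 2.8 seconds (roughly 15%) with 1 worker to about 0.1 seconds (roughly 10%) with 32 workers.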
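To separate the cost of the loop body from any MCE overhead, the two collatz variants can also be compared in a single process. The following is only a minimal sketch using the core Benchmark module; the sub names collatz_fetch and collatz_scalar, the 1 .. 2000 range, and the one-second run time are arbitrary choices for illustration and are not part of the scripts above. It first checks that both variants return the same sequences, then prints a comparison table.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Loop body from choroba.pl: reads $seq[-1] four times per iteration
# and evaluates both the n/2 and 3n+1 expressions before picking one.
sub collatz_fetch {
    my ($n) = @_;
    my @seq = $n;
    push @seq, ( $seq[-1] / 2, 3 * $seq[-1] + 1 )[ $seq[-1] % 2 ]
        while $seq[-1] != 1;
    return @seq;
}

# Loop body from 1nickt.pl: keeps the current value in the scalar $n.
sub collatz_scalar {
    my ($n) = @_;
    my @seq = $n;
    while ( $n != 1 ) {
        $n = $n % 2 ? 3 * $n + 1 : $n / 2;
        push @seq, $n;
    }
    return @seq;
}

# Sanity check: both variants must produce identical sequences.
for my $n ( 1 .. 1000 ) {
    my @a = collatz_fetch($n);
    my @b = collatz_scalar($n);
    die "mismatch at $n\n" unless "@a" eq "@b";
}

# Run each variant for at least 1 CPU second (negative count) over a
# small, arbitrarily chosen range, then print the comparison table.
cmpthese( -1, {
    array_fetch => sub {
        for my $i ( 1 .. 2000 ) { my $len = collatz_fetch($i) }
    },
    scalar_var => sub {
        for my $i ( 1 .. 2000 ) { my $len = collatz_scalar($i) }
    },
} );
```

The rates it prints will differ by machine and Perl build, so they are not directly comparable with the mce_flow_s timings above.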

Regards, Mario