in reply to Re: Optimizing with Caching vs. Parallelizing (MCE::Map) (Better caching?)
in thread Optimizing with Caching vs. Parallelizing (MCE::Map)
Hi vr,
Wow! Curiosity moved me to try with MCE. But first, I optimized Nick's code. Also, workers now send only the relevant data and not the entire list. This decreases the time needed for the manager process to output results.
Update 1: Slight optimization to Choroba's example (i.e. >> 1). Added collatz_compress.
Update 2: Added collatz_count.
Update 3: Improved collatz_count.
Update 4: Added Inline C times.
Demo choroba, 1nickt, compress and count all in one.
use strict; use warnings; use feature 'say'; use MCE::Flow; sub collatz_choroba { my ($n) = @_; my @seq = $n; push @seq, ( $seq[-1] >> 1, 3 * $seq[-1] + 1 )[ $seq[-1] % 2 ] while $seq[-1] != 1; return scalar @seq; } sub collatz_1nickt { my ($n) = @_; my @seq = $n; push @seq, $n = $n % 2 ? 3 * $n + 1 : $n >> 1 while $n != 1; return scalar @seq; } sub collatz_compress { my ($n) = @_; my @seq = $n; # Notation and compressed dynamics (2:15 into the video) # https://www.youtube.com/watch?v=t1I9uHF9X5Y # consider the map # C(x) = $n % 2 ? 3 * $n + 1 : $n / 2 # compress the map to the dynamically equivalent # T(x) = $n % 2 ? (3 * $n + 1) / 2 : $n / 2 # compress loop iteration when odd sequence push @seq, $n % 2 ? ( $n = 3 * $n + 1, $n = $n >> 1 ) : ( $n = $n >> 1 ) while $n != 1; return scalar @seq; } sub collatz_count { my ($n) = @_; my $steps = 1; # count using the T(x) notation $n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 ) : ( $steps += 1, $n = $n >> 1 ) while $n != 1; return $steps; } #*collatz = \&collatz_choroba; # choose collatz here #*collatz = \&collatz_1nickt; #*collatz = \&collatz_compress; *collatz = \&collatz_count; my @sizes; MCE::Flow->init( max_workers => MCE::Util::get_ncpu(), gather => \@sizes, bounds_only => 1, ); mce_flow_s sub { my ($start, $end) = @{ $_[1] }; my @ret = map { [ $_, collatz($_) ] } $start .. $end; MCE->gather( ( sort { $b->[1] <=> $a->[1] } @ret )[ 0..19 ] ); }, 1, 1e6; MCE::Flow->finish; say "@$_" for ( sort { $b->[1] <=> $a->[1] } @sizes )[ 0..19 ];
Demo caching by vr
use strict; use warnings; use feature 'say'; use MCE::Flow; my %cache = ( 1 => [ 1, 1, undef ]); # # [ seq_length, start_num (same as key), next_num ] # sub collatz_vr { my $n = shift; return if exists $cache{ $n }; my ( @tmp, $new_n ); while ( !exists $cache{ $n } ) { $new_n = $n % 2 ? 3 * $n + 1 : $n >> 1; push @tmp, $cache{ $n } = [ -$#tmp, $n, $new_n ]; $n = $new_n } my $d = $#tmp + $cache{ $n }[ 0 ]; $_->[ 0 ] += $d for @tmp; } my @sizes; mce_flow_s { max_workers => MCE::Util::get_ncpu(), gather => \@sizes, bounds_only => 1, }, sub { my ($start, $end) = @{ $_[1] }; collatz_vr($_) for $start .. $end; MCE->gather( ( sort { $b->[0] <=> $a->[0] } @cache{ $start .. $end } )[ 0 . +. 19 ] ); }, 1, 1e6; MCE::Flow->finish; say "@$_[ 1, 0 ]" for ( sort { $b->[0] <=> $a->[0] } @sizes )[ 0 .. 19 ];
Run time on a 32 core machine - SMT disabled
The times include launching Perl, loading modules, spawning workers, reaping workers, and output (~ 0.100 seconds). See the next post for the Inline C demonstration.
max_workers => 1 collatz_vr 3.617s 1.0x collatz_choroba 14.896s 1.0x collatz_1nickt 12.264s 1.0x collatz_compress 8.932s 1.0x collatz_count 7.021s 1.0x collatz_count_c1 0.741s 1.0x Inline C collatz_count_c2 0.625s 1.0x Inline C max_workers => 2 collatz_vr 2.707s 1.3x collatz_choroba 7.538s 2.0x collatz_1nickt 6.141s 2.0x collatz_compress 4.490s 2.0x collatz_count 3.537s 2.0x collatz_count_c1 0.398s 1.9x Inline C collatz_count_c2 0.339s 1.8x Inline C max_workers => 4 collatz_vr 1.906s 1.9x collatz_choroba 3.948s 3.8x collatz_1nickt 3.274s 3.7x collatz_compress 2.365s 3.8x collatz_count 1.873s 3.7x collatz_count_c1 0.235s 3.2x Inline C collatz_count_c2 0.203s 3.1x Inline C max_workers => 8 collatz_vr 1.458s 2.5x collatz_choroba 1.995s 7.5x collatz_1nickt 1.655s 7.4x collatz_compress 1.209s 7.4x collatz_count 0.961s 7.3x collatz_count_c1 0.149s 5.0x Inline C collatz_count_c2 0.132s 4.7x Inline C max_workers => 16 collatz_vr 0.976s 3.7x collatz_choroba 1.034s 14.4x collatz_1nickt 0.858s 14.3x collatz_compress 0.637s 14.0x collatz_count 0.509s 13.8x collatz_count_c1 0.114s 6.5x Inline C collatz_count_c2 0.105s 6.0x Inline C max_workers => 32 collatz_vr 0.837s 4.3x collatz_choroba 0.593s 25.1x collatz_1nickt 0.497s 24.7x collatz_compress 0.379s 23.6x collatz_count 0.310s 22.6x collatz_count_c1 0.112s 6.6x Inline C collatz_count_c2 0.108s 5.8x Inline C
Interesting. For vr's demo, every worker starts with an empty cache. Meaning that workers do not have cached results from prior chunks. This is the reason not scaling versus the non-cache demonstrations.
collatz_count (2 cores) reaches collatz_vr (1 core).
collatz_count (4 cores) reaches collatz_vr (4 cores).
Kind regards, Mario
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^3: Optimizing with Caching vs. Parallelizing (MCE::Map) (Inline C)
by marioroy (Prior) on Apr 11, 2020 at 04:37 UTC | |
Re^3: Optimizing with Caching vs. Parallelizing (MCE::Map) (Better caching?)
by vr (Curate) on Apr 08, 2020 at 21:40 UTC | |
by marioroy (Prior) on Apr 09, 2020 at 08:30 UTC |