Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re^4: Optimizing with Caching vs. Parallelizing (MCE::Map) (Updated)

by 1nickt (Canon)
on Apr 06, 2020 at 18:49 UTC ( [id://11115141]=note: print w/replies, xml ) Need Help??


in reply to Re^3: Optimizing with Caching vs. Parallelizing (MCE::Map)
in thread Optimizing with Caching vs. Parallelizing (MCE::Map)

Hi choroba,

The three MCE::Map exported functions, mce_map, mce_map_f and mce_map_s all return orderly output (since they mimic, well, map).

(You can also get ordered output from a non-map function using an orderly gather() function such as the ones provided in MCE::Candy.)

You could implement a parallelized version of your code without orderly output just using MCE::Flow:

use warnings; use strict; use feature 'say'; use MCE::Util; use MCE::Flow; sub collatz { my ($start) = @_; my @seq = $start; push @seq, ($seq[-1] / 2, 3 * $seq[-1] + 1)[$seq[-1] % 2] while $seq[-1] != 1; return @seq; } my @sizes; mce_flow_s { max_workers => MCE::Util::get_ncpu(), bounds_only => 1, gather => \@sizes, }, sub { my ($mce, $chunk, $chunk_id ) = @_; my ($start, $end) = @$chunk; my @chunk_sizes; push @chunk_sizes, [$_, scalar collatz($_)] for $start .. $end; MCE->gather( @chunk_sizes ); }, 1, 1e6; say "@$_" for reverse +(sort { $b->[1] <=> $a->[1] } @sizes)[0..19];

This runs a tiny bit faster on my system, as you might expect. But it's more code than using MCE::Map, as you can see.


Update: But with a non-ordered input sequence, such as keys %hash, or where processing time for each element is likely to vary, this may be a more significant factor. Also note that you can call gather() multiple times from within the user sub being processed by MCE, without returning, so you can view or handle the output as it is produced (by specifying a callback for your gatherer). And then there's MCE::Stream ... ;-)

$ time perl ch-map.pl 922525 445 922524 445 906175 445 886953 445 615017 447 410011 449 820023 450 820022 450 818943 450 546681 452 970599 458 796095 468 767903 468 511935 470 927003 476 910107 476 704623 504 939497 507 626331 509 837799 525 real 0m4.843s user 0m47.423s sys 0m0.265s

time perl ch-flow.pl 922525 445 922524 445 906175 445 886953 445 615017 447 410011 449 820023 450 820022 450 818943 450 546681 452 970599 458 796095 468 767903 468 511935 470 927003 476 910107 476 704623 504 939497 507 626331 509 837799 525 real 0m4.661s user 0m47.057s sys 0m0.255s

Hope this is of interest!


The way forward always starts with a minimal test.

Replies are listed 'Best First'.
Re^5: Optimizing with Caching vs. Parallelizing (MCE::Map) (Updated)
by marioroy (Prior) on Apr 07, 2020 at 00:48 UTC

    Hi choroba and 1nickt,

    I wonder about choroba's example involving several array fetches inside the loop. The following attempts to find out.

    Diff output

    $ diff choroba.pl 1nickt.pl 8,9c8,11 < push @seq, ( $seq[-1] / 2, 3 * $seq[-1] + 1 )[ $seq[-1] % 2 ] < while $seq[-1] != 1; --- > while ( $n != 1 ) { > $n = $n % 2 ? 3 * $n + 1 : $n / 2; > push @seq, $n; > }

    Demo choroba.pl

    use warnings; use strict; use feature 'say'; use MCE::Flow; sub collatz { my ($n) = @_; my @seq = $n; push @seq, ( $seq[-1] / 2, 3 * $seq[-1] + 1 )[ $seq[-1] % 2 ] while $seq[-1] != 1; return @seq; } my @sizes; mce_flow_s { max_workers => MCE::Util::get_ncpu(), bounds_only => 1, gather => \@sizes, }, sub { my ($start, $end) = @{ $_[1] }; my @chunk_sizes; push @chunk_sizes, [ $_, scalar collatz($_) ] for $start .. $end; MCE->gather( @chunk_sizes ); }, 1, 1e6; MCE::Flow->finish; say "@$_" for reverse +(sort { $b->[1] <=> $a->[1] } @sizes)[0..19];

    Demo 1nickt.pl

    use warnings; use strict; use feature 'say'; use MCE::Flow; sub collatz { my ($n) = @_; my @seq = $n; while ( $n != 1 ) { $n = $n % 2 ? 3 * $n + 1 : $n / 2; push @seq, $n; } return @seq; } my @sizes; mce_flow_s { max_workers => MCE::Util::get_ncpu(), bounds_only => 1, gather => \@sizes, }, sub { my ($start, $end) = @{ $_[1] }; my @chunk_sizes; push @chunk_sizes, [ $_, scalar collatz($_) ] for $start .. $end; MCE->gather( @chunk_sizes ); }, 1, 1e6; MCE::Flow->finish; say "@$_" for reverse +(sort { $b->[1] <=> $a->[1] } @sizes)[0..19];

    Run time on an AMD 3rd Gen Ryzen Threadripper 3970x box - SMT disabled

    max_workers => 1 choroba 21.432 seconds 1nickt 18.644 seconds max_workers => 2 choroba 10.808 seconds 1nickt 9.348 seconds max_workers => 4 choroba 5.836 seconds 1nickt 4.992 seconds max_workers => 8 choroba 3.163 seconds 1nickt 2.731 seconds max_workers => 16 choroba 1.835 seconds 1nickt 1.623 seconds max_workers => 32 choroba 1.218 seconds 1nickt 1.105 seconds

    Interestingly, the more cores the lesser the difference on this hardware.

    Regards, Mario

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11115141]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2024-04-19 02:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found