http://qs321.pair.com?node_id=11115238


in reply to Re^2: Optimizing with Caching vs. Parallelizing (MCE::Map) (Better caching?)
in thread Optimizing with Caching vs. Parallelizing (MCE::Map)

Hi marioroy,

I think I'm having an issue with MCE on Windows here, timing your code (collatz_vr):

use Time::HiRes 'time'; my $t = time; # the whole script here say time - $t; MCE::Flow->finish; say time - $t; __END__ 1 worker: 6.01482510566711 7.76495599746704 2 workers: 4.12953305244446 7.07751798629761 4 workers: 3.33010196685791 8.4802930355072

1st measurement approximately matches your output, but 1.5 - 2 seconds per worker to shutdown doesn't look OK to me.

For vr's demo, every worker starts with an empty cache. Meaning that workers do not have cached results from prior chunks. This is the reason not scaling as well versus the non-cache demonstrations

In other words, lots and lots of work is needlessly duplicated.

_cache_collatz( $_ ) for 1 .. 1e6; say scalar %cache; %cache = (); _cache_collatz( $_ ) for 1 + 4e5 .. 1e6; say scalar %cache; __END__ 2168611 2168611

So, effectively, in situation with e.g. 10 workers, 4 junior workers, filling cache for ranges up to 400_000, are free to slack off and not gather their results at all, and there's large amount of overlap in work of 6 senior workers, too. For now, I have no solid idea how to parallelize this algorithm efficiently.

Replies are listed 'Best First'.
Re^4: Optimizing with Caching vs. Parallelizing (MCE::Map) (Better caching?)
by marioroy (Prior) on Apr 09, 2020 at 08:30 UTC

    Hi, vr

    The 1.5 - 2 seconds is likely threads freeing memory. Change my %cache to our %cache. MCE defaults to spawning threads on Windows. Try passing the MCE option use_threads => 0 to ensure emulated child instead.

    our %cache = ( 1 => [ 1, 1, undef ]); # for faster global cleanup mce_flow_s { max_workers => MCE::Util::get_ncpu(), gather => \@sizes, bounds_only => 1, use_threads => 0, }, sub { … }, 1, 1e6;