Re^3: Optimizing with Caching vs. Parallelizing (MCE::Map) (Better caching?)

in reply to Re^2: Optimizing with Caching vs. Parallelizing (MCE::Map) (Better caching?)
in thread Optimizing with Caching vs. Parallelizing (MCE::Map)

Hi marioroy,

I think I'm having an issue with MCE on Windows here, timing your code (collatz_vr):

use Time::HiRes 'time';
my $t = time;

# the whole script here

say time - $t;
MCE::Flow->finish;
say time - $t;

__END__

1 worker:
6.01482510566711
7.76495599746704

2 workers:
4.12953305244446
7.07751798629761

4 workers:
3.33010196685791
8.4802930355072

The first measurement approximately matches your output, but 1.5 to 2 seconds per worker to shut down doesn't look OK to me.

You wrote: "For vr's demo, every worker starts with an empty cache. Meaning that workers do not have cached results from prior chunks. This is the reason not scaling as well versus the non-cache demonstrations"

In other words, lots and lots of work is needlessly duplicated.

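# fill the cache from the whole range, then count the cached entries
_cache_collatz( $_ ) for 1 .. 1e6;
say scalar keys %cache;

# start over with an empty cache and skip the first 400_000 numbers
%cache = ();
_cache_collatz( $_ ) for 1 + 4e5 .. 1e6;
say scalar keys %cache;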

__END__

2168611
2168611

So, effectively, in a situation with e.g. 10 workers, the 4 junior workers, filling the cache for ranges up to 400_000, are free to slack off and not gather their results at all, and there's a large amount of overlap in the work of the 6 senior workers, too. For now, I have no solid idea how to parallelize this algorithm efficiently.
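
For anyone who wants to reproduce the effect without the full script, here is a minimal sketch of why the two counts come out equal. It assumes the cache stores every value visited along a trajectory, including values above the range limit. The sub name cache_collatz_sketch is a hypothetical stand-in, not the _cache_collatz from the thread, and N is much smaller than 1e6 so it runs instantly. The reasoning: every n at or below 0.4*N has some doubling 2^k * n that lands in (0.4*N, 0.8*N], and the trajectory of that number passes through n, so the upper range alone revisits everything the lower range would have cached.

```perl
use strict;
use warnings;

# Chain length counted in terms, so the chain of 1 has length 1
# (a convention chosen for this sketch).
my %cache = ( 1 => 1 );

# Hypothetical stand-in for the thread's _cache_collatz: walk the
# trajectory until a cached value is hit, then store the chain length
# of every value visited on the way.
sub cache_collatz_sketch {
    my ($n) = @_;
    my @path;
    until ( exists $cache{$n} ) {
        push @path, $n;
        $n = $n % 2 ? 3 * $n + 1 : $n >> 1;
    }
    my $len = $cache{$n};
    while (@path) {
        my $m = pop @path;
        $cache{$m} = ++$len;
    }
    return $len;
}

my $N     = 10_000;              # assumed small limit for the demo
my $split = int( $N * 2 / 5 );   # plays the role of 4e5 in the post

# Whole range.
cache_collatz_sketch($_) for 1 .. $N;
my $full = keys %cache;

# Empty cache again, upper 60% of the range only.
%cache = ( 1 => 1 );
cache_collatz_sketch($_) for $split + 1 .. $N;
my $upper = keys %cache;

print "$full\n$upper\n";
```

The two printed counts should be identical, which is the same picture as the 2168611 / 2168611 output above, just at a smaller scale.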
