http://qs321.pair.com?node_id=11115103


in reply to Optimizing with Caching vs. Parallelizing (MCE::Map)

Hi Nick.

I wonder if you could use an algorithm that could actually benefit both from parallel processing and from caching.

If you take your data set and interleave it across your 8 or 12 cores, and then do the caching within each interleaved slice, each worker will have to build its own cache, but because each cache is private to its worker there's no per-iteration write-back.

           iteration 1    iteration 2    iteration 3    iteration 4
worker 1       1               9              17              25
worker 2       2              10              18              26
worker 3       3              11              19              27
worker 4       4              12              20              28
worker 5       5              13              21              29
worker 6       6              14              22              30
worker 7       7              15              23              31
worker 8       8              16              24              32

So worker 1 gets 1, 9, 17, 25, ... to work on. Worker 2 gets 2, 10, 18, 26. Worker 3 gets 3, 11, 19, 27. And so on.
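The slicing in the table above can be sketched in a few lines of plain Perl (the worker count and item range are just placeholders for illustration):

```perl
use strict;
use warnings;

my $workers = 8;
my @items   = (1 .. 32);

# Worker $w (0-based) gets the items at positions $w, $w + $workers,
# $w + 2 * $workers, ... -- i.e. every $workers-th item, offset by $w.
my @slices;
for my $w (0 .. $workers - 1) {
    push @slices, [ @items[ grep { $_ % $workers == $w } 0 .. $#items ] ];
}

# $slices[0] is [1, 9, 17, 25], $slices[1] is [2, 10, 18, 26], ...
```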

Within each worker, implement caching. If I were using Parallel::ForkManager, I would have each worker return a structure identifying its starting offset and its list of results. That way you don't have to splice them back into a master structure at the end for output: you can sequence them round-robin based on each worker's starting offset.
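A minimal sketch of that Parallel::ForkManager approach, assuming a stand-in work function (here a memoized Collatz sequence length, just to have something cacheable; your real computation goes in its place). Each child keeps a private %cache, returns { offset, results } via finish(), and the parent reassembles round-robin:

```perl
use strict;
use warnings;
use Parallel::ForkManager;    # CPAN module

my $workers = 4;
my @items   = (1 .. 16);

# Placeholder per-item computation: Collatz sequence length, memoized
# in a per-worker %cache. The cache is never shared or written back.
sub seq_len {
    my ($n, $cache) = @_;
    return $cache->{$n} //=
        $n == 1 ? 1 : 1 + seq_len($n % 2 ? 3 * $n + 1 : $n / 2, $cache);
}

my %by_offset;    # starting offset => arrayref of that worker's results
my $pm = Parallel::ForkManager->new($workers);
$pm->run_on_finish(sub {
    my (undef, undef, undef, undef, undef, $data) = @_;
    $by_offset{ $data->{offset} } = $data->{results};
});

for my $w (0 .. $workers - 1) {
    $pm->start and next;    # parent keeps looping; child falls through
    my %cache;              # private to this worker
    my @mine    = @items[ grep { $_ % $workers == $w } 0 .. $#items ];
    my @results = map { seq_len($_, \%cache) } @mine;
    $pm->finish(0, { offset => $w, results => \@results });
}
$pm->wait_all_children;

# Round-robin merge: take result $i from each worker in offset order,
# which restores the original item order.
my @merged;
for my $i (0 .. $#{ $by_offset{0} }) {
    for my $w (0 .. $workers - 1) {
        push @merged, $by_offset{$w}[$i]
            if defined $by_offset{$w} && $i <= $#{ $by_offset{$w} };
    }
}
```

The round-robin merge works because worker $w holds the results for items $w, $w + $workers, $w + 2 * $workers, ..., so taking one result from each worker in offset order walks the original sequence.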

I've sat down a few times today with the intent to give it a try, but keep getting interrupted before I get very far into it. So it's probably not going to happen.

Interleaving ought to allow each worker to build a cache pretty quickly.


Dave


Replies are listed 'Best First'.
Re^2: Optimizing with Caching vs. Parallelizing (MCE::Map)
by 1nickt (Canon) on Apr 06, 2020 at 01:00 UTC

    Hi davido,

    I did indeed try that approach, and while I did not interleave the sequence, as I mentioned, MCE chunks it up. (I don't think interleaving would much affect the cache hit rate: the caching tested stores the sequence for each number found, and the algorithm produces sequences that themselves include numbers greater than n, so such a boundary would be difficult to enforce.)

    In testing, with each worker building and using its own cache for the chunk processed, the program ran almost three times slower than just letting the workers hammer the CPU. Again, I believe it's because in this case the overhead of caching outweighs its benefits.


    The way forward always starts with a minimal test.