I did try that approach. I did not interleave the sequence, but as I mentioned, MCE chunks it up. (I doubt interleaving would affect cache hits much: the caching I tested stores the sequence for each number found, and the algorithm produces sequences that themselves include numbers greater than n, so such a boundary would be difficult to enforce.)
In testing, with each worker building and using its own cache for the chunk it processed, the program ran almost three times slower than just letting the workers hammer the CPU. Again, I believe that in this case the overhead of caching outweighs its benefits.
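
For anyone who wants to try the comparison on a single chunk, here is a minimal sketch in Python rather than Perl/MCE. It is not the code that was tested. It assumes a Collatz-style sequence as a stand-in for the algorithm, all function names are hypothetical, and for brevity the cache stores step counts rather than the full sequence for each number. Timings will vary by machine and language, so treat it as a starting point and not as a reproduction of the result above.

```python
import time


def next_value(n):
    # Assumed stand-in rule (Collatz): halve even numbers, 3n + 1 for odd ones.
    return n // 2 if n % 2 == 0 else 3 * n + 1


def length_plain(n):
    """Count steps to reach 1 with no cache at all."""
    steps = 0
    while n != 1:
        n = next_value(n)
        steps += 1
    return steps


def length_cached(n, cache):
    """Count steps to reach 1, memoizing every value visited on the way."""
    path = []
    while n != 1 and n not in cache:
        path.append(n)
        n = next_value(n)
    steps = 0 if n == 1 else cache[n]
    # Walk back along the path so each visited value gets its own count.
    for value in reversed(path):
        steps += 1
        cache[value] = steps
    return steps


def process_chunk(start, stop, use_cache):
    """What one worker would do for its chunk [start, stop)."""
    cache = {} if use_cache else None  # per-worker cache, never shared
    best_n, best_len = start, -1
    for n in range(start, stop):
        length = length_cached(n, cache) if use_cache else length_plain(n)
        if length > best_len:
            best_n, best_len = n, length
    return best_n, best_len


if __name__ == "__main__":
    for use_cache in (False, True):
        t0 = time.perf_counter()
        result = process_chunk(1, 50_000, use_cache)
        elapsed = time.perf_counter() - t0
        print(f"use_cache={use_cache}: {result} in {elapsed:.3f}s")
```

Note that the cache ends up holding values well outside the chunk being processed, which is why a clean per-chunk boundary is hard to enforce.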
The way forward always starts with a minimal test.