I originally started with P::FM as well, because it looked simple. But it uses forks, not threads, so it couldn't be applied to my problem.
(Specifically, I was using parallel processing to build up a string by concatenation, where the order of concatenation didn't matter.)
It failed in a way that violated my expectations because, like I said, parallelization is painful. Each forked child works on its own copy of the data, so nothing it appends ever shows up in the parent. In this case I would have needed to use threads.
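To make the fork-versus-thread difference concrete, here is a minimal sketch. It is in Python rather than Perl, purely for illustration, and it is not the code I actually ran; the helper names `concat_with_forks` and `concat_with_threads` are made up for this example, and it assumes a Unix-like system where `os.fork` is available.

```python
import os
import threading


def concat_with_forks(parts):
    """Hypothetical helper: one forked child per part, each appending to acc."""
    acc = {"text": ""}
    pids = []
    for part in parts:
        pid = os.fork()
        if pid == 0:
            # Child process: this changes only the child's private copy.
            acc["text"] += part
            os._exit(0)  # leave immediately, skipping the parent's cleanup
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)  # reap every child before returning
    return acc["text"]  # the parent's copy was never touched


def concat_with_threads(parts):
    """Hypothetical helper: one thread per part, all appending to one acc."""
    acc = {"text": ""}
    lock = threading.Lock()

    def worker(part):
        # Threads share the process memory; the lock serializes the appends.
        with lock:
            acc["text"] += part

    threads = [threading.Thread(target=worker, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acc["text"]


if __name__ == "__main__":
    parts = ["a", "b", "c"]
    print(repr(concat_with_forks(parts)))      # the parent still sees ''
    print(sorted(concat_with_threads(parts)))  # all parts arrive, in any order
```

The forked version silently returns an empty string, which is the kind of surprise I mean. To get results back from forked children you have to pass them over a pipe, a file, or some other explicit channel.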
Perhaps there could be a ForkedMapReduce as well as a ThreadedMapReduce. The important thing to me is a reliable abstraction that Does What I Mean.