Yes, I'm suggesting that you need to experiment to find the optimum number of simultaneous instances. For running many jobs a fixed number at a time, back then I used "Run commands in parallel" to run the processes. These days I'd probably use Parallel::ForkManager.
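For what it's worth, here is a minimal sketch of the Parallel::ForkManager approach. The helper name run_limited and the echo commands are made up for illustration; substitute your real jobs, and treat the limit of 4 as a starting guess to tune, not a recommendation.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical helper: run each command, with at most $max_procs
# children alive at any moment. Returns a hashref of command => exit code.
sub run_limited {
    my ($max_procs, @commands) = @_;
    my $pm = Parallel::ForkManager->new($max_procs);
    my %exit_code;

    # Called in the parent each time a child is reaped.
    $pm->run_on_finish(sub {
        my ($pid, $exit, $ident) = @_;
        $exit_code{$ident} = $exit;
    });

    for my $cmd (@commands) {
        # start() blocks while $max_procs children are already running.
        # It returns the child's pid in the parent and 0 in the child.
        $pm->start($cmd) and next;
        my $status = system($cmd);            # child: run the job
        $pm->finish($status == 0 ? 0 : 1);    # child: exit, reporting success or failure
    }
    $pm->wait_all_children;
    return \%exit_code;
}

# Placeholder jobs, assumed for the example only.
my $results = run_limited(4, map { "echo job $_" } 1 .. 10);
printf "%d jobs finished\n", scalar keys %$results;
```

To look for the optimum, time the whole batch with a few different values of the first argument and keep the one that finishes fastest.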
There is no general rule for telling what is bottlenecking a process without knowing in detail what it does, or measuring it. A cheap way to get an idea, though, is to run one copy on a lightly loaded machine and watch top. If your process is taking close to 100% CPU, then it is almost certainly CPU bound. If it is taking significantly less than 100% CPU, then something else (disk, network, waiting on another process) is the bottleneck at least some of the time. As a bonus, you now also know roughly how many copies of the process you can run before you run out of CPU. But you still don't know what else is taking up the time.
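If you would rather get a number than eyeball top, the same check can be roughed out by comparing CPU time with elapsed time for one run. This is only a sketch using core Perl; the helper name cpu_vs_wall and the sleep command are stand-ins, and the CPU figures come from times, which is only as fine-grained as the system clock tick.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: run a command and return
# (elapsed wall-clock seconds, CPU seconds used by the child).
sub cpu_vs_wall {
    my (@cmd) = @_;
    my @before = times;    # (user, system, child user, child system)
    my $t0     = time;
    system(@cmd);          # waits for the child, so its CPU time gets counted
    my $wall  = time - $t0;
    my @after = times;
    my $cpu   = ($after[2] - $before[2]) + ($after[3] - $before[3]);
    return ($wall, $cpu);
}

# Placeholder command, assumed for the example: sleep uses almost no CPU.
my ($wall, $cpu) = cpu_vs_wall('sleep', '1');
printf "wall %.2fs, cpu %.2fs, %.0f%% CPU\n",
    $wall, $cpu, $wall > 0 ? 100 * $cpu / $wall : 0;
```

A ratio near 100% points at CPU; a much lower one says the job spent most of its time waiting on something else. A multithreaded job can report more than 100%.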