Re: ssh output is partial when using fork manager by QM (Parson)
on Jan 24, 2018 at 11:00 UTC
Further to salva's reply, try experimenting separately with the number of nodes and the length of the timeout. You may discover a relationship between the two. If results vary, try plotting the number of nodes against the timeout length needed to get a successful run.
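A minimal sketch of that sweep, written in Python only to show the logic. `run_batch`, `sweep` and `print_grid` are hypothetical names: `run_batch` is assumed to wrap your existing fork-manager script, run it against the given nodes with the given timeout, and return true only if every node's ssh output came back complete. The "plot" is a plain text grid, which is usually enough to spot a boundary.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical: run_batch(nodes, timeout) -> True if every node's ssh output
# came back complete. Wrap your real fork-manager script behind this.
RunBatch = Callable[[Sequence[str], float], bool]


def sweep(
    nodes: Sequence[str],
    counts: Sequence[int],
    timeouts: Sequence[float],
    run_batch: RunBatch,
) -> Dict[Tuple[int, float], bool]:
    """Run each (node count, timeout) pair once and record whether it succeeded."""
    results: Dict[Tuple[int, float], bool] = {}
    for count, timeout in product(counts, timeouts):
        # Always take the first `count` nodes so rows differ only in size.
        results[(count, timeout)] = run_batch(nodes[:count], timeout)
    return results


def print_grid(
    results: Dict[Tuple[int, float], bool],
    counts: Sequence[int],
    timeouts: Sequence[float],
) -> None:
    """Text 'plot': one row per node count, one column per timeout ('+' = success)."""
    print("nodes " + " ".join(f"{t:>6g}" for t in timeouts))
    for count in counts:
        marks = " ".join(f"{'+' if results[(count, t)] else '.':>6}" for t in timeouts)
        print(f"{count:>5} {marks}")


if __name__ == "__main__":
    # Stand-in for a real run: pretend each node needs 0.5 s of timeout budget.
    def fake_run_batch(batch: Sequence[str], timeout: float) -> bool:
        return timeout >= 0.5 * len(batch)

    all_nodes = [f"node{i:03d}" for i in range(100)]
    counts = [10, 25, 50, 100]
    timeouts = [5.0, 15.0, 30.0, 60.0]
    print_grid(sweep(all_nodes, counts, timeouts, fake_run_batch), counts, timeouts)
```

If the `+` region has a roughly diagonal edge, the timeout you need is growing with the node count, which is the kind of relationship worth chasing.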
Try looking for problem nodes by splitting the list in half, or by removing 5 or 10 different nodes each time. You may discover that one or two specific nodes hang, but only when the total number of nodes is large (which could point to network congestion combined with poor recovery on connections to or from those particular nodes).
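Both search strategies can be sketched like this, again in Python purely for illustration. `batch_ok` is a hypothetical wrapper that runs your job against exactly the given nodes and reports whether all output came back. Halving assumes a batch fails if and only if it contains a bad node; if a node only misbehaves under load, halving may shrink the batch below the threshold and find nothing, which is why the second function keeps the run large and drops only one small chunk at a time.

```python
from typing import Callable, List, Sequence

# Hypothetical: batch_ok(nodes) runs the fork-manager job against exactly
# these nodes and returns True if every node's output came back complete.
BatchOk = Callable[[Sequence[str]], bool]


def bisect_bad_nodes(nodes: Sequence[str], batch_ok: BatchOk) -> List[str]:
    """Split the list in half recursively until failures are pinned to single nodes.

    Assumes a batch fails if and only if it contains at least one bad node.
    """
    if not nodes or batch_ok(nodes):
        return []
    if len(nodes) == 1:
        return list(nodes)
    mid = len(nodes) // 2
    return bisect_bad_nodes(nodes[:mid], batch_ok) + bisect_bad_nodes(nodes[mid:], batch_ok)


def chunks_whose_removal_helps(
    nodes: Sequence[str], batch_ok: BatchOk, chunk: int = 5
) -> List[List[str]]:
    """Leave out one chunk of `chunk` nodes at a time from an otherwise full run.

    The batch stays large, so this can still catch nodes that only hang under load.
    Returns each chunk whose removal lets the remaining nodes pass.
    """
    suspects: List[List[str]] = []
    for start in range(0, len(nodes), chunk):
        remaining = list(nodes[:start]) + list(nodes[start + chunk:])
        if batch_ok(remaining):
            suspects.append(list(nodes[start:start + chunk]))
    return suspects


if __name__ == "__main__":
    # Stand-in: pretend node n3 and node n7 always break the run.
    bad = {"n3", "n7"}
    fake_ok = lambda batch: not any(n in bad for n in batch)
    names = [f"n{i}" for i in range(10)]
    print(bisect_bad_nodes(names, fake_ok))
```

Each call to `batch_ok` is a full run, so with real hosts the halving approach needs roughly two runs per bad node per level of splitting, while the chunk approach needs one run per chunk.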
What happens to process memory as the node count goes up? (Perhaps there is a memory leak or some retention you aren't expecting.)
What happens if you run this from different hosts, especially hosts that are not behind the same router as the original host?
Do you have another large pool of target nodes besides the original? How does it perform compared to the original?
Is there anything else you can vary?