Re: help me fork

by mutated (Monk)
on Jul 20, 2004 at 13:22 UTC


in reply to help me fork

I'm going to say no, or at least not much, unless you have a multi-processor system (if you do, forking off as many processes as you have processors would be a good thing). When you start forking, all you are doing is wasting time switching between contexts: it doesn't create more processor time, it just shares the processor between more processes. I suspect what you really want to do is increase the priority of your process; see the manpage for the Unix command nice. The big thing here is that the bottleneck probably isn't your program, it's syslog blocking while it tries to write your log entry.


daN.
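
For the record, the priority can also be adjusted from inside the script itself. A minimal sketch, assuming a Unix system where setpriority(2) is available; the value -5 is an arbitrary example, and raising priority (a negative nice value) normally requires root:

    use strict;
    use warnings;

    # The setpriority() builtin wraps setpriority(2), the same knob that
    # the nice and renice commands turn.
    # WHICH = 0 means PRIO_PROCESS; WHO = 0 means "this process".
    setpriority(0, 0, -5)
        or warn "setpriority failed (are you root?): $!";

    # ... the rest of the script now runs at the higher priority ...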

Re^2: help me fork
by tilly (Archbishop) on Jul 20, 2004 at 15:29 UTC
    This is bad advice. More precisely, it is advice that applies only to CPU-bound jobs; if your job spends a significant fraction of its time waiting for network or disk, it is wrong.

    Increasing the priority of a process that isn't waiting for CPU does you no good at all, since CPU isn't where it's having trouble. Adding processes helps because it is not much extra work to have 5 processes waiting for something external rather than 1, and while a disk is seeking to where one process has its data, nothing stops it from reading or writing somewhere else for another process. Note that syslogd is an I/O-bound process, so unless there is a global lock preventing two copies from doing work at once, it will benefit from running multiple times. Of course, too many waiting jobs run into trouble, as the disk is trying to do too many things at once.

    The optimal number for any particular job depends heavily on your exact hardware and configuration, so test and benchmark it. The last time I did this for an I/O-bound job, I found that on the machine I tested, for the job I was doing, I got maximum throughput at 5 jobs, so I batched my report to run 5 at a time. For a database-bound job I got the best results at 7 copies. Had I taken your advice in either case, I would have used only 2 processes and got less than half the throughput that I did.
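
    As a rough illustration of that kind of benchmark (a sketch only: it uses the Parallel::ForkManager module that comes up later in this thread, and do_one_unit() is a hypothetical stand-in for the real work):

        use strict;
        use warnings;
        use Time::HiRes qw(time);
        use Parallel::ForkManager;

        my @work = (1 .. 100);    # stand-in for 100 units of I/O-bound work

        # Time the same batch at several concurrency levels and watch
        # where throughput peaks.
        for my $jobs (1 .. 10) {
            my $pm    = Parallel::ForkManager->new($jobs);
            my $start = time();
            for my $unit (@work) {
                $pm->start and next;    # parent: go spawn the next unit
                do_one_unit($unit);     # child: do one unit of work
                $pm->finish;
            }
            $pm->wait_all_children;
            printf "%2d jobs: %5.1f units/sec\n",
                $jobs, @work / (time() - $start);
        }

        # Hypothetical work unit: pretend to wait 0.1s on I/O.
        sub do_one_unit { select(undef, undef, undef, 0.1) }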

      Thanks for the reply. I'm definitely learning something here. Just to clarify: you are suggesting that I experiment to find the optimum number of simultaneous instances of my benchmark program. You mentioned running them in batches. Would this best be done using the aforementioned Parallel::ForkManager module? I've been reading up on fork; I don't believe the plain fork function has any way to control the number of children, does it? Is there a general rule to tell whether a process is CPU- or I/O-bound?
        Yes, I'm suggesting that you need to experiment to find the optimum number of simultaneous instances. For running many jobs a fixed number of times, back then I used Run commands in parallel to run the processes. These days I'd probably use Parallel::ForkManager.

        There is no general rule to tell what is bottlenecking a process without knowing what it does in detail, or measuring it. A cheap way to get an idea, though, is to run a copy on a lightly loaded machine and watch top. If your process is taking close to 100% CPU, it is almost certainly CPU-bound. If it is taking significantly less than 100% CPU, then something else is the problem at least some of the time. As a bonus, you also now know roughly how many copies of the process you can run before you run out of CPU, though you don't yet know what else is taking up the time.
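
        For a rough in-script version of the same check, one can compare CPU time (from the times() builtin) against wall-clock time: a ratio near 1.0 suggests CPU-bound, and a much lower ratio suggests waiting on something else. The do_the_work() routine below is a hypothetical placeholder:

            use strict;
            use warnings;
            use Time::HiRes qw(time);

            my $wall_start = time();
            do_the_work();    # placeholder for the job being measured
            my ($user, $system) = times();    # CPU seconds used so far
            my $wall = time() - $wall_start;
            printf "CPU: %.2fs of %.2fs wall (%.0f%% CPU)\n",
                $user + $system, $wall, 100 * ($user + $system) / $wall;

            # Hypothetical CPU-heavy placeholder; swap in the real work.
            sub do_the_work { my $x = 0; $x += rand() for 1 .. 1_000_000 }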

Re^2: help me fork
by mhearse (Chaplain) on Jul 20, 2004 at 13:37 UTC
    Ahh. So fork is not the best answer. Better to make the program nicer. Will give it a try.
      Perhaps some additional info is warranted. The loghost in question is servicing messages from about 40 other machines. Some of the messages aren't being recorded properly (if at all). I assumed either a UDP issue or syslogd. A sniffer and netstat showed no UDP issues, so I am trying to determine how far I can push syslogd.
        Ahh, OK. In that case, forking off as many copies of your script as you can get away with, to flood the syslog server with connections, is the way to go; somewhere around 50-100 forked children is a good place to start. Have the parent spawn children that each attempt one connection and then die, so the parent can launch another child and keep track of how many connection attempts were made; you can then compare that count against how many syslog entries exist. Be sure to run top in another terminal during the first couple of tests to make sure you aren't swapping; if you are, reduce the number of children forked, and if you have lots of memory free, increase it. A sketch follows below.
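
        A sketch of that test, with loud assumptions: the loghost name is a placeholder, the loghost accepts plain UDP syslog on port 514, and each child hand-rolls one RFC 3164-style datagram and exits. Parallel::ForkManager caps the number of live children:

            use strict;
            use warnings;
            use IO::Socket::INET;
            use Parallel::ForkManager;

            my $loghost  = 'loghost.example.com';   # hypothetical loghost
            my $children = 50;    # starting point; tune while watching top
            my $attempts = 0;

            my $pm = Parallel::ForkManager->new($children);
            for my $i (1 .. 1000) {
                $attempts++;            # counted in the parent
                $pm->start and next;    # parent: spawn the next child
                my $sock = IO::Socket::INET->new(
                    PeerAddr => $loghost,
                    PeerPort => 514,
                    Proto    => 'udp',
                ) or die "socket: $!";
                # <134> = facility local0, severity info
                $sock->send("<134>flood-test: message $i from pid $$");
                $pm->finish;            # child exits after one message
            }
            $pm->wait_all_children;
            print "$attempts attempts made; compare with the count of ",
                  "entries that actually reached the log\n";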


        daN.
