PerlMonks
Re^2: help me fork

by tilly (Archbishop)
on Jul 20, 2004 at 15:29 UTC [id://375974]


in reply to Re: help me fork
in thread help me fork

This is bad advice. More precisely, it is advice that only applies to CPU-bound jobs. If your job spends a significant fraction of its time waiting on the network or disk, then the opposite holds.

Increasing the priority of a process that isn't waiting on the CPU does you no good at all, since the CPU isn't where it is having trouble. Adding processes is good because it costs little extra to have 5 processes waiting on something external rather than 1. And while a disk is seeking to where one process's data lives, nothing stops it from reading or writing somewhere else for another process. Note that syslogd is an I/O-bound process, so unless a global lock prevents two copies from doing work at once, it will benefit from being run multiple times. Of course, too many waiting jobs run into trouble once the disk is asked to do too many things at once.

What the optimal threshold is for any particular job depends heavily on your exact hardware and configuration. Test and benchmark it. The last time that I did this for an I/O bound job, I found that on the machine I tested, for the job that I was doing, I got maximum throughput at 5 jobs. I therefore batched my report to run 5 at a time. For a database-bound job I found that I got the best results at 7 copies. Had I taken your advice in either case, I would have used only 2 processes - and would have got less than half the throughput that I did.
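For concreteness, the batching described above - keep N jobs running at once, starting a new one whenever a slot frees up - can be sketched with plain fork and wait. The job list and do_one_job are hypothetical placeholders, and $max_jobs is the number you would tune by benchmarking as described:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $max_jobs = 5;          # tune this by benchmarking your own job
my @jobs     = (1 .. 20);  # placeholder list of work items
my %running;               # pid => job currently in flight

for my $job (@jobs) {
    # At the limit? Reap one finished child before starting another.
    if (keys %running >= $max_jobs) {
        my $done = wait();
        delete $running{$done};
    }
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        do_one_job($job);  # child does its share of the work
        exit 0;
    }
    $running{$pid} = $job; # parent records the child and moves on
}
1 while wait() != -1;      # reap whatever is still running

sub do_one_job {
    my ($n) = @_;
    select(undef, undef, undef, 0.05);  # stand-in for real I/O-bound work
}
```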

Replies are listed 'Best First'.
Re^3: help me fork
by mhearse (Chaplain) on Jul 20, 2004 at 15:50 UTC
    Thanks for the reply. I'm definitely learning something here. Just to clarify: you are suggesting that I experiment to find the optimum number of simultaneous instances of my benchmark program. You mentioned running them in batches. Would this best be done with the aforementioned Parallel::ForkManager module? I've been reading up on fork, and I don't believe the plain fork function can control the number of children by itself, can it? Is there a general rule for telling whether a process is CPU or I/O bound?
      Yes, I'm suggesting that you need to experiment to find the optimum number of simultaneous instances. For running many jobs a fixed number of times, back then I used the script in Run commands in parallel to run the processes. These days I'd probably use Parallel::ForkManager.
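As a sketch of what that might look like with Parallel::ForkManager (the run_report sub and the limit of 5 here are illustrative, not prescriptive - substitute the number your own benchmarking finds):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

# Cap the number of simultaneous children at 5.
my $pm = Parallel::ForkManager->new(5);

for my $item (1 .. 20) {
    $pm->start and next;   # parent forks a child and continues the loop
    # --- child code from here on ---
    run_report($item);     # hypothetical per-item work
    $pm->finish;           # child exits; parent's slot count is updated
}
$pm->wait_all_children;    # parent blocks until every child is done

sub run_report {
    my ($n) = @_;
    # placeholder for the real report work
}
```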

      There is no general rule for telling what is bottlenecking a process without knowing what it does in detail, or measuring it. A cheap way to get an idea, though, is to run one copy on a lightly loaded machine and watch top. If your process is taking close to 100% CPU, then it is almost certainly CPU bound. If it is taking significantly less than 100% CPU, then something else is the problem at least some of the time. As a bonus, you also now know roughly how many copies of the process you can run before you run out of CPU. But you don't know what else is taking up time.
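A crude complement to watching top is to compare CPU time against wall-clock time from inside the program itself, using the core times function and Time::HiRes. If the two are close, the work is CPU bound; a large gap means the process spent much of its life waiting. The do_work sub is a hypothetical stand-in for the real workload:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

my $wall0 = time();
my ($user0, $sys0) = times();

do_work();    # hypothetical workload under test

my $wall = time() - $wall0;
my ($user1, $sys1) = times();
my $cpu = ($user1 - $user0) + ($sys1 - $sys0);

printf "wall %.2fs, cpu %.2fs (%.0f%% CPU)\n",
    $wall, $cpu, 100 * $cpu / $wall;

sub do_work {
    # simulate a job that mostly waits (e.g. on disk or network)
    select(undef, undef, undef, 0.5);
}
```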
