Re: help me fork

by mutated (Monk)
on Jul 20, 2004 at 13:22 UTC


in reply to help me fork

I'm going to say no, or at least not much, unless you have a multi-processor system (if you do, forking off as many processes as you have processors would be a good thing). When you start forking, all you are doing is wasting time switching between contexts: it doesn't create more processor time, it just shares the processor between more processes. I suspect what you really want to do is increase the priority of your process; see the manpage for the Unix command nice. The big thing here is that the bottleneck probably isn't your program, it's syslog blocking while it tries to write your log entry.


daN.
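
For the record, the priority can also be adjusted from inside the script itself. A minimal sketch, assuming a Unix system where setpriority(2) is available; the value -5 is an arbitrary example, and raising priority (a negative nice value) normally requires root:

    use strict;
    use warnings;

    # The setpriority() builtin wraps setpriority(2), the same knob that
    # the nice and renice commands turn.
    # WHICH = 0 means PRIO_PROCESS; WHO = 0 means "this process".
    setpriority(0, 0, -5)
        or warn "setpriority failed (are you root?): $!";

    # ... the rest of the script now runs at the higher priority ...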

Re^2: help me fork
by tilly (Archbishop) on Jul 20, 2004 at 15:29 UTC
    This is bad advice. More precisely, it is advice that applies only to CPU-bound jobs; if your job spends a significant fraction of its time waiting for network or disk, it is wrong.

    Increasing the priority of a process that isn't waiting for CPU does you no good at all, since CPU isn't where it's having trouble. Adding processes helps because it is not much extra work to have 5 processes waiting for something external rather than 1, and while a disk is seeking to where one process has its data, nothing stops it from reading or writing somewhere else for another process. Note that syslogd is an I/O-bound process, so unless there is a global lock preventing two copies from doing work at once, it will benefit from running multiple times. Of course, too many waiting jobs run into trouble, as the disk is trying to do too many things at once.

    The optimal number for any particular job depends heavily on your exact hardware and configuration, so test and benchmark it. The last time I did this for an I/O-bound job, I found that on the machine I tested, for the job I was doing, I got maximum throughput at 5 jobs, so I batched my report to run 5 at a time. For a database-bound job I got the best results at 7 copies. Had I taken your advice in either case, I would have used only 2 processes and got less than half the throughput that I did.
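
    As a rough illustration of that kind of benchmark (a sketch only: it uses the Parallel::ForkManager module that comes up later in this thread, and do_one_unit() is a hypothetical stand-in for the real work):

        use strict;
        use warnings;
        use Time::HiRes qw(time);
        use Parallel::ForkManager;

        my @work = (1 .. 100);    # stand-in for 100 units of I/O-bound work

        # Time the same batch at several concurrency levels and watch
        # where throughput peaks.
        for my $jobs (1 .. 10) {
            my $pm    = Parallel::ForkManager->new($jobs);
            my $start = time();
            for my $unit (@work) {
                $pm->start and next;    # parent: go spawn the next unit
                do_one_unit($unit);     # child: do one unit of work
                $pm->finish;
            }
            $pm->wait_all_children;
            printf "%2d jobs: %5.1f units/sec\n",
                $jobs, @work / (time() - $start);
        }

        # Hypothetical work unit: pretend to wait 0.1s on I/O.
        sub do_one_unit { select(undef, undef, undef, 0.1) }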

      Thanks for the reply. I'm definitely learning something here. Just to clarify: you are suggesting that I experiment to find the optimum number of simultaneous instances of my benchmark program. You mentioned running them in batches. Would this best be done using the aforementioned Parallel::ForkManager module? I've been reading up on fork; I don't believe the plain fork function has any way to control the number of children, does it? Is there a general rule to tell whether a process is CPU- or I/O-bound?
        Yes, I'm suggesting that you need to experiment to find the optimum number of simultaneous instances. For running many jobs a fixed number of times, back then I used Run commands in parallel to run the processes. These days I'd probably use Parallel::ForkManager.

        There is no general rule to tell what is bottlenecking a process without knowing what it does in detail, or measuring it. A cheap way to get an idea, though, is to run a copy on a lightly loaded machine and watch top. If your process is taking close to 100% CPU, it is almost certainly CPU-bound. If it is taking significantly less than 100% CPU, then something else is the problem at least some of the time. As a bonus, you also now know roughly how many copies of the process you can run before you run out of CPU, though you don't yet know what else is taking up the time.
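
        For a rough in-script version of the same check, one can compare CPU time (from the times() builtin) against wall-clock time: a ratio near 1.0 suggests CPU-bound, and a much lower ratio suggests waiting on something else. The do_the_work() routine below is a hypothetical placeholder:

            use strict;
            use warnings;
            use Time::HiRes qw(time);

            my $wall_start = time();
            do_the_work();    # placeholder for the job being measured
            my ($user, $system) = times();    # CPU seconds used so far
            my $wall = time() - $wall_start;
            printf "CPU: %.2fs of %.2fs wall (%.0f%% CPU)\n",
                $user + $system, $wall, 100 * ($user + $system) / $wall;

            # Hypothetical CPU-heavy placeholder; swap in the real work.
            sub do_the_work { my $x = 0; $x += rand() for 1 .. 1_000_000 }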

Re^2: help me fork
by mhearse (Chaplain) on Jul 20, 2004 at 13:37 UTC
    Ahh. So fork is not the best answer. Better to make the program nicer. Will give it a try.
      Perhaps some additional info is warranted. The loghost in question is servicing messages from about 40 other machines. Some of the messages aren't being recorded properly (if at all). I assumed either a UDP issue or syslogd. A sniffer and netstat showed no UDP issues, so I am trying to determine how far I can push syslogd.
        Ahh, OK. In that case, forking off as many copies of your script as you can get away with, to flood the syslog server with connections, is the way to go; somewhere around 50-100 forked children is a good place to start. Have the parent spawn children that each attempt one connection and then die, so the parent can launch another child and keep track of how many connection attempts were made; you can then compare that count against how many syslog entries exist. Be sure to run top in another terminal during the first couple of tests to make sure you aren't swapping; if you are, reduce the number of children forked, and if you have lots of memory free, increase it. A sketch follows below.
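
        A sketch of that test, with loud assumptions: the loghost name is a placeholder, the loghost accepts plain UDP syslog on port 514, and each child hand-rolls one RFC 3164-style datagram and exits. Parallel::ForkManager caps the number of live children:

            use strict;
            use warnings;
            use IO::Socket::INET;
            use Parallel::ForkManager;

            my $loghost  = 'loghost.example.com';   # hypothetical loghost
            my $children = 50;    # starting point; tune while watching top
            my $attempts = 0;

            my $pm = Parallel::ForkManager->new($children);
            for my $i (1 .. 1000) {
                $attempts++;            # counted in the parent
                $pm->start and next;    # parent: spawn the next child
                my $sock = IO::Socket::INET->new(
                    PeerAddr => $loghost,
                    PeerPort => 514,
                    Proto    => 'udp',
                ) or die "socket: $!";
                # <134> = facility local0, severity info
                $sock->send("<134>flood-test: message $i from pid $$");
                $pm->finish;            # child exits after one message
            }
            $pm->wait_all_children;
            print "$attempts attempts made; compare with the count of ",
                  "entries that actually reached the log\n";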


        daN.
