help me fork

mhearse has asked for the wisdom of the Perl Monks concerning the following question:

Re: help me fork by Zaxo (Archbishop) on Jul 20, 2004 at 15:33 UTC
If you are running a central log server, you can connect several machines through UDP on port 514, as well as through any other sockets you have instructed syslogd to listen on. You don't need to mess with the call to logger if you know what ports or devices you can use. You can see what your syslogd is listening on with the command `lsof -p $(pidof syslogd)`.

I think that some monks' warnings about fork not helping here are bogus. This is an I/O-heavy application, and any process talking over a port or to a disk file spends a lot of time sleeping, waiting for the I/O system to respond. Having many processes active uses the time the sleepers have relinquished.

I think your question is really about how to use fork. Here is a snippet which will open a connection to UDP port 514 on host "logserver", and then spawn fifty processes that all talk at once. Untested; I don't have remote logging set up here.

```perl
use IO::Socket::INET;

my $log = IO::Socket::INET->new(
    PeerAddr => 'logserver',
    PeerPort => 514,
    Proto    => 'udp',
) or die "Can't open UDP socket to logserver: $@";
```

That provides an IO::Socket handle to the syslogd port on the server. The handle will be duplicated by all the child processes we spawn. You spoke of wanting to beat 3000 messages per minute; run this under `time` to check. We'll fork 50 children, each sending 60 messages, for 3000 in all.

```perl
my %kid;
for (1..50) {
    my $pid = fork;
    if (not defined $pid) { warn "fork failed: $!"; next }
    if ($pid) { $kid{$pid} = undef; next }   # parent: remember the child

    # in child
    undef %kid;
    close STDERR; close STDOUT; close STDIN;
    for (1..60) {
        $log->send("<DEBUG> Child $$: Message #$_\n");   # $$ is the child's PID
    }
    exit 0;
}

# No Zombies! Reap every child; stop if wait reports none are left.
while (%kid) {
    my $pid = wait;
    last if $pid == -1;
    delete $kid{$pid};
}
```

The socket code will probably need twiddling, but the fork-related stuff is what you wanted. In particular, it may be desirable to move the socket constructor inside the loop so each child has its own connection to the server. The %kid hash is how the parent process keeps track of which child processes exist. Using wait at the end reaps the children as they exit, which prevents zombies from forming, and also makes the parent hang around until they are all done. That makes the entire operation easy to time.

After Compline,
Zaxo
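Zaxo notes that it may be better to move the socket constructor inside the loop so each child has its own connection. A minimal sketch of that variant follows, under stated assumptions: a UDP socket bound to an ephemeral port on 127.0.0.1 stands in for the remote syslogd so the script can run anywhere, and `$children`, `$per_child` and `$sink` are illustrative names. Against a real server, each child would connect to 'logserver' on port 514 instead.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Illustrative counts; Zaxo's test used 50 children x 60 messages.
my $children  = 5;
my $per_child = 3;

# Assumption: a local UDP sink replaces the remote syslogd for testing.
my $sink = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,            # let the kernel pick a free port
    Proto     => 'udp',
) or die "Can't open sink: $@";
my $port = $sink->sockport;

my %kid;
for my $n (1 .. $children) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid) { $kid{$pid} = 1; next }   # parent: remember the child

    # Child: open a private socket instead of sharing the parent's handle.
    my $log = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $port,
        Proto    => 'udp',
    ) or die "child $$: can't connect: $@";
    for my $i (1 .. $per_child) {
        $log->send("<DEBUG> Child $$: Message #$i\n") or warn "send: $!";
    }
    exit 0;
}

# Reap every child first, so all sends have completed.
while (%kid) {
    my $pid = wait;
    last if $pid == -1;
    delete $kid{$pid};
}

# Drain the sink, stopping after one second with nothing to read.
my $received = 0;
my $rin = '';
vec($rin, fileno($sink), 1) = 1;
while (select(my $rout = $rin, undef, undef, 1.0) > 0) {
    $sink->recv(my $buf, 2048);
    $received++;
}
print "received $received of ", $children * $per_child, " messages\n";
```

On loopback the count should match exactly; over a real network UDP gives no delivery guarantee, so a shortfall there may point at the network or at syslogd.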
Re^2: help me fork by mhearse (Chaplain) on Jul 20, 2004 at 17:00 UTC
Thanks. I'm going to vary the number of children vs. log messages per child, to see which combination produces 3000 messages the fastest.
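One way to run that comparison is sketched below, assuming the total stays fixed at 3000 messages. `time_split` is a hypothetical helper, wall-clock time comes from the core Time::HiRes module, and the empty callback is a placeholder where the real send to the log server would go.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

$| = 1;   # autoflush, so forked children don't re-emit buffered output

# Hypothetical helper: fork $children workers, each calling
# $work->($child_number, $message_number) $per_child times, then reap
# them all. Returns wall-clock seconds and the count of non-zero exits.
sub time_split {
    my ($children, $per_child, $work) = @_;
    my $start = time();
    my %kid;
    for my $c (1 .. $children) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid) { $kid{$pid} = 1; next }
        $work->($c, $_) for 1 .. $per_child;   # child does its share
        exit 0;
    }
    my $failed = 0;
    while (%kid) {
        my $pid = wait;
        last if $pid == -1;
        delete $kid{$pid};
        $failed++ if $? != 0;
    }
    return (time() - $start, $failed);
}

# Every split below adds up to 3000 messages. The empty sub is a
# placeholder for the real send to the log server.
my $total = 3000;
my %elapsed;
for my $children (1, 5, 10, 50) {
    my $per_child = $total / $children;
    my ($secs, $failed) = time_split($children, $per_child, sub { });
    $elapsed{$children} = $secs;
    printf "%3d children x %4d messages: %.3f s, %d failed\n",
        $children, $per_child, $secs, $failed;
}
```

With the empty placeholder the numbers measure only the cost of forking and reaping, which gives a baseline to compare against once the real send is plugged in.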
Re: help me fork by Joost (Canon) on Jul 20, 2004 at 13:14 UTC
Yes. Probably. What's your question? Maybe you want Parallel::ForkManager and Sys::Syslog?

"What should it profit a man, if he should win a flame war, yet lose his cool?"
Re: help me fork by mutated (Monk) on Jul 20, 2004 at 13:22 UTC
I'm going to say no, well, not much anyway, unless you have a multiprocessor system (if you do, forking off as many processes as you have processors would be a good thing). When you start forking, all you are doing is wasting time switching between contexts; it doesn't create more processor time, it just shares the processor among more processes. I suspect what you really want to do is increase the priority of your process; see the manpage for the Unix command nice. The big thing here is that the bottleneck probably isn't your program, it's syslogd blocking while it tries to write your log entry.

daN.
Re^2: help me fork by tilly (Archbishop) on Jul 20, 2004 at 15:29 UTC
This is bad advice. More precisely, it is advice that only applies to CPU-bound jobs. If your job spends a significant fraction of its time waiting for the network or the disk, then you're wrong. Increasing the priority of something that isn't waiting for CPU does you no good at all, since it isn't having trouble there. Adding processes is good because it is not much extra work to have 5 processes waiting for something external rather than 1. And while a disk is spinning around to where one process has its data, nothing stops it from reading or writing somewhere else for another process. Note that syslogd is an I/O-bound process, so unless there is a global lock preventing two copies from doing work at once, it will benefit from running multiple times.

Of course, too many waiting jobs run into trouble as the disk tries to do too many things. The optimal threshold for any particular job is highly dependent on your exact hardware and configuration. Test and benchmark it. The last time I did this for an I/O-bound job, I found that, on the machine I tested and for the job I was doing, I got maximum throughput at 5 jobs, so I batched my report to run 5 at a time. For a database-bound job I got the best results at 7 copies. Had I taken your advice in either case, I would have used only 2 processes, and would have got less than half the throughput that I did.
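The point that waiting processes overlap can be seen with a toy experiment. In this sketch `fake_io_job` is a hypothetical stand-in that just sleeps, so it models pure waiting and none of the disk contention that limits real jobs at higher process counts; real I/O-bound work will not scale this cleanly.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical stand-in for an I/O-bound task: almost no CPU work,
# just waiting, as a process does for a disk or network reply.
sub fake_io_job { sleep 0.5 }

my $jobs = 5;

# Serial: one process does every job, so the waits add up.
my $t0 = time();
fake_io_job() for 1 .. $jobs;
my $serial = time() - $t0;

# Parallel: one child per job, so the waits overlap.
$t0 = time();
my @pids;
for (1 .. $jobs) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) { fake_io_job(); exit 0 }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;
my $parallel = time() - $t0;

printf "serial: %.2f s, parallel: %.2f s\n", $serial, $parallel;
```

A CPU-bound job substituted for the sleep would show little or no gain on a single processor, which is the case where the earlier advice does apply.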
Re^3: help me fork by mhearse (Chaplain) on Jul 20, 2004 at 15:50 UTC
Thanks for the reply. I'm definitely learning something here. Just to clarify: you are suggesting that I experiment to find the optimum number of simultaneous instances of my benchmark program. You mentioned running them in batches. Would this best be done using the aforementioned Parallel::ForkManager module? I've been reading up on fork. I don't believe that the plain fork function has the ability to control the number of children, does it? Is there a general rule to tell whether a process is CPU-bound or I/O-bound?
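On whether plain fork can control the number of children: fork itself only ever creates one child, so any cap on concurrency has to be bookkeeping in the parent. A minimal sketch of that bookkeeping, using a hypothetical `run_capped` helper in which the parent blocks in wait whenever the limit is reached. Parallel::ForkManager packages the same pattern, with the maximum number of processes passed to its constructor.

```perl
use strict;
use warnings;

$| = 1;   # autoflush, so forked children don't re-emit buffered output

# Hypothetical helper: run each job in @jobs in its own child, never
# with more than $max children alive at once. Returns the highest
# number of children the parent saw running simultaneously.
sub run_capped {
    my ($max, $worker, @jobs) = @_;
    my %running;
    my $peak = 0;
    for my $job (@jobs) {
        # At the limit: block until one child exits, then reuse its slot.
        if (keys %running >= $max) {
            my $done = wait;
            delete $running{$done};
        }
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) { $worker->($job); exit 0 }
        $running{$pid} = $job;
        $peak = keys %running if keys %running > $peak;
    }
    # Reap the stragglers so no zombies are left behind.
    while (%running) {
        my $done = wait;
        last if $done == -1;
        delete $running{$done};
    }
    return $peak;
}

# Illustrative usage: 20 jobs that each sleep 0.1 s, at most 5 at a time.
my $peak = run_capped(5, sub { select(undef, undef, undef, 0.1) }, 1 .. 20);
print "peak concurrency: $peak\n";
```

Changing the cap and timing the whole run is one way to find the throughput peak described in the previous reply.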
Re^4: help me fork by tilly (Archbishop) on Jul 20, 2004 at 15:57 UTC
Re^2: help me fork by mhearse (Chaplain) on Jul 20, 2004 at 13:37 UTC
Ahh. So fork is not the best answer; better to make the program nicer. I'll give it a try.
Re^3: help me fork by mhearse (Chaplain) on Jul 20, 2004 at 13:47 UTC
Perhaps some additional info is warranted. The loghost in question is servicing messages from about 40 other machines. Some of the messages aren't being recorded properly (if at all). I assumed either a UDP issue or a syslogd issue. A sniffer and netstat showed no UDP issues, so I am trying to determine how far I can push syslogd.
Re^4: help me fork by mutated (Monk) on Jul 20, 2004 at 14:24 UTC
Re: help me fork by pbeckingham (Parson) on Jul 20, 2004 at 13:24 UTC
I would guess that using Perl as a load driver for `syslogd` is more likely to beat up your machine, simply because of all the forked Perl processes.

