Re^3: Help with multiple forks

by sundialsvc4 (Abbot)
on May 31, 2012 at 14:50 UTC


in reply to Re^2: Help with multiple forks
in thread Help with multiple forks

Yes, “learn about pipes,” because this would make your job a helluva lot simpler.   Start a pool of child processes that read from a pipe and do whatever work has been handed to them by means of that pipe.   (When the pipe is closed by the writer, the children’s reads return end-of-file, and when that happens they terminate themselves.)
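
For illustration only, here is a minimal sketch of that worker-pool idea in Perl.   The worker count, the one-job-per-line protocol, and the do_work() routine are all invented for this example; note also that several children sharing one buffered read handle can, under load, split a record between readers, so real code often uses fixed-size sysread records or a separate pipe per worker.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # One writer (the parent), several readers (the workers), one pipe.
    # Each worker reads one job per line; when the parent closes its
    # write end, the readers see EOF and exit on their own.

    my $WORKERS = 4;                       # tune this number

    pipe(my $reader, my $writer) or die "pipe: $!";

    my @kids;
    for (1 .. $WORKERS) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                   # child: consume until EOF
            close $writer;
            while (my $job = <$reader>) {
                chomp $job;
                do_work($job);
            }
            exit 0;                        # writer closed the pipe
        }
        push @kids, $pid;                  # parent remembers the child
    }

    close $reader;                         # parent only writes
    print {$writer} "job $_\n" for 1 .. 20;
    close $writer;                         # EOF = "no more work"

    waitpid($_, 0) for @kids;              # reap all workers

    sub do_work {                          # stand-in for the real work
        my ($job) = @_;
        print "[$$] handled $job\n";
    }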

Likewise, instead of “starting” the second-stage processes only after the first stage has finished, have the first-stage processes write messages to a second pipe that is read by second-stage processes built on the same design.   After the first-stage processes consume their work and die off, the second-stage processes in turn consume their work and die, and so on, until the parent finally notices that all of its children have exited (as expected) and then terminates itself.
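
Again purely as a sketch (made-up worker counts, trivial line-oriented “work,” and the same caveat about shared buffered readers), this is how the two stages chain together.   The crucial detail is that every process closes the pipe ends it does not use, or the downstream readers never see end-of-file.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Pipe A: parent -> stage-1 workers.   Pipe B: stage-1 -> stage-2.
    pipe(my $a_r, my $a_w) or die "pipe A: $!";
    pipe(my $b_r, my $b_w) or die "pipe B: $!";

    my @kids;

    for (1 .. 2) {                          # stage-2 workers
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            close($_) for $a_r, $a_w, $b_w; # stage 2 only reads pipe B
            while (my $line = <$b_r>) {
                print "[stage 2, $$] got: $line";
            }
            exit 0;                         # pipe B closed => done
        }
        push @kids, $pid;
    }

    for (1 .. 3) {                          # stage-1 workers
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            close($_) for $a_w, $b_r;       # stage 1 reads A, writes B
            while (my $line = <$a_r>) {
                chomp $line;
                print {$b_w} "processed($line)\n";
            }
            exit 0;                         # pipe A closed => done
        }
        push @kids, $pid;
    }

    close($_) for $a_r, $b_r, $b_w;         # parent only writes pipe A
    print {$a_w} "item $_\n" for 1 .. 10;
    close $a_w;                             # stage 1 drains and exits;
                                            # that closes pipe B, so
                                            # stage 2 drains and exits
    waitpid($_, 0) for @kids;

Closing the parent’s write end of pipe A is the only “shutdown” signal the whole pipeline needs; everything downstream drains and exits on its own.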

Now all of the processes, regardless of their role, do their initialization and termination exactly once and perform their jobs as quickly as they are able; the pipes take up the slack.   You tune the system for maximum throughput by adjusting the number of processes you create, and they work at a roughly constant rate no matter how full or how empty the pipes may be.

Think:   production line.


Edit:   Responding, if I may, to BrowserUK’s not-so-Anonymous reply (and his exceedingly discourteous but not-unexpected downvote) to the above ... kindly notice that most multiprogrammed systems are, and always have been, built around the notion of a limited (but variable) number of persistent worker processes that produce and consume work from a flexible queue of some kind.   Even in the earliest days of computing, when hulking IBM mainframe computers barely had enough horsepower to get out of their own way, their batch-job processing engines and interactive systems (e.g. CICS) had this essential architecture, and they still do.   The reason is quite simple:   you can tune it readily (just by adjusting the number of workers and/or their handling of the queues), and it performs at a predictable, sustained rate without over-committing itself.   The queues absorb the slack.   Such an arrangement naturally conforms to, for example, computing clusters, and it gracefully supports the adding, removing, and redeployment of computing resources.

“Over-committing” a system produces performance degradation that is linear for a while and then becomes exponential, a harrowing phenomenon called (politely) “hitting the wall.”   The curve has an elbow-shaped bend which goes straight up (to hell).   For instance, I once worked at a school which needed to run computationally expensive engineering packages on a too-small machine.   With one instance running, a job took about 2 minutes; with five, about 4; but with seven, each one took about 18 minutes, and it went downhill from there ... fast.   A little math will tell you that the right way to get seven jobs done in about 6 minutes is to allow no more than five to run at one time:   five finish in roughly 4 minutes, then the remaining two finish in roughly 2 more, instead of everything grinding along for 18-plus.   It worked, much to the disappointment of the IBM hardware salesman.   The jobs still waiting sit in a queue, costing nothing for the entire time they sit there not yet started.   Likewise, a queue-based architecture will consistently deliver x results per minute at a sustained rate even when the backlog of work, y, is far larger.   A thread (or process) is not a unit of work.

Replies are listed 'Best First'.
Re^4: Help with multiple forks
by Anonymous Monk on May 31, 2012 at 15:23 UTC
    would make your job a helluva lot simpler

    As usual, without any code to support the view ...

Re^4: Help with multiple forks
by Anonymous Monk on May 31, 2012 at 18:13 UTC

    But that's not what I want. I need the files created by the outer loop (the "$val1.txt" ones) for later work (next week maybe, when I get around to programming it), and I need to know exactly which is which.

    What I don't necessarily need is the output from the 2nd step. I would rather have just 2 files, but I don't know yet how to do it. I used to simply open two files and print to them accordingly, but the input was all jumbled, and I was told to use pipes instead. And here I am up to my knees in parents and children again :-/
