PerlMonks
Task distribution project
by feloniousMonk (Pilgrim)
on Sep 20, 2004 at 20:24 UTC [id://392483]
feloniousMonk has asked for the wisdom of the Perl Monks concerning the following question:
Hi folks. I have been given a task which involves maximizing my computing resources with minimal overhead. I have a single text file which is to be split into many (up to thousands of) equal-sized chunks, and a single command to be run on each of them. The command is a small Perl script which analyzes the chunk's data and writes its results to an output file (passed to the script as an argument).

Up until now I have passed the jobs off to Sun Grid Engine as a job array, and life has been good. In this case I cannot do that, but must build my own job manager instead. Here's why:

- SGE will not tell me when a job is complete, whether it worked, failed, etc. This in and of itself is not a deal-breaker, because I already have handlers built into my SGE-calling code.
- SGE overhead - I don't think this should be a consideration, but my boss does not want it either way.
- The main point: the system must be self-contained and able to run on networks which do not have SGE or any similar system. That's the biggie.

The quick overview: I have a big file and a script to run it through. I need to make a handler script which breaks the file up, throws the individual jobs at a bunch of big servers, is told when each job is done, then cats all the output files into one big result file.

My question is this - without reinventing the wheel, does anyone have advice to lead me in the right direction? The closest I've come to a starting point is using RPC calls; I'm not sure whether this is the best idea or not.

A little more about the system: there will be a main script which will be started on a compute server. Each compute server will be given N jobs at one time, the size of N depending on the size of the input file chunks. It is possible, but maybe sub-optimal, to start one server program on each compute server per job it can handle at any given time. Maybe this can even be done without breaking it into a client/server architecture, though I don't yet see how.
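For the splitting step, here is a minimal sketch of what I mean. The chunk_NNNN naming scheme and the lines-per-chunk parameter are just placeholders, not anything I'm committed to:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split $infile into chunks of $lines_per_chunk lines each and
# return the list of chunk file names. The chunk_NNNN naming
# scheme is a placeholder.
sub split_input {
    my ($infile, $lines_per_chunk) = @_;
    open my $in, '<', $infile or die "open $infile: $!";
    my (@chunks, $out);
    my ($line_no, $chunk_no) = (0, 0);
    while (my $line = <$in>) {
        if ($line_no++ % $lines_per_chunk == 0) {
            close $out if $out;
            my $name = sprintf 'chunk_%04d', $chunk_no++;
            open $out, '>', $name or die "open $name: $!";
            push @chunks, $name;
        }
        print {$out} $line;
    }
    close $out if $out;
    close $in;
    return @chunks;
}
```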
Sorry for the wordy description, I will be happy to clarify anything that I can.
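To show the kind of job management I'm after, here is a sketch of a throttled dispatcher using plain fork/exec/wait, which reaps children and reports which jobs failed. The ssh usage in the comment is hypothetical (it assumes passwordless ssh and a shared filesystem; the server names and analyze.pl are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a list of commands, at most $max_jobs at once, with plain
# fork/exec/wait. Returns the commands whose jobs exited non-zero,
# so the caller can retry or report them.
sub run_jobs {
    my ($max_jobs, @cmds) = @_;    # each command is an array ref
    my (%running, @failed);
    for my $cmd (@cmds) {
        # throttle: reap one child when all slots are busy
        if (keys %running >= $max_jobs) {
            my $pid = wait;
            push @failed, $running{$pid} if $? != 0;
            delete $running{$pid};
        }
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {           # child: become the job
            exec @$cmd;
            die "exec @$cmd: $!";
        }
        $running{$pid} = $cmd;
    }
    # reap whatever is still running
    while ((my $pid = wait) != -1) {
        push @failed, $running{$pid} if $? != 0;
        delete $running{$pid};
    }
    return @failed;
}

# Hypothetical remote usage, assuming passwordless ssh and a shared
# filesystem (server name, analyze.pl, and file names are made up):
#   my @failed = run_jobs(4,
#       map { ['ssh', 'big1', 'perl', 'analyze.pl', $_, "$_.out"] }
#           glob 'chunk_*');
#   system('cat chunk_*.out > result.txt') == 0 or die "cat: $!";
```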
--
Thanks,
feloniousMonk