PerlMonks  

General perl question. Multiple servers.

by dbmathis (Scribe)
on Oct 06, 2007 at 14:04 UTC ( #643089=perlquestion )

dbmathis has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I have a group of about 150 Linux application servers. A process runs on each of them nightly, and a SUCCESS line gets written to a log file on each server when the process completes.

Currently I have to log into each server via ssh and grep each log to see if the process completed.

Using Perl, what would be the most reliable and efficient way to automate and accomplish the same thing? Should I have a process that runs on one machine and reaches out to each server, or should I have a process running on each app server? How would either of these be accomplished?

I was thinking that there must be some kind of standard process that people follow for something like this, so I am just looking for general advice.

Best Regards

After all this is over, all that will really have mattered is how we treated each other.

Replies are listed 'Best First'.
Re: General perl question. Multiple servers.
by shmem (Chancellor) on Oct 06, 2007 at 15:01 UTC
    I'd set up a syslog server and have each process send a UDP packet to this server after process completion.

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      A syslog server is a very good (and, IME, very underused) solution.

      Alternately, if that's not sexy enough to get management buy-in, you could instead set the processes up to all log to a central database, but that would mostly be just pointless overhead unless you're using a database already (and may still be pointless overhead even if you are).

      ++ Much better than my idea below, but there would need to be a reliable way to identify the cases where any of the 150 processes fail before they get to the point of sending their UDP packet to the log server. Not hard to handle, just easy to forget...

      update: On second thought, if the log data from each host is anything more than a single summary report printed at the end of each job, I would still kinda prefer my approach. If the jobs are printing progress reports at intervals, the entries submitted to a central syslog server will tend to be interleaved, and will need to be sorted out. Not a big deal, obviously, but it might be handier to have the stuff "pre-sorted" by harvesting from each machine.

        there would need to be a reliable way to identify the cases where any of the 150 processes fail

        A 'job started' message could be sent by a wrapper that watches the process and reports its exit status.

        the entries submitted to a central syslog server will tend to be interleaved

        syslog is configurable, and one could send the log messages to different files based on level/facility and host. Anyway, each log line is marked with the host sending it, so sorting things out is as easy as grepping the log file for a host.

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

      It is interesting, the different backgrounds from which we all come. I'm predominantly used to applying solutions to assets given, while other people come from backgrounds where adding a server here or there is considered trivial.

      It's good to get both perspectives.

Re: General perl question. Multiple servers.
by snopal (Pilgrim) on Oct 06, 2007 at 14:46 UTC

    I recommend these options:

    • P.O.E. - invoke remotely with clear responses
    • Local net delivery - cron response is sent to central machine via scp/rsync
    • NFS - copy response to central common directory space
    • MTA - send responses to a common mail address
    • ... limited by your imagination and resources
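As one concrete instance of the "local net delivery" option, each node's crontab could push its log to the central machine after the nightly run. The schedule, paths, and host name `central` below are invented for illustration:

```
# crontab fragment on each app server (assumed paths/host)
# push the nightly log to the central box shortly after the job window
30 2 * * *  scp -o ConnectTimeout=10 /log/path/my_process.log central:/var/harvest/$(hostname).log
```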

Re: General perl question. Multiple servers.
by graff (Chancellor) on Oct 06, 2007 at 15:04 UTC
    Probably the easiest would be a cron job on one machine that takes a list of the machines that generate logs, harvests the log files from all of them into a daily directory, and then parses each log file in turn to do statistics and report details on "outliers".

    In effect, the perl script does mechanically what you are now doing manually (you can be replaced by a perl script ;). Of course, that assumes that the 150 machines are all doing the same thing, and their results are all stored using the same path/file.name on each machine. Either that, or else the list of machines to scan includes all the details needed to find the log file for each one.

    For doing the harvest, there's nothing inherently wrong about just running an appropriate command in back-ticks like my $log = `ssh $host cat /log/path/my_process.log` (this assumes you have public-key authentication in place, so the userid running this won't need to supply a password for each connection). If the overhead of launching a shell 150 times bothers you, you could do it like this:

    use strict;
    use POSIX;

    my $today = strftime( "%Y%m%d", localtime );
    mkdir "/path/harvest/$today";
    chdir "/path/harvest/$today" or die $!;
    my @remote_machine_list = ... ; # (fill in the ...)
    my $shell_pid = open my $shell, "|-", "/bin/sh" or die $!;
    print $shell "cd /path/harvest/$today\n";
    print $shell "ssh $_ cat /log/path/my_process.log > $_.log 2>> harvest.errlog"
               . " || echo $_ failed >> harvest.errlog\n"
        for ( @remote_machine_list );
    print $shell "exit\n";
    waitpid $shell_pid, 0;
    (updates: added check for success from chdir call, and added first print statement to chdir in the subshell.)

    You might need to add stuff to that, like setting a SIGALRM handler with alarm() in case ssh hangs on a given host. On each iteration, if the "ssh $_ ..." works, its output is stored locally in "$_.log", and any stderr output is appended to the local file "harvest.errlog". But if the ssh fails, a line about that is appended to "harvest.errlog" as well.
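A sketch of that SIGALRM guard: run a command under a timeout so one hung ssh can't stall the whole harvest. The command is passed in as a parameter here purely so the wrapper is easy to exercise; in the harvest loop it would be the `ssh $host cat ...` string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a shell command in backticks under a timeout.
# Returns the captured output, or undef if the command timed out.
sub run_with_timeout {
    my ($cmd, $timeout) = @_;
    my $out = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        my $r = `$cmd`;
        alarm 0;
        $r;
    };
    alarm 0;    # cancel any alarm still pending after an abnormal exit
    return $out;
}

# e.g.:
# my $log = run_with_timeout(
#     "ssh -o ConnectTimeout=4 $host cat /log/path/my_process.log", 30 );
```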

    But you also have a variety of CPAN modules in the Net::SSH domain that you might find preferable.

      This may work, I do have public-key authentication in place already. Thanks for the information. I will let you know what I end up doing.

      After all this is over, all that will really have mattered is how we treated each other.
Re: General perl question. Multiple servers.
by mwah (Hermit) on Oct 06, 2007 at 16:05 UTC
    dbmathis: I have a group of about 150 linux application
    servers that a process runs on nightly and then a SUCCESS gets written
    to a logfile of each of the servers when the process completes.
    Currently I have to log into each server via ssh and grep each
    log to see if the process completed.



    YMMV, but I had (and have) to deal with a similar problem
    in a "computational chemistry" environment. The number of servers
    or nodes is about one half of yours.

    What I learned from all that: "keep it dead simple" and try to get it installed out of the box (OOTB) if possible.
    My current solution:

    1. programs & logging
    - One of the (older) boxes poses as server and holds the
       node cluster in a subnet (a private one in my case)
    - The server exposes (NFS,SMB possible) its /usr/local/bin (ro-mode) and
       its /srv/cluster (rw-mode) to the subnet,
    - The nodes load their applications from the central mounted
       /usr/local/bin and write logs with date and ip
       (in filenames) into separate files in /srv/cluster

    2. job overview
    - The server has some perl scripts for job overview;
       if required, the number and respective
       IPs of running nodes are found by "nmapping" the subnet:
    ...
    # $addr is the actual subnet, e.g. "192.168.1.0"
    my $output = qx{nmap -sP ${addr}/24};
    my @nodes  = $output =~ /(?<=\s)c\w+\b/g;
    ...
    This (nmap -sP) runs very fast (at least here, from a non-root account) and can provide "real time" info on running nodes per html page, e.g.:
    ...
    print header('text/html');
    print h1('Local Network: ' . $addr . '/24');
    print map "$_ appears to be up<br />", @hosts;
    ...
    The found nodes might then be rsh'ed (if it's a private subnet, you won't be killed for using rsh/rexec).
    Pseudo:
    ...
    my ($exe, $cmd) = ('/usr/bin/rsh', 'ps -fl r -u username');
    my $cnt = 0;
    for my $node ( sort @nodes ) {
        my @res = grep !(/$cmd/ || /STIME/),
                  split /[\n\r]+/, qx{$exe $node $cmd};
        my $nproc = scalar @res;    # how many processes
        if( $nproc ) {
            print map "Do some formatting of ps -fl output here!", @res;
        }
        ...
        ++$cnt;
    }
    ...
    In the end, you'll have a browser interface to the running processes (build a nice html table in the "map" above) and a central directory full of log files, which might even be exported (smb) to windows machines for coworkers preferring the Explorer ;-)

    The only "complication" (additional work per node) would
    be "installing and enabling the nfs client".
    my €0.02

    regards

    mwah
Re: General perl question. Multiple servers.
by perlfan (Vicar) on Oct 06, 2007 at 15:27 UTC
    Pushing some sort of confirmation out to a single master would be the easiest thing to do (using ssh passwordless auth), but this would actually be a neat situation in which to implement some sort of distributed confirmation algorithm that doesn't have a single point of failure - that being the connection between your master and any of the servers you'd like to keep tabs on.

    If you have some time to think about the solution and a lot more time to code it, POE might be the answer, but in reality a simple master daemon listener solution would not take long to craft from the many Perl daemon examples out there.

    You could use a simple scheme where there is the master listener waiting for "DONE_SUCCESS" from all respective servers. On hearing "DONE_SUCCESS", the master daemon would send an ACK back to the server in question (requires a listener on that end, too). The server trying to check in would continue to send "DONE_SUCCESS" messages until it finally hears the "ACK" from the master.

    If you're interested in a solution like that, let me know and I'll go over it in more detail.
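A minimal sketch of the master side of that scheme: a listener that accepts "DONE_SUCCESS &lt;host&gt;" lines over TCP and answers "ACK". The port, wire format, and sequential accept loop are invented for illustration; a real version would daemonize, log, and handle clients concurrently.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Listen on $port until $expected distinct hosts have checked in.
# Returns a sorted arrayref of the hosts that reported success.
sub run_master {
    my ($port, $expected) = @_;
    my $srv = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => 10,
        ReuseAddr => 1,
    ) or die "cannot listen on port $port: $!";

    my %done;
    while ( keys %done < $expected ) {
        my $client = $srv->accept or next;
        my $line   = <$client>;
        if ( defined $line and $line =~ /^DONE_SUCCESS\s+(\S+)/ ) {
            $done{$1} = 1;
            print $client "ACK\n";   # servers retry until they see this
        }
        close $client;
    }
    return [ sort keys %done ];
}
```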
      Hi perlfan,

      The master daemon idea is somewhat attractive because I could apply this same technique to several other projects. I would be delighted to hear more details.

      Best Regards

      After all this is over, all that will really have mattered is how we treated each other.
Re: General perl question. Multiple servers.
by jethro (Monsignor) on Oct 07, 2007 at 01:49 UTC
    If you want reliable, then don't try to program the network code yourself; use established methods.

    So the syslog suggestion from shmem is the simplest, most economic solution you can find, if your applications already use syslog for logging or can be persuaded to do so. Syslog takes care of the network transmission, and all you have to do is parse the local log (the code for that would easily fit in one line).

    If that is not possible, a (perl-)script that uses ssh to poll all the servers is IMHO already a very reliable solution (use ssh parameter "-o ConnectTimeout=4", so that ssh doesn't wait so long for offline servers) and be sure to check for success of the ssh.
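A sketch of that parsing step, assuming the central log accumulates one line per host in the form "&lt;host&gt; nightly-job: SUCCESS" (the line format is an assumption): list the hosts that did not report success.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the hosts from @hosts that have no SUCCESS line in $logfile.
sub missing_hosts {
    my ($logfile, @hosts) = @_;
    open my $fh, '<', $logfile or die "cannot open $logfile: $!";
    my %ok;
    while (<$fh>) {
        $ok{$1} = 1 if /\b(\S+)\s+nightly-job:\s+SUCCESS\b/;
    }
    close $fh;
    return grep { !$ok{$_} } @hosts;
}
```

Run daily from cron against the current log and mail the result if the list is non-empty.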
Re: General perl question. Multiple servers.
by casiano (Pilgrim) on Oct 07, 2007 at 12:12 UTC
    Probably the module GRID::Machine can help if you decide to automate what you are now doing by hand:

    "I have to log into each server via ssh and grep each log to see if the process completed"

Re: General perl question. Multiple servers.
by mattr (Curate) on Oct 09, 2007 at 04:56 UTC
    Syslog seems to be a good idea. It doesn't mean setting up a new hardware box, just a daemon. Both the following links note that you want to set your clocks with NTP if you haven't yet. Note that UDP, mentioned above, does not guarantee delivery, especially if your servers are distant.

    oreilly.com syslog.org

Node Type: perlquestion [id://643089]
Approved by Corion
Front-paged by Corion