Re: General perl question. Multiple servers.
by shmem (Chancellor) on Oct 06, 2007 at 15:01 UTC
|
I'd set up a syslog server and have each process send a UDP packet to this server
after process completion.
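For illustration, a minimal sketch of such a completion packet, sent as a raw RFC 3164 syslog message over UDP (the destination host, port, and facility are assumptions; 127.0.0.1 stands in for your central log host):

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Sys::Hostname;

# <134> = facility local0 (16) * 8 + severity info (6)
my $pri = 16 * 8 + 6;
my $msg = sprintf '<%d>%s %s my_process: SUCCESS',
    $pri, scalar localtime, hostname();

# hypothetical central syslog host; adjust PeerAddr for your setup
my $sock = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => 514,
    Proto    => 'udp',
) or die "socket: $!";
$sock->send($msg);
```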
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
| [reply] |
|
A syslog server is a very good (and, IME, very underused) solution.
Alternatively, if that's not sexy enough to get management buy-in, you could instead set the processes up to all log to a central database, but that would mostly be pointless overhead unless you're already using a database (and may still be pointless overhead even if you are).
| [reply] |
|
++ Much better than my idea below, but there would need to be a reliable way to identify the cases where any of the 150 processes fail before they get to the point of sending their UDP packet to the log server. Not hard to handle, just easy to forget...
update: On second thought, if the log data from each host is anything more than a single summary report printed at the end of each job, I would still kinda prefer my approach. If the jobs are printing progress reports at intervals, the entries submitted to a central syslog server will tend to be interleaved, and will need to be sorted out. Not a big deal, obviously, but it might be handier to have the stuff "pre-sorted" by harvesting from each machine.
| [reply] |
|
there would need to be a reliable way to identify the cases where any of the 150 processes fail
A 'job started' message could be sent by a wrapper which watches the process and reports
its exit status.
the entries submitted to a central syslog server will tend to be interleaved
syslog is configurable, and one could send the log messages to different files based on
level/facility and host. Anyways, each log line is marked with the host that sent it,
so sorting things out is as easy as grepping the log file for a host.
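A minimal sketch of such a wrapper, with the transport left pluggable (the names here are illustrative, not from the thread):

```perl
use strict;
use warnings;

# run a command, reporting 'started' and the exit status via a
# caller-supplied transport (a syslog sender, a UDP packet, etc.)
sub run_logged {
    my ($report, @cmd) = @_;
    $report->("job started: @cmd");
    system(@cmd);
    my $status = $? == -1 ? 127 : ($? >> 8);
    $report->(sprintf 'job finished: %s (exit %d)', "@cmd", $status);
    return $status;
}

# usage: here the report just goes to STDERR
my $status = run_logged(sub { warn "$_[0]\n" }, 'true');
```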
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
| [reply] |
|
It is interesting, the different backgrounds from which we all come. I'm predominantly used to applying solutions to assets given, while other people come from backgrounds where adding a server here or there is considered trivial.
It's good to get both perspectives.
| [reply] |
Re: General perl question. Multiple servers.
by snopal (Pilgrim) on Oct 06, 2007 at 14:46 UTC
|
| [reply] |
Re: General perl question. Multiple servers.
by graff (Chancellor) on Oct 06, 2007 at 15:04 UTC
|
Probably the easiest would be a cron job on one machine that takes a list of the machines that generate logs, harvests the log files from all of them into a daily directory, and then parses each log file in turn to do statistics and report details on "outliers".
In effect, the perl script does mechanically what you are now doing manually (you can be replaced by a perl script ;). Of course, that assumes that the 150 machines are all doing the same thing, and their results are all stored using the same path/file.name on each machine. Either that, or else the list of machines to scan includes all the details needed to find the log file for each one.
For doing the harvest, there's nothing inherently wrong about just running an appropriate command in back-ticks like my $log = `ssh $host cat /log/path/my_process.log` (this assumes you have public-key authentication in place, so the userid running this won't need to supply a password for each connection). If the overhead of launching a shell 150 times bothers you, you could do it like this:
use strict;
use POSIX;
my $today = strftime( "%Y%m%d", localtime );
mkdir "/path/harvest/$today";
chdir "/path/harvest/$today" or die $!;
my @remote_machine_list = ... # (fill in the ...)
my $shell_pid = open my $shell, "|-", "/bin/sh" or die $!;
print $shell "cd /path/harvest/$today\n";
print $shell "ssh $_ cat /log/path/my_process.log > $_.log 2>> harvest.errlog ||".
             " echo $_ failed >> harvest.errlog\n"
    for ( @remote_machine_list );
print $shell "exit\n";
waitpid $shell_pid, 0;
(updates: added check for success from chdir call, and added first print statement to chdir in the subshell.)
You might need to add stuff to that, like a $SIG{ALRM} handler with alarm() in case ssh hangs on a given host. On each iteration, if the "ssh $_ ..." works, its output is stored locally in "$_.log", and any stderr output is appended to the local file "harvest.errlog". But if the ssh fails, a line about that is appended to "harvest.errlog" as well.
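A sketch of that timeout guard, using the usual eval/alarm idiom (generalized here to any command; the timeout values are arbitrary):

```perl
use strict;
use warnings;

# run a command with a timeout; returns its output, or undef on timeout
sub run_with_timeout {
    my ($timeout, $cmd) = @_;
    my $out = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        my $r = `$cmd`;
        alarm 0;
        $r;
    };
    alarm 0;   # belt and braces: clear any pending alarm
    return $out;
}

# e.g. run_with_timeout(10, "ssh somehost cat /log/path/my_process.log")
```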
But you also have a variety of CPAN modules in the Net::SSH domain that you might find preferable. | [reply] [d/l] [select] |
|
| [reply] |
Re: General perl question. Multiple servers.
by mwah (Hermit) on Oct 06, 2007 at 16:05 UTC
|
dbmathis: I have a group of about 150 linux application
servers that a process runs on nightly and then a SUCCESS gets written
to a logfile of each of the servers when the process completes.
Currently I have to log into each server via ssh and grep each
log to see if the process completed.
YMMV, but I had (and have) to deal with a similar problem
in a "computational chemistry" environment. The number of servers
or nodes is about one half of yours.
What I learned from all that: "keep it dead simple" and try to
get it installed OOTB, if possible.
My current solution:
1. programs & logging
- One of the (older) boxes poses as server and holds the
node cluster in a subnet (a private one in my case)
- The server exposes (NFS,SMB possible) its /usr/local/bin (ro-mode) and
its /srv/cluster (rw-mode) to the subnet,
- The nodes load their applications from the centrally mounted
/usr/local/bin and write logs with date and IP
(in the filenames) into separate files in /srv/cluster
2. job overview
- The server has some Perl scripts for job overview;
if required, the number and respective
IPs of running nodes are found by "nmapping" the subnet:
...
# $addr is the actual subnet, e.g. "192.168.1.0"
...
my $output = qx{nmap -sP ${addr}/24};
my @nodes = $output =~ /(?<=\s)c\w+\b/g;  # node names start with 'c' here
This (nmap -sP) runs very fast (at least here, from
a non-root account) and can provide
"real time" info on running nodes per HTML page, e.g.:
...
print header('text/html');
print h1('Local Network: '. $addr . '/24');
print map "$_ appears to be up<br />", @nodes;
...
The found nodes might then be rsh'ed (if it's a private
subnet, you won't be killed for using rsh/rexec then)
Pseudo:
...
my ($exe, $cmd) = ('/usr/bin/rsh', 'ps -fl r -u username');
my $cnt = 0;
for my $node ( sort @nodes ) {
    my @res = grep !(/$cmd/ || /STIME/), split /[\n\r]+/, qx{$exe $node $cmd};
    my $nproc = scalar @res;   # how many processes
    if( $nproc ) {
        print map
            "Do " . "some " . "formatting of " . "ps -fl output here!",
            @res;
    }
    ...
    ++$cnt;
    ...
}
In the end, you'll have a browser interface to the
running processes (build a nice HTML table in the "map"
above) and a central directory full of log files, which
might even be exported (SMB) to Windows machines for
coworkers preferring the Explorer ;-)
The only "complication" (additional work per node) would
be "installing and enabling the nfs client".
my €0.02
regards
mwah
| [reply] [d/l] [select] |
Re: General perl question. Multiple servers.
by perlfan (Vicar) on Oct 06, 2007 at 15:27 UTC
|
Pushing some sort of confirmation out to a single master would be the easiest thing to do (using ssh passwordless auth), but this would actually be a neat situation in which to implement some sort of distributed confirmation algorithm that doesn't have a single point of failure - that being the connection between your master and any of the servers you'd like to keep tabs on.
If you have some time to think about the solution and a lot more time to code it, POE might be the answer, but in reality a simple master daemon listener solution would not take long to craft from the many Perl daemon examples out there.
You could use a simple scheme where there is the master listener waiting for "DONE_SUCCESS" from all respective servers. On hearing "DONE_SUCCESS", the master daemon would send an ACK back to the server in question (requires a listener on that end, too). The server trying to check in would continue to send "DONE_SUCCESS" messages until it finally hears the "ACK" from the master.
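A minimal sketch of the master side of that scheme (the port number and message format are illustrative; a production version would want forking or an event loop such as POE to handle slow clients):

```perl
use strict;
use warnings;
use IO::Socket::INET;

# accept "DONE_SUCCESS <host>" lines until $expected distinct hosts
# have checked in; answer each with "ACK"
sub run_master {
    my ($port, $expected) = @_;
    my $listen = IO::Socket::INET->new(
        LocalPort => $port, Listen => 10, Proto => 'tcp', ReuseAddr => 1,
    ) or die "listen: $!";
    my %done;
    while (keys %done < $expected) {
        my $conn = $listen->accept or next;
        my $line = <$conn>;
        if (defined $line and $line =~ /^DONE_SUCCESS\s+(\S+)/) {
            $done{$1} = 1;
            print $conn "ACK\n";
        }
        close $conn;
    }
    return [ sort keys %done ];
}
```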
If you're interested in a solution like that, let me know and I'll go over it in more detail. | [reply] |
|
| [reply] |
Re: General perl question. Multiple servers.
by jethro (Monsignor) on Oct 07, 2007 at 01:49 UTC
|
If you want reliability, don't try to program the network code yourself; use established methods.
So the syslog suggestion from shmem is the simplest, most economical solution you can find, if your applications already use syslog for logging or can be persuaded to do so. Syslog takes care of the network transmission, and all you have to do is parse the local log (the code for that would easily fit in one line).
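A short sketch of that parse, under the assumption that each host logs a line like "Oct  6 23:59:01 app01 my_process: SUCCESS" to the combined file (sample lines are inlined here in place of the real log):

```perl
use strict;
use warnings;

# which hosts reported SUCCESS? in real use, open the combined
# syslog file instead of this inlined sample
my $sample = <<'LOG';
Oct  6 23:59:01 app01 my_process: SUCCESS
Oct  6 23:59:07 app02 my_process: started
Oct  6 23:59:44 app02 my_process: SUCCESS
LOG

open my $fh, '<', \$sample or die $!;
my %ok;
while (<$fh>) {
    $ok{$1}++ if /^\w+\s+\d+\s+[\d:]+\s+(\S+)\s+my_process: SUCCESS/;
}
my @done = sort keys %ok;
print "@done\n";
```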
If that is not possible, a (perl-)script that uses ssh to poll all the servers is IMHO already a very reliable solution (use the ssh parameter "-o ConnectTimeout=4" so that ssh doesn't wait so long for offline servers), and be sure to check that each ssh succeeded.
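An illustrative polling loop along those lines (the host names and log path are placeholders; the classify helper keys off ssh's convention of exiting 255 for its own failures):

```perl
use strict;
use warnings;

# map an ssh exit status to a verdict; ssh itself exits 255 on a
# connection failure, otherwise it passes through the remote
# command's exit status (grep -q: 0 = found, 1 = not found)
sub classify {
    my ($exit) = @_;
    return 'UNREACHABLE'      if $exit == 255;
    return 'SUCCESS'          if $exit == 0;
    return 'NO SUCCESS ENTRY';
}

for my $host (qw(app01 app02)) {   # placeholder host list
    system qq{ssh -o ConnectTimeout=4 $host grep -q SUCCESS /log/path/my_process.log 2>/dev/null};
    printf "%s: %s\n", $host, classify($? >> 8);
}
```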
| [reply] |
Re: General perl question. Multiple servers.
by casiano (Pilgrim) on Oct 07, 2007 at 12:12 UTC
|
| [reply] |
Re: General perl question. Multiple servers.
by mattr (Curate) on Oct 09, 2007 at 04:56 UTC
|
Syslog seems to be a good idea. It doesn't mean setting up a new hardware box, just a daemon. Both of the following links note that you'll want to sync your clocks with NTP if you don't already. Note that the UDP transport mentioned above does not guarantee delivery, especially if your servers are distant.
oreilly.com syslog.org
| [reply] |