PerlMonks
HTTP Daemonology

by jlongino (Parson)
on Aug 08, 2001 at 23:35 UTC ( id://103189 )

jlongino has asked for the wisdom of the Perl Monks concerning the following question:

We have a Netscape Enterprise Server 3.6 running on a Solaris 7 box. It becomes progressively more CPU intensive the longer it runs. The Web Admin is planning on upgrading to a more current version within the next couple of months and does not want to expend any of Web Services' energy to diagnose/solve the problem (hoping that the upgrade will magically solve everything). So, they want me to provide a short-term solution.

First, let me say that I'm not happy with the aesthetics of the situation or solutions I'm considering. I feel that Web Services should determine what the problem is and fix it. No analysis has been done to determine what the problem is or whether upgrading will really make a difference. Second, I'm just looking for suggestions/feedback, not code, nor Web Server tuning tips.

Currently, the Web Admin is notified by irate users or determines by periodically monitoring the ns-httpd daemon via top that the server has bogged down. He then stops and restarts the daemon manually using stop and start commands. I know this sucks, but at least I got him to stop power-cycling the Sun box whenever response times became sluggish.

I did some preliminary searches on CPAN and PerlDoc for perl modules that might give process statistics directly with no success. My solution is to kick off a perl script (via cron) every 10 minutes or so, scarf CPU utilization percentages derived from top 4 or 5 times (pausing a second or two between polls), average them, make a determination as to whether or not to perform a stop/start on the daemon, and then either exit or restart the server.

The Web Server fetches static pages for the most part and does very little, if any transaction processing.

Thanks in advance for any suggestions.

Replies are listed 'Best First'.
OT: re: Netscape Enterprise Server 3.6
by gregor42 (Parson) on Aug 09, 2001 at 02:49 UTC

    OFF TOPIC

    Netscape Enterprise Server 3.6 is no longer a supported product. It hasn't been for almost 2 years now.

    The memory leak is a well-documented problem. It first reared its ugly head in 3.4. An attempt was made to fix it in 3.5, which was much worse. 3.6 introduced dxwdog, which is supposed to be a watchdog script that looks for this very problem. I suggest looking into that & tweaking.

    As a web software developer and engineer, my professional advice to you is to move to a supported platform. Even a move to Apache at this time would be far better in terms of performance and stability. The costs associated with that are labor. There's no GUI to configure Apache unless you use something like Tk/Apache a.k.a. Mohawk.

    I suggest this since you mentioned that you are only serving static pages for the most part. If you were using LiveWire I'd suggest looking at Resin.

    I know that you're trying to solve what appears to be a simple problem. You are not the first one to attempt this.

    Migrate/Update!!



    Wait! This isn't a Parachute, this is a Backpack!
      I appreciate your input and I'll certainly look into dxwdog like you suggested, as it sounds promising.

      However, as to being "Off Topic", I thought that it was clearly stated that I was looking for a perl-based solution (a module perhaps) that would facilitate monitoring CPU utilization of a given process. Apparently it wasn't as clearly stated as I thought, for which I apologize.

      As for upgrading or migrating to Apache, I've already made both those recommendations but I'm not in a position to demand them.

        I don't think gregor42 meant that your question was off topic... As I read it, the answer was marked as off-topic. (i.e. not really perl related) In the same spirit my answer below is probably off topic as well....

        In my experience, scripts that automatically diagnose and "fix" a problem (a la bouncing your webserver) are more trouble than they are worth. I'd recommend running a full-fledged monitoring program that alerts you whenever a problem occurs. I've had good success with Big Brother but am seriously considering switching over to netsaint. Both systems religiously monitor everything from memory usage, to internet connectivity, to database connectivity. You can also write your own "plug-in" scripts to monitor anything you want. Some have even used this feature to send out stock market alerts, or keep an eye out for cheap airline tickets.

        While the last two uses are rather esoteric, having a monitoring system that is easily customizable is crucial to running a high quality internet service.

        -Blake

Re: HTTP Daemonology
by Agermain (Scribe) on Aug 09, 2001 at 00:51 UTC
    Well, if you don't have access to process statistics, then maybe you could go in through the weblogs? Perhaps you could have a script, run by cron every ten minutes or so, to check up on the weblogs and restart it if it's accumulated too many (if the server gets bogged down by many rapid-fire requests) or too few (if you want to limit visible downtime to the end-user) requests since the last cron check. You wouldn't have to check the actual /data/ in the weblogs, just find out how many linefeeds there are, since there's one linefeed per server event. Quick, dirty, and you don't need to figure out a new module, at least...?
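    A rough sketch of that idea (untested; the log path, state file, and thresholds below are placeholders you'd adjust): count the linefeeds in the access log, compare against the count saved on the previous cron run, and flag the delta when it looks abnormal.

```perl
#!/usr/bin/perl
# Sketch only: paths and thresholds are illustrative, not real values.
use strict;

# One log line per server event, so line count == request count.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<', $path or return 0;
    my $n = 0;
    $n++ while <$fh>;
    close $fh;
    return $n;
}

# Too many requests (bogged down) or too few (server wedged) since
# the last check both warrant attention.
sub rate_abnormal {
    my ($now, $before, $too_many, $too_few) = @_;
    my $delta = $now - $before;
    return $delta > $too_many || $delta < $too_few;
}
```

    The cron job would save the current count to a state file each run and feed the previous value back into rate_abnormal on the next.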

    andre germain
    "Wherever you go, there you are."

      Two thoughts here - not really related:

      1) Checking the weblogs really doesn't do much to solve his problem, though. His problem is with CPU time being hogged by that one process. I don't think checking web logs is going to do much beyond telling him whether or not his machine has been hit with a high number of HTTP requests recently. Does that necessarily correlate with poor performance? In some cases, it seems so...but I'm not convinced that web hits alone are going to grind his sun box to a halt. It shouldn't - especially since it sounds like most of the pages are static!

      I do like his solution of running top and scraping the output for process info, though.

      2) One of the things I do to monitor one of my websites is run a simple perl script in cron using the LWP and HTTP::Request modules. This way, you can make your own request to the site, check the url's response time, and respond accordingly. Either restart the server automatically through that cron job, or, at the very least, fire off an email to you warning of the potential problems.
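      Such a probe might look roughly like this (a sketch, not a drop-in script: the 5-second threshold and 30-second timeout are placeholders, and LWP::UserAgent is assumed to be installed from CPAN):

```perl
#!/usr/bin/perl
use strict;

# Alert when the request failed outright or simply took too long.
sub needs_attention {
    my ($elapsed, $limit, $ok) = @_;
    return 1 unless $ok;        # request failed: definitely alert
    return $elapsed > $limit;   # request slow: alert
}

# The actual probe runs only when a URL is supplied on the command
# line, so the decision logic above can be exercised without network.
if (@ARGV) {
    require LWP::UserAgent;     # CPAN module, assumed installed
    require Time::HiRes;
    my $ua    = LWP::UserAgent->new( timeout => 30 );
    my $start = Time::HiRes::time();
    my $resp  = $ua->get( $ARGV[0] );
    my $took  = Time::HiRes::time() - $start;
    if ( needs_attention( $took, 5, $resp->is_success ) ) {
        print "ALERT: $ARGV[0] took ${took}s\n";  # or mail / restart here
    }
}
```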
        A non-perl ( yes, I know ) possibility is Sun's SymbEL release 3 out on http://www.sunfreeware.com, which our webmaster runs on his Solaris boxen. He did this due to a custom bit of Java that leaks bad. He is quite taken with it, tho I admit to not playing w/ it. YMMV.

        For point 2, that's a beauty idea, even if you only use it to give you a heads up of impending doom!

        I'd also, on the point one above, check the firewall and IDS logs to see if anything less-than-tasty is coming from outside.

        Lastly, check out Sun's Sun Performance and Tuning Techniques doc ( you may have to register w/ http://sunsolve.sun.com ). The techniques are pretty light ( I run them from time to time on my heavily utilized firewalls - no perl - w/ negligible additional load ).

        As an aside, sysadmins who hang their hopes on patches and new releases exclusively w/o understanding what is *actually* wrong have, IME & IMHO, very short tenures in quality IT staffs. SysAdmins who can diagnose, provide evidence, and occasionally cruft a work-around, thrive - w/ little REM sleep, tho.

        UPDATE: My sysadmin comments were directed toward jlongino's web admin, and not at jlongino. jlongino++ for taking this on.

        HTH
        --
        idnopheq
        Apply yourself to new problems without preparation, develop confidence in your ability to meet situations as they arise.

        I agree that top is way useful, but be aware there's been a security problem reported (granted, nearly a year ago) on "systems that have top installed with set user or group permissions".

        I saw it here--unfortunately, there weren't many details or any further references in this report, which I guess is a compliment to the reader's presumed research skills.

        adamsj

        They laughed at Joan of Arc, but she went right ahead and built it. --Gracie Allen

Re: HTTP Daemonology
by Bucket (Beadle) on Aug 09, 2001 at 06:39 UTC
    Here at work I've used Proc::ProcessTable to make a program that is similar to what you want, except ours looks for processes that have been running at high CPU usage for more than an hour with regular priority. It should be able to do what you want without a problem. It's at CPAN of course.
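    A sketch of how that might look. The filter below works on plain hashrefs so the logic is clear and testable; with Proc::ProcessTable you'd build the list from $t->table, whose entries expose pid, fname, and pctcpu accessors (field availability varies by platform, so check the module's docs). The 70% threshold is a placeholder.

```perl
#!/usr/bin/perl
# Sketch: the hashref shape mirrors the fields Proc::ProcessTable
# exposes; field names and the threshold are assumptions to adjust.
use strict;

# Pick out processes matching $pattern whose CPU share exceeds $limit.
sub hogs {
    my ($procs, $pattern, $limit) = @_;
    return grep { $_->{fname} =~ /$pattern/ && $_->{pctcpu} > $limit }
                @$procs;
}

# With Proc::ProcessTable (CPAN) the list would be built roughly so:
#   my $t = Proc::ProcessTable->new;
#   my @procs = map { { pid    => $_->pid,
#                       fname  => $_->fname,
#                       pctcpu => $_->pctcpu } } @{ $t->table };
#   my @suspects = hogs( \@procs, qr/ns-httpd/, 70 );
```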
Re: HTTP Daemonology
by kschwab (Vicar) on Aug 09, 2001 at 05:58 UTC
    A few ideas:
    • Scraping top is tough; the Solaris-supplied /bin/ps might be easier. Try something like ps -e -o pid,pcpu,comm | grep ns-httpd
    • If you really want deeper info on cpu utilization, you could look at Solaris::Procfs
    • Have you picked up the last patch for Netscape Enterprise 3.6? If you can't upgrade to the newer iPlanet 4.x, you should at least be running 3.6SP3.
      Thanks for the recommendation. I had already implemented the program using top, but then rewrote it using
      ps -efo pid,pcpu,fname as you suggested (and then internally grepped out the ns-httpd). It was a bit easier, but more importantly, it doesn't require installing a non-OS application.

      Now that the short-term annoyance is out of the way, I can concentrate on some of the other excellent suggestions that require lengthier investigation.

      Thanks everyone for the input.

      Scraping top tough?

      #!/bin/perl -w
      use strict;

      my @toplines = `echo q | top`;
      my $ok = 0;
      for (@toplines) {
          next unless /\w/;
          chomp;
          if (/PID/) { $ok++; next; }
          if ($ok) {
              my ($PID, $USERNAME, $THR, $PRI, $NICE,
                  $SIZE, $RES, $STATE, $TIME, $CPU, $COMMAND) = split;
              print "$PID $CPU $COMMAND\n";
          }
      }


      I know I know your ps solution is great, too ;) I just wanted to show a relatively painless way to scrape top...
      For those interested, here is a snippet of the ps variation I ended up using:
      #!/usr/local/bin/perl
      use strict;

      my ($pcpu, $pname, $i, @outrecs, @matches, $process);
      my $MaxUtil = 70;
      my $sttotal = 0;
      my $ct      = 0;

      foreach $i (1..5) {
          @outrecs = `ps -efo pid,pcpu,fname`;
          @matches = grep /ns-httpd/, @outrecs;
          ## Note: can have more than one ns-httpd running
          ## but we'll average it in as well
          foreach (@matches) {
              $ct++;
              chomp;
              ($process, $pcpu, $pname) = split(" ");
              $sttotal = $sttotal + $pcpu;
          }
          sleep(3);
      }
      my $pcpuavg = $sttotal / $ct;
      The generated log file provides interesting data but I haven't actually been able to make a correlation between the restart time periods and the access or error logs:
      PID    TIME                      CPU%   NAME
      5257   Wed Aug  8 23:47:07 2001   2.88  ns-httpd
      5257   Wed Aug  8 23:49:09 2001   2.84  ns-httpd
      5257   Thu Aug  9 00:00:16 2001   4.10  ns-httpd
      5257   Thu Aug  9 00:30:16 2001   1.98  ns-httpd
      5257   Thu Aug  9 01:00:15 2001   4.16  ns-httpd
      5257   Thu Aug  9 01:30:15 2001   2.46  ns-httpd
      5257   Thu Aug  9 02:00:16 2001   1.00  ns-httpd
      5257   Thu Aug  9 02:30:16 2001   0.70  ns-httpd
      5257   Thu Aug  9 03:00:17 2001   1.28  ns-httpd
      5257   Thu Aug  9 03:30:16 2001   0.86  ns-httpd
      5257   Thu Aug  9 04:00:17 2001   1.52  ns-httpd
      5257   Thu Aug  9 04:30:16 2001   0.90  ns-httpd
      5257   Thu Aug  9 05:00:16 2001  87.10  ns-httpd
      # Avg. Utilization: 87.10% higher than 70%.
      # Thu Aug  9 05:00:28 2001 www server restarted.
      PID    TIME                      CPU%   NAME
      10158  Thu Aug  9 05:30:15 2001   0.34  ns-httpd
      10158  Thu Aug  9 06:00:16 2001   2.66  ns-httpd
      10158  Thu Aug  9 06:30:15 2001   0.52  ns-httpd
      10158  Thu Aug  9 07:00:16 2001  86.20  ns-httpd
      # Avg. Utilization: 86.20% higher than 70%.
      # Thu Aug  9 07:00:25 2001 www server restarted.
      PID    TIME                      CPU%   NAME
      11405  Thu Aug  9 07:30:15 2001   4.92  ns-httpd
      11405  Thu Aug  9 08:00:16 2001   6.60  ns-httpd
      16991  Thu Aug  9 08:30:16 2001  14.48  ns-httpd
      11405  Thu Aug  9 09:00:19 2001  12.51  ns-httpd
      11405  Thu Aug  9 09:30:16 2001  14.30  ns-httpd
      11405  Thu Aug  9 10:00:16 2001  17.70  ns-httpd
      11405  Thu Aug  9 10:30:16 2001  18.10  ns-httpd
      11405  Thu Aug  9 11:00:16 2001  15.18  ns-httpd
      11405  Thu Aug  9 11:30:16 2001  22.18  ns-httpd
      11658  Thu Aug  9 12:00:17 2001  17.62  ns-httpd
      11405  Thu Aug  9 12:30:16 2001  13.72  ns-httpd
      11405  Thu Aug  9 13:00:16 2001  18.22  ns-httpd
      11405  Thu Aug  9 13:30:16 2001  14.30  ns-httpd
      11405  Thu Aug  9 14:00:16 2001  15.08  ns-httpd
      11405  Thu Aug  9 14:30:16 2001  14.62  ns-httpd
      11405  Thu Aug  9 15:00:16 2001  14.18  ns-httpd
      11405  Thu Aug  9 15:30:16 2001  10.04  ns-httpd
      11405  Thu Aug  9 16:00:16 2001  13.34  ns-httpd
      11405  Thu Aug  9 16:30:16 2001  16.72  ns-httpd
      11405  Thu Aug  9 17:00:16 2001  11.86  ns-httpd
      11405  Thu Aug  9 17:30:16 2001   8.84  ns-httpd
      11405  Thu Aug  9 18:00:16 2001   5.76  ns-httpd
      11405  Thu Aug  9 18:30:16 2001   8.26  ns-httpd
      11405  Thu Aug  9 19:00:16 2001   7.68  ns-httpd
      11405  Thu Aug  9 19:30:16 2001   4.64  ns-httpd
      11405  Thu Aug  9 20:00:16 2001   3.44  ns-httpd
      11405  Thu Aug  9 20:30:16 2001  10.34  ns-httpd
      11405  Thu Aug  9 21:00:16 2001   4.92  ns-httpd
      11405  Thu Aug  9 21:30:16 2001   8.28  ns-httpd
      11405  Thu Aug  9 22:00:16 2001   4.92  ns-httpd
      11405  Thu Aug  9 22:30:16 2001   3.32  ns-httpd
      11405  Thu Aug  9 23:00:16 2001   3.06  ns-httpd
      11405  Thu Aug  9 23:30:16 2001   3.12  ns-httpd
      11405  Fri Aug 10 00:00:16 2001   4.20  ns-httpd
      11405  Fri Aug 10 00:30:15 2001   3.20  ns-httpd
      11405  Fri Aug 10 01:00:16 2001   2.34  ns-httpd
      11405  Fri Aug 10 01:30:16 2001   4.72  ns-httpd
      11405  Fri Aug 10 02:00:16 2001   0.60  ns-httpd
      11405  Fri Aug 10 02:30:16 2001   1.26  ns-httpd
      11405  Fri Aug 10 03:00:17 2001  85.22  ns-httpd
      # Avg. Utilization: 85.22% higher than 70%.
      # Fri Aug 10 03:00:29 2001 www server restarted.
      PID    TIME                      CPU%   NAME
      4859   Fri Aug 10 03:30:17 2001   1.42  ns-httpd
      4859   Fri Aug 10 04:00:17 2001   0.38  ns-httpd
      4859   Fri Aug 10 04:30:15 2001   0.16  ns-httpd
      4859   Fri Aug 10 05:00:16 2001   0.34  ns-httpd
      4859   Fri Aug 10 05:30:16 2001   0.22  ns-httpd
      4859   Fri Aug 10 06:00:16 2001   0.44  ns-httpd
      4859   Fri Aug 10 06:30:16 2001   0.54  ns-httpd
      4859   Fri Aug 10 07:00:16 2001   3.52  ns-httpd
      4859   Fri Aug 10 07:30:15 2001   2.14  ns-httpd
      4859   Fri Aug 10 08:00:16 2001   4.74  ns-httpd
      4859   Fri Aug 10 08:30:16 2001  14.14  ns-httpd
      4859   Fri Aug 10 09:00:17 2001   7.65  ns-httpd
      4859   Fri Aug 10 09:30:16 2001  14.46  ns-httpd
      4859   Fri Aug 10 10:00:16 2001  14.40  ns-httpd
      4859   Fri Aug 10 10:30:16 2001   7.67  ns-httpd
      4859   Fri Aug 10 11:00:16 2001  11.72  ns-httpd
      4859   Fri Aug 10 11:30:15 2001  14.84  ns-httpd
      4859   Fri Aug 10 12:00:16 2001  12.70  ns-httpd
      4859   Fri Aug 10 12:30:16 2001  10.70  ns-httpd
      4859   Fri Aug 10 13:00:17 2001  75.44  ns-httpd
      # Avg. Utilization: 75.44% higher than 70%.
      # Fri Aug 10 13:00:28 2001 www server restarted.
      PID    TIME                      CPU%   NAME
      10115  Fri Aug 10 13:30:16 2001  13.78  ns-httpd
      10115  Fri Aug 10 14:00:16 2001  10.00  ns-httpd
      10115  Fri Aug 10 14:30:16 2001   6.76  ns-httpd
      10115  Fri Aug 10 15:00:16 2001   9.42  ns-httpd
      10115  Fri Aug 10 15:30:16 2001  82.92  ns-httpd
      # Avg. Utilization: 82.92% higher than 70%.
      # Fri Aug 10 15:30:27 2001 www server restarted.
      PID    TIME                      CPU%   NAME
      27219  Fri Aug 10 16:00:16 2001  14.63  ns-httpd
      27219  Fri Aug 10 16:30:15 2001  11.72  ns-httpd
      27219  Fri Aug 10 17:00:16 2001   7.20  ns-httpd
      27219  Fri Aug 10 17:30:16 2001   5.38  ns-httpd
      27219  Fri Aug 10 18:00:16 2001   2.98  ns-httpd
      27219  Fri Aug 10 18:30:16 2001   6.62  ns-httpd
      27219  Fri Aug 10 19:00:22 2001  74.63  ns-httpd
      # Avg. Utilization: 74.63% higher than 70%.
      # Fri Aug 10 19:00:36 2001 www server restarted.
      PID    TIME                      CPU%   NAME
      10103  Fri Aug 10 19:30:16 2001   4.70  ns-httpd
      10103  Fri Aug 10 20:00:17 2001   3.77  ns-httpd
      10103  Fri Aug 10 20:30:16 2001   4.94  ns-httpd
      10103  Fri Aug 10 21:00:16 2001   2.78  ns-httpd
      Note: I am still investigating the other suggestions.
Re: HTTP Daemonology
by dga (Hermit) on Aug 09, 2001 at 02:47 UTC

    You may also check the memory utilization change over time, as it could be that the server has memory leakage problems, much like the browser of a similar name. If that's the case, an upgrade may address it.

    That is of course if this has anything at all to do with the problem in the first place, but a running tally of memory use over time will point this out or clear it from consideration quickly.

    Also, if it's memory related, once you determine the amount of RAM gulped to make it slow you could restart based on that.

Re: HTTP Daemonology
by clemburg (Curate) on Aug 09, 2001 at 17:43 UTC

    Given the response of your Web Admin/Web Services, why not simply restart the daemon/services that get bogged down cyclically (e.g., once a day, once an hour) by some script. As you describe the situation, this should provide a workaround and has the advantage of needing only minimal effort.

    Disclaimer: yes, this is not the professional way to do it. But if they don't want to diagnose ... how can you really fix the problem?
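    For what it's worth, the cyclic restart needs nothing more than a crontab entry; the paths below are examples only and would need to match the actual server install:

```
# Restart the web server daily at 04:00 (a presumably low-traffic hour).
# Paths are illustrative -- substitute the real stop/start scripts.
0 4 * * * /opt/netscape/suitespot/https-myserver/stop && /opt/netscape/suitespot/https-myserver/start
```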

    Christian Lemburg
    Brainbench MVP for Perl
    http://www.brainbench.com

Re: HTTP Daemonology
by elwarren (Priest) on Aug 09, 2001 at 17:37 UTC
    You would be far better off using the sar, iostat, and vmstat commands to monitor your machine. They will give you much better information about what is happening than top will. They do exactly what you want, without the overhead of starting perl from a cronjob every ten minutes (which would probably throw off your stats).

    These tools are not process specific, so I would use these in combination with the output of ps. Then you could monitor the amount of ram per process.

    I'm very interested to see if there are any solutions that are more Perl specific. Some of the *::proc modules look promising, but I've never used them.

    If the server bogs down after a semi-regular interval you could just stop and start the server every 4 hours from a cronjob. Mercy killing it before it has a chance to kill itself.

    HTH
Re: HTTP Daemonology
by scottstef (Curate) on Aug 09, 2001 at 17:43 UTC
    You may want to look at setting up spong. It is written in Perl and distributed under the Perl Artistic License. From its FAQ:

    1. What is Spong?
    This is a simple system monitoring package called spong. It has the following features:
    client based monitoring (CPU, disk, processes, logs, etc...)
    monitoring of network services (smtp, http, ping, pop, dns, etc...)
    grouping of hosts (routers, servers, workstations, PCs)
    rules based messaging when problems occur
    configurable on a host by host basis
    results displayed via text or web based interface
    history of problems
    verbose information to help diagnose problems

    It may be a little overkill, but it will do what you want it to do and is very easy to set up.

    "The social dynamics of the net are a direct consequence of the fact that nobody has yet developed a Remote Strangulation Protocol." -- Larry Wall
