http://qs321.pair.com?node_id=368711


in reply to Re: Perl cgi without mod_perl, your experience
in thread Perl cgi without mod_perl, your experience

mod_perl is typically 20-40x faster than vanilla CGI.
Uhm, that's a bit of a useless statement. It may be true in some cases, but it isn't true in others. One should consider where the use of mod_perl gets its gains. It gets its gains because it can share some resources, including process space and compilation. Another important resource it can save is database connections, by sharing them. So, if you have lots of little CGI programs, each doing a quick job, the use of mod_perl can save a lot of resources. If you have CGI programs doing relatively long jobs (say you have some programs that do custom image manipulation, and that processing dwarfs the time needed for compilation - or you are doing database queries, and the time the queries take is far more than starting the Perl program and setting up the database connection), the savings are minimal.
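To make the connection-sharing point concrete: under mod_perl, database handles are typically kept open across requests with Apache::DBI, along these lines (a minimal sketch; the DSN and credentials are made up):

# startup.pl - loaded once when the Apache parent starts
use Apache::DBI;   # must be loaded before DBI and anything that uses DBI
use DBI;

# In each handler or Apache::Registry script, an ordinary connect()
# is then intercepted by Apache::DBI, which returns a cached,
# already-open handle instead of opening a new connection per request:
#
#   my $dbh = DBI->connect('dbi:mysql:somedb', 'someuser', 'somepass');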

So, I'd say the answer to the original question would be more like "NOT ENOUGH INFORMATION - DOES NOT COMPUTE". And my answer to the PHB's first question would be "You haven't heard the whole story - in some cases it will save big time. But if you give me some project code, we can run some tests to see how much it matters. BTW, extra hardware also means more reliability."

Abigail

Replies are listed 'Best First'.
Re^2: Perl cgi without mod_perl, your experience
by tachyon (Chancellor) on Jun 22, 2004 at 14:18 UTC

    While your point is technically valid, it is statistically invalid. With the vast majority of interactive websites handled via CGI, mod_perl or something similar is the solution. To be technically correct, one would say that you get benefits whenever the startup time (forking an interpreter, connecting to a DB) forms a significant portion of the total runtime. There are relatively few exceptions to this; downloads and other streams, plus long-running processing, are among those exceptions. It is not a case of *some*, it is a case of *mostly*.
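
    A toy back-of-the-envelope shows why that proportion is what matters (all the numbers here are invented):

    # invented numbers: fixed per-request startup cost vs. real work
    my $startup = 0.5;            # seconds to fork perl, compile, connect
    for my $work (0.05, 10) {     # seconds of actual work per request
        printf "work=%5.2fs  CGI=%5.2fs  mod_perl=%5.2fs  speedup=%.1fx\n",
            $work, $startup + $work, $work, ($startup + $work) / $work;
    }

    With 0.05s of work per request that is an 11x speedup; with 10s of work it is barely 5%.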

    BTW, extra hardware also means more reliability.

    Rubbish. Extra hardware actually increases the chances of a failure. Think about it... If the mean time to failure is 700 days and you have 700 servers, you will on average have one fall over every day. Extra hardware only provides uptime/reliability protection if you use that hardware to create redundant nodes with automatic failover, and to be frank, I don't think we are talking that level. If you use efficient code (mod_perl included) you may be able to *afford* that kind of infrastructure, as boxes that would otherwise be working inefficiently can be made to do more work*, freeing resources for redundancy. But even the simplest high availability system really needs 4 nodes - a pair out front to create your redundant load balancer and a pair behind to do the work/provide failover. Of course there are lots of other ways to skin that cat, depending on how much downtime you can tolerate.

    * Of course caning the hell out of your hardware does not help longevity ;-)

    cheers

    tachyon

      Extra hardware actually increases the chances of a failure.
      Yes, but that's not of most people's interest. It's like saying "I don't do backups, because that could mean that either my hard disk or my tape contains bad spots". While it may increase the chance of a failure, it reduces the chance of a critical failure, where a critical failure means the service you are providing is no longer available (or only available at unacceptable performance).
      If the mean time to failure is 700 days and you have 700 servers, you will on average have one fall over every day.
      If the mean time between failure is 700 days, and you have one server, you will be down once every 700 days. If you have 700 servers, you will be down every
      37036335534589881919519745177905091061529367089546822435775456657617\ 43636878121352291779253462053983059009668861547217195682739117850118\ 35008240379192887792604500837043507056449661590126378834827343300415\ 51155924340365412561936621885141113576008432906355745321587893612547\ 92657179813327520180208828937231810950060232310658708592626955683634\ 89377559706408723518059008437790717245520601634447063767955926579796\ 52663793731051027728096621773894169469654930678654263045798895238772\ 34666615299867665848656245124536507750920588975484100300349256862746\ 40081407312113263209011491753853770009409642000100000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000000000000000000000000000000000000000000000000000\ 00000000000000000000
      days. Or if your mean time between failure is 700 days, and it takes a day to recover from a failure, with only two servers you will be down once every 1342 years. Redundant servers work in parallel, and not in a serial configuration.
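      For the two-server figure, the back-of-the-envelope goes like this (a sketch, assuming independent failures and the one-day repair mentioned above):

      my $mtbf = 700;   # days between failures, per server
      my $mttr = 1;     # days to repair a failed server

      # one server fails every $mtbf days; the chance the other is inside
      # its one-day repair window at that moment is $mttr/$mtbf
      my $days = $mtbf * ($mtbf / $mttr);
      printf "down once every %d days, about %d years\n",
          $days, $days / 365;   # 490000 days, about 1342 years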

      But even the simplest high availability system really needs 4 nodes - a pair out front to create your redundant load balancer and a pair behind to do the work/provide failover.
      High availability systems don't need load balancers. It's a high availability system - not a load balancing system. All the high availability systems I've worked with - HP's ServiceGuard, Veritas Cluster, Sun Cluster - work fine with 2 nodes.
      Of course there are lots of other ways to skin that cat, depending on how much downtime you can tolerate.
      Oh, yeah, but if you can tolerate downtime, you may be able to tolerate slower service. ;-)
        This is only true if you assume that your machines magically repair themselves after they fail. In reality, it might take a day or a week or a month to repair, depending on the type of failure. I've been in the position of having noticeable degradation of service because our CPUs were failing faster than Sun could provide replacements, so dead machines were piling up.
Re^2: Perl cgi without mod_perl, your experience
by stvn (Monsignor) on Jun 22, 2004 at 14:18 UTC
    If you have CGI programs doing relatively long jobs (say you have some programs that do custom image manipulation, and that processing dwarfs the time needed for compilation - or you are doing database queries, and the time the queries take is far more than starting the Perl program and setting up the database connection), the savings are minimal.

    While not a silver bullet by any means, the Apache Cleanup handler can be very nice. It is essentially the very last phase of the Apache request cycle, and actually runs after the last of the output has been sent to the client (after the request is over from the user's perspective).

    $r->register_cleanup(\&my_long_running_sub);
    I (ab)use it to generate very large, DB-query-intensive PDFs on several sites. The hijacked process itself stores its progress in a database, and sets a flag when the PDF is done. All the while, the user's page has been auto-refreshing at a reasonable interval (and tying up a second apache child :-P ), and once the PDF is done and the flag has been set, they can download it.
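
    In outline, the pattern looks something like this (a rough sketch in mod_perl 1 style; the handler shape and sub names are made up):

    use Apache::Constants qw(OK);

    sub handler {
        my $r = shift;

        # send the holding page first...
        $r->content_type('text/html');
        $r->send_http_header;
        $r->print("<p>Generating your PDF; this page will refresh.</p>");

        # ...then schedule the real work for after the response is out
        $r->register_cleanup(\&my_long_running_sub);

        return OK;
    }

    sub my_long_running_sub {
        # big DB queries, PDF generation, progress rows in the database;
        # set the "done" flag when the file is ready for download
        return 1;
    }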

    Sure this can get tricky, since the apache process in a sense becomes "headless" for a while, but with proper exception handling and careful use of alarm you can avoid most of the issues that might come up.

    I would argue, too, that this approach is actually more efficient, since you save the cost of module loading and have the benefits of database connection pools and other mod_perl goodies at your disposal.

    -stvn
      And don't forget to mention that using a reverse proxy will keep your server with a minimal number of mod_perl processes running, i.e., set up an Apache on port 80 (with mod_proxy activated) and redirect every *.pl request (or all /cgi-bin requests, for example) to your mod_perl-enabled Apache on another port (81, for instance).

      You'll be able to handle even more scripts per second, because the mod_perl server is kept busy only processing the request; transmitting the output becomes a task for your light Apache on port 80, freeing the mod_perl server to handle another script request.
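
      A minimal front-end configuration might look something like this (a sketch; port 81 and the URL patterns are only examples):

      # httpd.conf of the lightweight Apache on port 80
      # (requires mod_rewrite and mod_proxy)
      RewriteEngine On

      # hand every *.pl request to the mod_perl Apache on port 81
      RewriteRule ^/(.+\.pl)$ http://127.0.0.1:81/$1 [P]

      # ...or proxy a whole /cgi-bin tree instead:
      # ProxyPass        /cgi-bin/ http://127.0.0.1:81/cgi-bin/
      # ProxyPassReverse /cgi-bin/ http://127.0.0.1:81/cgi-bin/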
      HTH.

      That's actually not the best way to do it. Ideally, you would fork so that your processing does not tie up an apache child process at all. This is how we recommend handling long-running jobs on the mod_perl list.

        From what I know about fork and mod_perl, I can't understand why that would be the recommended way to handle this (and I am in no way claiming to know everything about fork/mod_perl, so I may be way off here). My understanding is that forking under mod_perl would result in a duplication of the apache child process, complete with mod_perl and all its goodies. Sure, that could then be chopped off of the apache parent and set to live on its own by closing file-handles and such (something like the sketch below), but then I have a big fat apache zombie process which eventually will just need to get reaped by the OS's init process.
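
        (A rough sketch of the pattern as I understand it; do_the_real_work is a placeholder:)

        defined(my $pid = fork) or die "Cannot fork: $!";
        unless ($pid) {
            # child: a full copy of the fat apache process, cut loose
            close STDIN;
            close STDOUT;
            close STDERR;
            do_the_real_work();
            CORE::exit(0);   # bypass mod_perl's overridden exit()
        }
        # parent: carries on and finishes the request as usual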

        How can that be better than hijacking an Apache process with the Cleanup phase for a little while? Am I grossly misunderstanding something here?

        -stvn