
Re: Perl cgi without mod_perl, your experience

by tachyon (Chancellor)
on Jun 22, 2004 at 10:25 UTC (#368658)


in reply to Perl cgi without mod_perl, your experience

mod_perl is typically 20-40x faster than vanilla CGI. In rough terms that translates to 20-40x the capacity or headroom. So, all other things being equal, you can run vanilla CGI and buy 20-40 times as much hardware, or you can go mod_perl and generate the same throughput on the hardware you have. This is a no-brainer.

As you correctly note, there is average load and there is peak load. With no change in hardware you have 20x+ the headroom with mod_perl to handle those peak loads, which is often the issue. A few hundred requests per second is the mod_perl ballpark. A handful of requests per second is the vanilla CGI ballpark.

PHB: I've heard that with mod_perl we can handle 20x as much load with the same hardware.
PHB: Is that true?
You: Well yes but our code is badly written and mod_perl is um kinda um new and harder....
PHB: So you know that we are in this to make money and hardware is a fixed cost?
You: Well yes but...
PHB: So if you make it work with mod_perl we can save $XXX per month or.....
You: Well yes but...
PHB: What we have here is a *failure* to communicate.....

30000-50000 hits a day is fairly trivial. That is less than 1 hit per second on average, although the peak may be up at 10-20? Parse the logs if you don't know. At these peaks vanilla CGI will approach its limits. mod_perl will hardly raise a light sweat. Maybe it will never make any difference. Maybe you will become really popular. Maybe you will crater because you could not handle the load? Dunno. The business case is simple enough. mod_perl = more reqs/sec for the same hardware capital cost.
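
The arithmetic and the "parse the logs" step can be sketched in a few lines of Perl. This is only a minimal sketch: it assumes the Apache common/combined log timestamp format, and the sub names `avg_per_sec` and `peak_per_sec` as well as the sample log lines are invented for illustration.

```perl
use strict;
use warnings;

# Average request rate for a given number of hits per day.
sub avg_per_sec {
    my ($hits_per_day) = @_;
    return $hits_per_day / 86_400;    # 86,400 seconds in a day
}

# Size of the busiest one-second bucket in a list of access-log lines.
# Assumes the Apache timestamp layout: [dd/Mon/yyyy:hh:mm:ss zone]
sub peak_per_sec {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        next unless $line =~ m{\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) };
        $count{$1}++;
    }
    my $peak = 0;
    for my $n (values %count) {
        $peak = $n if $n > $peak;
    }
    return $peak;
}

printf "50000 hits/day = %.2f hits/sec on average\n", avg_per_sec(50_000);

# Hypothetical sample lines; in practice pass in the lines of your access log.
my @sample = (
    '10.0.0.1 - - [22/Jun/2004:10:25:01 +0000] "GET /cgi-bin/a.pl HTTP/1.0" 200 512',
    '10.0.0.2 - - [22/Jun/2004:10:25:01 +0000] "GET /cgi-bin/b.pl HTTP/1.0" 200 734',
    '10.0.0.3 - - [22/Jun/2004:10:25:02 +0000] "GET /cgi-bin/a.pl HTTP/1.0" 200 512',
);
print "peak in sample: ", peak_per_sec(@sample), " hits/sec\n";
```

The peak figure, not the daily average, is the number to compare against the per-second ballparks quoted above.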

cheers

tachyon

Replies are listed 'Best First'.
Re: Perl cgi without mod_perl, your experience
by Abigail-II (Bishop) on Jun 22, 2004 at 13:33 UTC
    mod_perl is typically 20-40x faster than vanilla CGI.
    Uhm, that's a bit of a useless statement. It may be true in some cases, but it isn't true in other cases. One should consider where the use of mod_perl gets its gains. It gets its gains because it can share some resources, which include process space and compilation. Another important resource where it can save is sharing of database connections. So, if you have lots of little CGI programs, each doing a quick job, the use of mod_perl can save a lot of resources. If you have CGI programs doing relatively long jobs (say you have some programs that do custom image manipulation and that process dwarfs the time needed for compilation - or you are doing database queries and the time it takes for the queries is far more than starting the Perl program and setting up the database connection), the savings are minimal.

    So, I'd say the answer to the original question would be more like "NOT ENOUGH INFORMATION - DOES NOT COMPUTE". And my answer to the PHB's first question would be "You haven't heard the whole story - in some cases it will save big time. But if you give me a project code, we can run some tests to see how much it matters. BTW, extra hardware also means more reliability."

    Abigail

      While your point is technically valid it is statistically invalid. With the vast majority of interactive websites handled via CGI, mod_perl or something similar is the solution. To be technically correct one would say that you get benefits whenever the startup time (forking an interpreter, connecting to a DB) forms a significant portion of the total runtime. There are relatively few exceptions to this. Downloads and other streams plus long running processing are among those exceptions. It is not a case of *some*, it is a case of *mostly*.
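
The rule both posters agree on (the gain depends on how much of the total runtime is startup) can be put as a one-line model. This is a rough sketch, assuming a persistent interpreter removes the entire per-request startup cost; the sub name `speedup` and the timings are illustrative, not measurements.

```perl
use strict;
use warnings;

# Best-case speedup if a persistent interpreter removes the whole startup cost.
# $startup: seconds per CGI request spent forking, compiling and connecting
# $work:    seconds spent doing the actual job
sub speedup {
    my ($startup, $work) = @_;
    return ($startup + $work) / $work;
}

# Illustrative numbers only.
printf "quick script: %.1fx\n", speedup(0.20, 0.01);    # startup dominates
printf "image job:    %.2fx\n", speedup(0.20, 5.00);    # work dominates
```

With these made-up timings the quick script gains about 21x while the five-second image job gains about 4%, which is the whole disagreement in two numbers.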

      BTW, extra hardware also means more reliability.

      Rubbish. Extra hardware actually increases the chances of a failure. Think about it..... If the mean time to failure is 700 days and you have 700 servers you will on average have one fall over every day. Extra hardware only provides uptime/reliability protection if you use that hardware to create redundant nodes with automatic failover, and to be frank I don't think we are talking that level. If you use efficient code (mod_perl included) you may be able to *afford* that kind of infrastructure, as boxes that would otherwise be working inefficiently can be made to do more work*, freeing resources for redundancy. But even the simplest high availability system really needs 4 nodes - a pair out front to create your redundant load balancer and a pair behind to do the work/provide failover. Of course there are lots of other ways to skin that cat depending on how much downtime you can tolerate.

      * Of course caning the hell out of your hardware does not help longevity ;-)

      cheers

      tachyon

        Extra hardware actually increases the chances of a failure.
        Yes, but that's not what most people are interested in. It's like saying "I don't do backups, because that could mean that either my hard disk or my tape contains bad spots". While extra hardware may increase the chance of a failure, it reduces the chance of a critical failure, where a critical failure means the service you are providing is no longer available (or only available at unacceptable performance).
        If the mean time to failure is 700 days and you have 700 servers you will on average have one fall over every day.
        If the mean time between failures is 700 days, and you have one server, you will be down once every 700 days. If you have 700 servers, you will be down once every 700^700 days, a 1992-digit number (roughly 3.7 x 10^1991). Or if your mean time between failures is 700 days, and it takes a day to recover from a failure, with only two servers you will be down once every 1342 years. Redundant servers work in parallel, and not in a serial configuration.
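
The figures in this reply follow from a simple model, sketched here under stated assumptions: failures are independent, each server is down one day in every 700 (mean time between failures of 700 days, one day to recover), and the service is only down when all n servers are down at once. The sub name `days_between_outages` is made up for the example.

```perl
use strict;
use warnings;
use Math::BigInt;

# Mean number of days between total outages of $n redundant servers,
# assuming independent failures and each server down one day in $mtbf.
sub days_between_outages {
    my ($mtbf, $n) = @_;
    return Math::BigInt->new($mtbf)->bpow($n);
}

my $two = days_between_outages(700, 2);
printf "2 servers: every %s days, about %d years\n",
    $two->bstr, $two->numify / 365;

my $many = days_between_outages(700, 700);
printf "700 servers: a %d-digit number of days\n", length($many->bstr);
```

Two servers give 700^2 = 490000 days, which is where the 1342 years comes from. The serial view in the parent post (700 servers, one failure per day) and this parallel view describe the same hardware; they differ only in whether a single failure takes the service down.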

        But even the simplest high availability system really needs 4 nodes - a pair out front to create your redundant load balancer and a pair behind to do the work/provide failover.
        High availability systems don't need load balancers. It's a high availability system - not a load balancing system. All the high availability systems I've worked with (HP's ServiceGuard, Veritas Cluster, Sun Cluster) work fine with 2 nodes.
        Of course there are lots of other ways to skin that cat depending on how much downtime you can tolerate.
        Oh, yeah, but if you can tolerate downtime, you may be able to tolerate slower service. ;-)
      If you have CGI programs doing relatively long jobs (say you have some programs that do custom image manipulation and that process dwarfs the time needed for compilation - or you are doing database queries and the time it takes for the queries is far more than starting the Perl program and setting up the database connection), the savings are minimal.

      While not a silver bullet by any means, the Apache Cleanup handler can be very nice. It is essentially the very last phase of the Apache request cycle, and it actually runs after the last of the headers have been sent to the client (after the request is over from the user's perspective).

      $r->register_cleanup(\&my_long_running_sub);
      I (ab)use it to generate very large DB-query intensive PDFs on several sites. The hijacked process itself stores its progress in a database, and marks a flag when the PDF is done. All the while the user's page has been auto-refreshing at a reasonable interval (and tying up a second apache child :-P ), and once the PDF is done and the flag has been set, they can download it.

      Sure this can get tricky, since the apache process in a sense becomes "headless" for a while, but with proper exception handling and careful use of alarm you can avoid most of the issues that might come up.

      I would argue, too, that this approach is actually more efficient, since you save the cost of module loading and have the benefits of database connection pools and other mod_perl goodies at your disposal.
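
The "proper exception handling and careful use of alarm" mentioned above might look something like the following. It is a generic sketch in plain Perl, not the poster's code: `run_with_timeout` and `build_pdf` are hypothetical names, and the commented registration line assumes the mod_perl 1.x `register_cleanup` call quoted in the post.

```perl
use strict;
use warnings;

# Run $job, but give up after $seconds.
# Returns (1, '') on success, or (0, $error) on timeout or any other exception.
sub run_with_timeout {
    my ($seconds, $job) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm $seconds;
        $job->();
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # make sure no alarm is left pending if the job died
    return $ok ? (1, '') : (0, $err);
}

# Inside a handler, the long job would be registered roughly like this
# (build_pdf is a hypothetical sub that records its progress in the database):
#
#   $r->register_cleanup(sub { run_with_timeout(600, \&build_pdf) });

# Standalone demonstration: a job that takes longer than its budget.
my ($ok, $err) = run_with_timeout(1, sub { sleep 5 });
print $ok ? "finished\n" : "gave up: $err";
```

The timeout keeps a runaway job from holding the Apache child forever; the `eval` keeps an exception in the job from leaking into the next request served by that child.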

      -stvn
        And don't forget to mention that using a reverse proxy will keep your server running with a minimal number of mod_perl processes, i.e., set up an Apache on port 80 (with mod_proxy activated) and redirect every *.pl request (or all /cgi-bin requests, for example) to your mod_perl enabled Apache on another port (81, for instance).

        You'll be able to handle even more scripts per second because you will keep the mod_perl server busy only to process the request, and the transmission of the output will be a task for your light Apache on port 80, freeing it to handle another script request.
        HTH.
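
For the /cgi-bin case, the front-end configuration could look roughly like this. It is a minimal fragment, assuming both servers run on the same host (hence 127.0.0.1, which is not in the post) and that mod_proxy is loaded in the port 80 server; port 81 and the /cgi-bin/ prefix come from the example above.

```apache
# Lightweight front-end Apache listening on port 80.
# Forward only the dynamic requests to the mod_perl server on port 81.
ProxyRequests    Off
ProxyPass        /cgi-bin/ http://127.0.0.1:81/cgi-bin/
ProxyPassReverse /cgi-bin/ http://127.0.0.1:81/cgi-bin/
```

`ProxyRequests Off` keeps the front end from acting as an open forward proxy, and `ProxyPassReverse` rewrites redirect headers from the back end so clients never see port 81.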

        That's actually not the best way to do it. Ideally, you would fork so that your processing does not tie up an apache child process at all. This is how we recommend handling long running jobs on the mod_perl list.
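
A generic version of the fork approach, sketched in plain Perl: the name `spawn_detached` is invented here, and the sketch deliberately leaves out the mod_perl-specific part, since a child forked from an Apache process also inherits the server's open sockets and needs extra cleanup.

```perl
use strict;
use warnings;
use POSIX ();

# Run $job in a detached grandchild so the caller returns almost immediately.
# The double fork means the worker is not a child of the caller, and the
# caller only has to reap the short-lived middle process.
sub spawn_detached {
    my ($job) = @_;
    defined(my $pid = fork()) or die "fork failed: $!";
    if ($pid) {                  # caller: reap the middle process and carry on
        waitpid($pid, 0);
        return;
    }
    POSIX::setsid();             # middle process: leave the caller's session
    defined(my $kid = fork()) or POSIX::_exit(1);
    POSIX::_exit(0) if $kid;     # middle process exits, grandchild keeps going
    # Grandchild: drop the inherited standard handles, then do the work.
    open STDIN,  '<', '/dev/null';
    open STDOUT, '>', '/dev/null';
    eval { $job->(); 1 } or warn "job failed: $@";
    POSIX::_exit(0);             # skip END blocks and destructors of the parent
}

# Hypothetical usage: hand off a slow job and go straight back to serving.
spawn_detached(sub { sleep 2 });
print "request handler is free again\n";
```

As with the cleanup-handler approach, the job would still need to record its progress somewhere the refreshing page can read it.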
