http://qs321.pair.com?node_id=368654

kiat has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I would like to hear from anyone who has experience managing a fairly busy site with Perl CGI, without the use of mod_perl, FastCGI and such.

By 'fairly busy', I'm thinking about something in the region of 30000-50000 hits a day.

In particular, I would like to know about the sort of problems that are encountered during peak hours. Something like 100 users simultaneously on the site, whether actively doing something (like posting a message) or just surfing, would constitute peak hours to me.

Cheers and as usual, thanks for reading and sharing :)


Re: Perl cgi without mod_perl, your experience
by tachyon (Chancellor) on Jun 22, 2004 at 10:25 UTC

    mod_perl is typically 20-40x faster than vanilla CGI. That correlates to 20-40x the capacity or headroom in rough terms. So all other things being equal you can run vanilla CGI and buy 20-40 times as much hardware or you can go mod_perl and generate the same throughput. This is a no brainer.

    As you correctly note, there is average load and there is peak load. With no change in hardware you have 20x+ the headroom with mod_perl to handle those peak loads, which is often the issue. Hundreds of requests per second is the mod_perl ballpark; a handful of requests per second is the vanilla CGI ballpark.

    PHB: I've heard that with mod_perl we can handle 20x as much load with the same hardware. Is that true?
    You: Well yes, but our code is badly written and mod_perl is um kinda um new and harder....
    PHB: So you know that we are in this to make money and hardware is a fixed cost?
    You: Well yes but...
    PHB: So if you make it work with mod_perl we can save $XXX per month or.....
    You: Well yes but...
    PHB: What we have here is a *failure* to communicate.....

    30000-50000 hits a day is fairly trivial. That is less than 1 hit per second on average, although the peak may be up at 10-20? Parse the logs if you don't know. At those peaks vanilla CGI will approach its limits; mod_perl will hardly break a light sweat. Maybe it will never make any difference. Maybe you will become really popular. Maybe you will crater because you could not handle the load? Dunno. The business case is simple enough: mod_perl = more reqs/sec for the same hardware capital cost.
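
    The averages above are just hits divided by the 86,400 seconds in a day (assuming traffic were spread evenly); a throwaway one-liner confirms the figure:

        # Back-of-the-envelope: 50,000 hits spread evenly over a day
        perl -e 'printf "%.2f requests/sec on average\n", 50_000 / 86_400'   # prints 0.58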

    cheers

    tachyon

      mod_perl is typically 20-40x faster than vanilla CGI.
      Uhm, that's a bit of a useless statement. It may be true in some cases, but it isn't true in other cases. One should consider where the use of mod_perl gets its gains. It gets its gains because it can share some resources, which include process space and compilation. Another important resource where it can save is sharing of database connections. So, if you have lots of little CGI programs, each doing a quick job, the use of mod_perl can save a lot of resources. If you have CGI programs doing relatively long jobs (say you have some programs that do custom image manipulation and that processing dwarfs the time needed for compilation - or you are doing database queries and the time the queries take is far more than starting the Perl program and setting up the database connection), the savings are minimal.

      So, I'd say the answer to the original question would be more like "NOT ENOUGH INFORMATION - DOES NOT COMPUTE". And my answer to the PHB's first question would be "You haven't heard the whole story - in some cases it will save big time. But if you give me a project code, we can run some tests to see how much it matters. BTW, extra hardware also means more reliability."

      Abigail

        While your point is technically valid, it is statistically invalid. With the vast majority of interactive websites handled via CGI, mod_perl or something similar is the solution. To be technically correct one would say that you get benefits whenever the startup time (forking an interpreter, connecting to a DB) forms a significant portion of the total runtime. There are relatively few exceptions to this. Downloads and other streams, plus long-running processing, are among those exceptions. It is not a case of *some*, it is a case of *mostly*.

        BTW, extra hardware also means more reliability.

        Rubbish. Extra hardware actually increases the chances of a failure. Think about it..... If the mean time to failure is 700 days and you have 700 servers, you will on average have one fall over every day. Extra hardware only provides uptime/reliability protection if you use that hardware to create redundant nodes with automatic failover, and to be frank I don't think we are talking that level. If you use efficient code (mod_perl included) you may be able to *afford* that kind of infrastructure, as boxes that would otherwise be working inefficiently can be made to do more work*, freeing resources for redundancy. But even the simplest high availability system really needs 4 nodes - a pair out front to create your redundant load balancer and a pair behind to do the work/provide failover. Of course there are lots of other ways to skin that cat, depending on how much downtime you can tolerate.

        * Of course caning the hell out of your hardware does not help longevity ;-)

        cheers

        tachyon

        If you have CGI programs doing relatively long jobs (say you have some programs that do custom image manipulation and that processing dwarfs the time needed for compilation - or you are doing database queries and the time the queries take is far more than starting the Perl program and setting up the database connection), the savings are minimal.

        While not a silver bullet by any means, the Apache cleanup handler can be very nice. It is essentially the very last phase of the Apache request cycle, and actually runs after the last of the headers have been sent to the client (after the request is over from the user's perspective).

        $r->register_cleanup(\&my_long_running_sub);
        I (ab)use it to generate very large, DB-query-intensive PDFs on several sites. The hijacked process stores its progress in a database and sets a flag when the PDF is done. All the while the user's page has been auto-refreshing at a reasonable interval (and tying up a second Apache child :-P ), and once the PDF is done and the flag has been set, they can download it.
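
        A minimal sketch of that flow under mod_perl 1 / Apache::Registry (register_cleanup() and DBI are real APIs; the pdf_jobs table and the start_job()/generate_pdf() helpers are made up for illustration):

            use strict;
            use CGI qw(header);
            use DBI;
            use Apache ();

            my $r   = Apache->request;
            my $dbh = DBI->connect('dbi:Pg:dbname=app', 'user', 'pass',
                                   { RaiseError => 1, AutoCommit => 1 });

            # Answer the client straight away with a self-refreshing page...
            my $job_id = start_job($dbh);    # hypothetical: insert a job row, return its id
            print header(-type => 'text/html');
            print qq{<meta http-equiv="refresh" content="10">Generating your PDF, please wait...};

            # ...and do the slow work after the response has gone out to the client.
            $r->register_cleanup(sub {
                generate_pdf($dbh, $job_id);                 # hypothetical long-running work
                $dbh->do('UPDATE pdf_jobs SET done = 1 WHERE id = ?',
                         undef, $job_id);
            });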

        Sure, this can get tricky, since the Apache process in a sense becomes "headless" for a while, but with proper exception handling and careful use of alarm you can avoid most of the issues that might come up.

        I would argue, too, that this approach is actually more efficient, since you save the cost of module loading and have the benefits of database connection pools and other mod_perl goodies at your disposal.

        -stvn
Re: Perl cgi without mod_perl, your experience
by hardburn (Abbot) on Jun 22, 2004 at 13:06 UTC

    My company's site doesn't quite meet your definition of "busy" (we peak around 25k hits per day, and 10-15k is average), but here is my experience anyway: mod_cgi can keep up. We're running several flat-file databases, some very old code written by people who were inspired by Matt Wright, plus new stuff using HTML::Template, MySQL, PostgreSQL, and CGI::Application. So we've got a little of everything. I haven't noticed the production server hitting significant load averages under mod_cgi.

    What a lot of people miss about mod_perl is that it isn't just about speed. That's an important advantage over the inefficient mod_cgi model, but there's something much cooler in mod_perl: being able to dig into Apache's internals. You've no idea how much flexibility this can provide until you understand it.
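
    As one small illustration of "digging into Apache's internals" (the module name and the /old-to-/new mapping below are invented, but PerlTransHandler and Apache::Constants are standard mod_perl 1), a tiny URI-translation handler might look like this:

        # httpd.conf:  PerlTransHandler My::Rewrite
        package My::Rewrite;
        use strict;
        use Apache::Constants qw(DECLINED);

        sub handler {
            my $r = shift;
            # Rewrite the URI before Apache maps it to a file -
            # a phase that plain mod_cgi can never reach.
            if ($r->uri =~ m{^/old/(.*)}) {
                $r->uri("/new/$1");
            }
            return DECLINED;    # let normal translation continue
        }

        1;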

    ----
    send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.

      Nice one!

      (we peak around 25k hits per day, and 10-15k is average),
      The numbers you gave are helpful indicators, thanks!
Re: Perl cgi without mod_perl, your experience
by ajt (Prior) on Jun 22, 2004 at 10:36 UTC

    It's a little elderly, but this article on perl.com may offer some insight: "Building a Large-scale E-commerce Site with Apache and mod_perl".

    CGI can run out of steam pretty quickly if you get all your hits in one go; if you have a low peak load, then it can cope quite easily, plus it's easier to work with and debug. While mod_perl is different from plain CGI, it does force extra code discipline, which is useful even if you never migrate from plain CGI to mod_perl.

    HTH, and good luck.


    --
    ajt
Re: Perl cgi without mod_perl, your experience
by Joost (Canon) on Jun 22, 2004 at 11:59 UTC
    By 'fairly busy', I'm thinking about something in the region 30000-50000 hits a day.
    Depends on what you're doing in your scripts. If you're careful about how much code you load per request and use some clever caching (and don't forget to set caching HTTP headers!), you can probably handle that many hits with plain CGI. I've built a couple of CGI-based sites that could handle more than that - depending on the hardware of course - but they tend to be awfully slow at the peak hours.
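
    For the caching-headers point, a plain CGI can emit them with nothing more than CGI.pm; the ten-minute lifetime below is just an example value:

        #!/usr/bin/perl
        use strict;
        use CGI qw(header);

        # Let browsers and proxies reuse this response for ten minutes,
        # so repeat visitors never even reach the script.
        print header(
            -type          => 'text/html',
            -expires       => '+10m',
            -Cache_Control => 'max-age=600',
        );
        print "<html><body>mostly static content here</body></html>\n";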

    Anyway - and I'm guessing from experience, I don't have any kind of figures to back this up - the biggest problem with Perl CGIs under this kind of load is probably not loading the perl interpreter itself (though at heavier loads that will also become a problem), but loading all the code that you have in your scripts. One thing you can do to reduce this is to require modules just before you actually use them, instead of use-ing everything at compile time (assuming you have your code in separate modules that you don't need for every request); see the sketch below. Another thing that might slow you down is creating database handles (though MySQL is pretty fast, some other databases make it nearly impossible to run from CGI at all).
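
    A sketch of that deferred-loading idea (Spreadsheet::WriteExcel here is just a stand-in for whatever heavy module only one code path needs):

        #!/usr/bin/perl
        use strict;
        use CGI qw(param header);

        print header();

        if ((param('action') || '') eq 'export') {
            # Compiled only on the requests that actually hit this branch,
            # instead of on every hit via a top-level "use".
            require Spreadsheet::WriteExcel;
            my $wb = Spreadsheet::WriteExcel->new('/tmp/report.xls');
            $wb->close;
            print "report written\n";
        }
        else {
            print "nothing to export\n";
        }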

    Of course, mod_perl will do away with this problem for you, and then it will be more efficient to load as much as you can in the startup.pl script.
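
    Something like the following startup.pl is the usual shape under mod_perl 1 (the particular module list is only an example):

        # startup.pl - pulled in once by the parent via "PerlRequire /path/to/startup.pl".
        # Everything compiled here is shared copy-on-write by all the children,
        # instead of being compiled per request.
        use strict;
        use Apache::DBI ();     # transparently caches DBI connections per child
        use DBI ();
        use CGI ();
        CGI->compile(':all');   # precompile CGI.pm's autoloaded methods
        use HTML::Template ();

        1;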

Re: Perl cgi without mod_perl, your experience
by blahblah (Friar) on Jun 22, 2004 at 13:26 UTC
    Also keep in mind that mod_perl isn't the only kid on the block. SpeedyCGI (also known simply as Persistent Perl) is another option.
Re: Perl cgi without mod_perl, your experience
by diotalevi (Canon) on Jun 22, 2004 at 12:01 UTC
    I am interested in hearing why you thought to pose this question, or why you would like to use CGIs instead of mod_perl.
      Hi diotalevi,

      First of all, thanks for frontpaging the parent node :)

      I posted it for a couple of (underlying) reasons:

      Firstly, I don't know enough about setting up a server and installing mod_perl. Though I've gotten Apache, Perl and MySQL to work on my local server, it's on Windows XP. If I ever were to set up my own live server, I would like to do it on Linux or its equivalent. But I only know Linux skin-deep and there are lots of security thingies that I know nuts about.

      Secondly, most shared hosting plans don't include mod_perl in their packages. Dedicated hosting does offer that option at an additional cost, but it's way too expensive.

      Finally, even if I am capable of setting up my own server on Linux, I would much prefer to leave that responsibility to those who can do it well. Furthermore, you need to get a leased line or something, and the costs of keeping the server running might be too high.

      So I really wanted to find out what mod_cgi's limits are and to hear from those who have used it.

      cheers

        There's an incomplete list here. Also, you can get a virtual server with full root access from many ISPs these days. This usually costs $20 - $40 (US). It's cheaper than dedicated hosting because you share the physical machine, but you have full control over your virtual system.
        I will just expand on perrin's list by noting that pair Networks, the host for perlmonks.org, also does exactly this.
Re: Perl cgi without mod_perl, your experience
by cosimo (Hermit) on Jun 22, 2004 at 17:26 UTC
    Hi Kiat,

    We run many installations of a web-based logistics application where there are lots of quick CGI requests with simple-to-complex database access and an average lifetime of 1 or 2 seconds, with peaks of 6-7 seconds. The number of hits (pages or transactions, not counting images, ...) per day is around 25,000. Statistically, we have fewer concurrent requests than you mention (10-15 against 100).

    We are currently fine using straight CGI with a PostgreSQL backend, though at different stages we seriously considered using mod_perl. Many months ago I did some experiments with Apache::Registry and Apache::PerlRun, but the "migration" is not painless.

    The most frequent performance bottlenecks in order of importance are:
    1. badly written or sub-optimal database queries
    2. missing or incorrect database indexes
    3. low hard disk throughput and/or hardware that is misconfigured or not well tuned for performance
    4. serious inefficiencies in the Perl code of the script itself

    We also have a low-level framework that is highly optimized and uses a very aggressive caching scheme, because the structure is relatively complex: every database table has a linked Perl class that is built entirely at runtime. Also, every field in the table is handled internally by a linked field class, so you end up with thousands of objects.

    The whole process of class "building" at runtime was done for every CGI request and then thrown away, so after heavy profiling (with Devel::DProf) we decided to store the "building result" as special .pm files that we can stat, create if they don't exist (this happens only the first time there is a database structure change), and simply eval - saving 95% of the work that is not database related.

    Sorry if I didn't explain very well... The main point here is that if you want to go with cgi, you must plan very aggressive caching.
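
    A rough sketch of that cache-the-generated-class idea - every name below (the cache path, build_class_source(), the mtime check) is invented for illustration:

        use strict;

        sub load_table_class {
            my ($table, $schema_mtime) = @_;
            my $cache = "/var/cache/myapp/$table.pm";

            # Rebuild only if the cache is missing or older than the schema.
            unless (-e $cache && (stat $cache)[9] >= $schema_mtime) {
                my $source = build_class_source($table);   # hypothetical: the expensive introspection
                open my $fh, '>', $cache or die "can't write $cache: $!";
                print {$fh} $source;
                close $fh;
            }

            # Cheap on every later hit: just compile the cached file.
            do $cache or die "can't load $cache: ", $@ || $!;
        }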

    Another thing to try is changing the httpd server. I found that simply replacing Apache 1.3.x with thttpd gives a small immediate performance boost in the number of static and dynamic requests per second.

    Of course, YMMV.

Re: Perl cgi without mod_perl, your experience
by Seumas (Curate) on Jun 22, 2004 at 14:54 UTC
    Before I describe my situation, let me mention that I am not a web developer or a professional coder and my site is a hobby.

    I run a large auction site. I wrote the entire engine from scratch -- 15,000 lines of Perl with HTML::Template as the presentation layer, PostgreSQL on the backend, served by Apache 1.3.29 on Debian (testing).

    My code is somewhat of a mess because I've been learning as I go over the years, and without coding experience prior to Perl, grasping OO is a huge leap. I'm rewriting my code in an OO way bit by bit so that I can eventually use mod_perl.

    I serve approximately 30,000 pages per day (filled with photos and thumbnails for auctions) on a dual 1.5GHz AMD Athlon machine with 2GB RAM and have an average server load of perhaps 25%. During extreme peak times it might reach 60%, but only for a short duration.

    There's really no reason not to use mod_perl unless you're just not sure that your code is up to snuff. Even then, you can run in a modified mode which will still give you some speed benefits to the tune of four or five times.
      Thanks, Seumas!

      I'm rewriting my code in an OO way bit by bit so that I can eventually use mod_perl.
      From my understanding, plain Perl code will run with mod_perl. No? I thought all that mattered was making sure that you don't have globals?
        It doesn't need to be OO, but as you progress you'll find it's much easier to build modules out of your code and then use them in your applications.

        You are correct about globals; programming under mod_perl is what finally got me into the habit of using my.
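
        To illustrate why my (plus strict) matters there - a hedged sketch, not anyone's production code: under Apache::Registry the whole script is wrapped in a subroutine, so a named sub that reaches out to a file-level lexical closes over the *first* request's copy ("my variable will not stay shared"). Passing values in explicitly sidesteps that:

            use strict;
            use CGI qw(param header);

            print header();

            # Pass per-request data into subs rather than letting them
            # refer to file-scoped lexicals directly.
            greet(scalar param('name'));

            sub greet {
                my ($name) = @_;
                print 'Hello, ', ($name || 'stranger'), "\n";
            }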

        Saying what everyone else has said in a different way:
        Since you're running on Windows, remember that you're going to be loading and unloading perl.exe and the perlxx.dll for every request. This can begin to fragment your memory over time. XP reclaims memory much better than NT did, though neither does it as well as a *nix will. Consider a thousand requests coming in, each loading an interpreter and parsing a script, each with a footprint of between 2MB and 10MB. This can eat up memory very quickly.

        I suggest opening up Task Manager and adding the mem usage, VM size, and peak mem usage columns to the Processes page. Then try to fire up 10 (or as many as you can) connections to your script and watch the processes come and go. It's a really simple and fast way to see how that's going to scale. I think somebody posted a simple stress test here before; you could easily do it with a quick POE script as well.
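
        For a rough check you don't even need POE - a quick LWP loop along these lines (the URL and request count are placeholders) gives a ballpark figure; run several copies at once to approximate concurrent users:

            #!/usr/bin/perl
            use strict;
            use LWP::UserAgent;
            use Time::HiRes qw(time);

            my $url = 'http://localhost/cgi-bin/myscript.pl';   # placeholder
            my $n   = 50;

            my $ua    = LWP::UserAgent->new;
            my $start = time();
            for (1 .. $n) {
                my $res = $ua->get($url);
                warn 'request failed: ', $res->status_line, "\n"
                    unless $res->is_success;
            }
            my $elapsed = time() - $start;
            printf "%d requests in %.2f seconds (%.1f req/s)\n",
                   $n, $elapsed, $n / $elapsed;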

        HTH
        Since your code is persistent under mod_perl, you just want to make sure that it's clean code; it isn't necessary to write your stuff in an OO way.

        Really, you want to write clean code - whether or not you're using mod_perl. If it's clean, you'll be able to move to mod_perl later if you want/need/can.

        Unfortunately, in my case, I'm re-writing a system I wrote from scratch when I was first picking up the language so it's kind of messy.