PerlMonks  

Improving performance by generating a static html file?

by BUU (Prior)
on Apr 12, 2002 at 01:56 UTC ( [id://158467]=perlquestion )

BUU has asked for the wisdom of the Perl Monks concerning the following question:

As the title says, I wrote a small forum (Perl/MySQL), and I was thinking about how I could increase its performance. Right now, it's all 'dynamic', meaning that when you hit "displaythread.pl" it runs the SQL query on the DB and then prints back the rows that are returned (formatting and so forth aside). My idea was that when someone requested a thread, the script would check whether an HTML file for that thread existed; if it did, it would just send back the HTML file, and if it didn't, it would generate the file and then send it back. The idea here is to cut down on CPU usage and so forth, as sending back a simple HTML file has to be less CPU-intensive than running a Perl script. Any ideas on how/why I could or could not do this? It was also suggested that I try generating the HTML file whenever the thread is modified?
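The check-then-generate idea above can be sketched roughly as follows. This is only a minimal sketch: `render_thread`, `thread_html`, the `thread-N.html` naming and the temporary cache directory are all hypothetical stand-ins, and the real script would run its SQL query where `render_thread` builds a string.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical cache directory; a real script would use a fixed path.
my $cache_dir = tempdir(CLEANUP => 1);

# Hypothetical stand-in for the SQL query plus HTML formatting.
sub render_thread {
    my ($thread_id) = @_;
    return "<html><body>Thread $thread_id</body></html>\n";
}

# Return the HTML for a thread, generating the file only on a cache miss.
sub thread_html {
    my ($thread_id) = @_;
    # Never build a file path from unchecked input.
    die "bad thread id\n" unless $thread_id =~ /\A\d+\z/;
    my $file = File::Spec->catfile($cache_dir, "thread-$thread_id.html");

    unless (-e $file) {
        my $fresh = render_thread($thread_id);
        open my $out, '>', $file or die "Cannot write $file: $!";
        print {$out} $fresh;
        close $out or die "Cannot close $file: $!";
    }

    open my $in, '<', $file or die "Cannot read $file: $!";
    local $/;                       # slurp the whole file
    my $html = <$in>;
    close $in;
    return $html;
}

print thread_html(42);
```

Note that this sketch never invalidates a cached file, so something else has to delete or rewrite `thread-N.html` when the thread changes.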

Replies are listed 'Best First'.
Re: Improving performance by generating a static html file?
by Fletch (Bishop) on Apr 12, 2002 at 02:05 UTC

    If you were using HTML::Mason, it comes with a caching mechanism which lets you specify a sub that determines when a pre-existing copy can be used and when it needs to generate a new version.

    Also the mod_perl Developer's Cookbook (ISBN 0672322404) has examples of handlers which create cached copies.

Re: Improving performance by generating a static html file?
by joealba (Hermit) on Apr 12, 2002 at 03:27 UTC
    JayBonci hit the nail on the head. I have used this approach a few times and the benefits were substantial. But, that's because I only used this approach in cases where the data was generally quite static.

    You *could* regenerate the html whenever it is modified. If that fits your application, then go for it. But if you're going to update a dozen times every minute, it's probably not going to help.

    Also remember that when you keep everything dynamic, it's that much easier to redesign all your output and make global changes to your data. With your flat file approach, you'd need to change your code or templates and then somehow 'republish' all your static files. That can be a pain.

    If you're looking to speed up your dynamic Perl-driven stuff, take a closer look at mod_perl. Also, tweak your MySQL piece for optimization. Make sure you have appropriate keys in place to make for faster lookups on tables, etc.

    Good luck!
Re: Improving performance by generating a static html file?
by dws (Chancellor) on Apr 12, 2002 at 05:33 UTC
    The idea here is to cut down on CPU usage and so forth, as sending back a simple HTML file has to be less CPU-intensive than running a Perl script. Any ideas on how/why I could or could not do this? It was also suggested that I try generating the HTML file whenever the thread is modified?

    Generating static HTML can be a big win for a high-traffic site that gets many more hits than it does updates.

    For a very elegant example of how to generate a static site from dynamic content, download and examine Movable Type, which is a blogging package written in Perl (by btrott). It builds a static site on demand whenever dynamic content is published.

Re: Improving performance by generating a static html file?
by Dog and Pony (Priest) on Apr 12, 2002 at 08:14 UTC
    I have built, or participated in building, at least two sites that use a generated static HTML approach, with good results. Neither of them is in Perl, but that should not matter for this discussion. :)

    One of the sites regenerates the appropriate pages when the administrator updates something - this can involve adding a new news item, or adding a new object for customers to view. This happens at a pretty low frequency, and for all practical purposes there is only one administrator, so there are no concurrency issues either (although safety has been built in, in case they should surprisingly expand a lot or something). All the changes made are saved into a database, and upon pressing "publish", recently changed pages are regenerated. Since visits are really plentiful compared to the updates, this approach works really well. As a matter of fact, this kind of approach means you could have a site on a host with no dynamics at all - you generate the pages somewhere else and put them there via ftp, scp, rsync or whatever is appropriate. :) In this case though, there are dynamic parts, which include a browsable web map and customer logins. These parts are purely dynamic, partly for session stuff, and partly because they are still low-traffic parts.

    The other approach is an hourly update from the article database to the web shop. Not the same company though. When a user buys something, or searches etc, dynamic pages are generated directly from the database, but when they are just surfing around, they see static content. This has some issues with discrepancies between static and dynamic content on rare occasions, and we are not satisfied with the solution. However, we picked a bad platform for the job, it couldn't handle the huge load any other way, and we hopefully know better now. Mistakes are good for learning, if not for confidence... *grin*. And no, I am not allowed to fix it. :(

    Anyhow, given the right circumstances, those two approaches really work well when you are worried about heavy load.

    Your third suggestion was to query the DB, or disk, every time someone wanted a page and either return a cached page, or generate and return a new one, right? Well, I still advocate updating on change instead of polling for changes, given that changes are more rare than views. Then the views are without any load at all, while the fewer updates have some load (but not a poll for changes).

    However, depending on the expected load, I am not sure that any static HTML approach is fitting for a forum. If load is high enough, you will have lots of concurrency problems, and you will need to lock the files on disk and other such things - effectively slowing things down a lot again, while they wait for each other. If load is low enough for this to not be a problem, then load is probably not a problem for generating everything on the fly either.
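    One common way to soften the locking concern for readers, offered here as an alternative rather than anything the sites above are known to use, is to write each page to a temporary file and rename it into place. The sketch below assumes a POSIX-style filesystem where rename() within one directory is atomic; `publish_page` and the file names are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;
use File::Temp qw(tempfile tempdir);

# Write $html to $target so that readers see either the old page or the
# new one, never a partially written file.
sub publish_page {
    my ($target, $html) = @_;
    # Temp file lives beside the target, so rename() stays on one filesystem.
    my ($fh, $tmp) = tempfile('page-XXXXXX', DIR => dirname($target));
    print {$fh} $html;
    close $fh or die "Cannot close $tmp: $!";
    # tempfile() creates the file as 0600; make it readable by the web server.
    chmod(0644, $tmp) or die "Cannot chmod $tmp: $!";
    rename($tmp, $target) or die "Cannot rename $tmp to $target: $!";
    return $target;
}

my $dir  = tempdir(CLEANUP => 1);   # hypothetical document root
my $page = File::Spec->catfile($dir, 'thread-1.html');
publish_page($page, "<p>version 1</p>\n");
publish_page($page, "<p>version 2</p>\n");
```

    This does not stop two writers from regenerating the same page at once (the last rename simply wins), so it helps with half-written pages but not with wasted work.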

    It would still depend upon the forum, or whatever, but the smaller the ratio of reads to updates, the worse a static HTML approach is, imo, no matter what the expected load is. And forums are generally something that gets updated a lot relative to how often it is just viewed - unlike a news site or similar. :)

    If you have the time and possibility - try both approaches (static and dynamic). Otherwise, try to think what you can gain/risk/lose on either approach or a healthy combination.

    This probably won't help much. I still hope it will. :)


    You have moved into a dark place.
    It is pitch black. You are likely to be eaten by a grue.
Re: Improving performance by generating a static html file?
by Tardis (Pilgrim) on Apr 12, 2002 at 06:01 UTC
    Update: OK now it's been mentioned between the time I started typing and the time I pressed submit. Damn you speedy fellow monks, damn you :-)


    One approach I haven't seen mentioned, is that you could get 'smarter' about your dynamic-ness.

    What I mean by that is design your system so that you can tell when a change has been made, so you always know when you need to regenerate your HTML.

    So to translate this to your system, when a thread has an addition, generate the HTML code, store it, and mark the thread as clean.

    Next time that thread is viewed, you only need to make a single database lookup to decide if you can re-use the HTML or not. If it's still clean, go for the cached version.

    When additions are made to the thread, simply mark the thread as 'dirty', and the next time it's viewed, the HTML will be generated again.

    Issues to think about here are concurrency, and also making sure everything that can affect the thread updates the dirty flag.
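    A minimal sketch of the clean/dirty idea, assuming in-memory hashes in place of the database: in a real forum the dirty flag would presumably be a column on the thread row, and every name here (`add_post`, `view_thread`, `render_thread`) is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the database and the stored HTML.
my %dirty;          # thread id => true when the cached HTML is stale
my %cached_html;    # thread id => last generated HTML
my %posts;          # thread id => array ref of post bodies
my $renders = 0;    # counts how often the HTML is regenerated

# Every code path that changes a thread (reply, edit, delete) must do this.
sub add_post {
    my ($thread_id, $body) = @_;
    push @{ $posts{$thread_id} }, $body;
    $dirty{$thread_id} = 1;
}

# Stand-in for the query and formatting code (no HTML escaping in this sketch).
sub render_thread {
    my ($thread_id) = @_;
    $renders++;
    return join '', map { "<p>$_</p>\n" } @{ $posts{$thread_id} || [] };
}

# Serve the cached copy unless the thread is dirty or was never rendered.
sub view_thread {
    my ($thread_id) = @_;
    if ($dirty{$thread_id} || !exists $cached_html{$thread_id}) {
        $cached_html{$thread_id} = render_thread($thread_id);
        $dirty{$thread_id} = 0;     # mark clean only after rendering
    }
    return $cached_html{$thread_id};
}

add_post(1, 'first post');
view_thread(1) for 1 .. 3;          # one render, then two cache hits
add_post(1, 'a reply');
print view_thread(1);               # dirty again, so rendered once more
print "renders: $renders\n";
```

    With a real database, the check and the "mark clean" step would need to be safe against a reply arriving in between, which is the concurrency issue mentioned above.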

Re: Improving performance by generating a static html file?
by tmiklas (Hermit) on Apr 12, 2002 at 13:35 UTC
    IMHO it's a very good idea, unless you try mod_perl. Then you will probably say 'Ok - it runs very fast, but occupies some extra memory'. What then?
    mod_perl is really great, and in some cases IT IS the best solution. If your forum is small (as you wrote), then you should consider generating static HTML files whenever something new is added to the forum...
    If you run Linux you can try to accelerate your web with httpd kernel support (yes - it works only with static files, dynamic requests are still passed to apache or whatever you use)...
    Of course search engines, etc. must remain dynamic, but that's another story.

    Conclusion:
    You have to measure for yourself which solution would be better for you. I use both of them. When there is anything that has to be generated dynamically (and I can't avoid it), I use mod_perl (mostly), but when I have to modify content once or a few times a day I use generated HTML files ;-) That's all!

    Greetz, Tom.
Re: Improving performance by generating a static html file?
by perrin (Chancellor) on Apr 12, 2002 at 14:26 UTC
    I described the caching system we used at eToys.com here.
Re: Improving performance by generating a static html file?
by mamboking (Monk) on Apr 12, 2002 at 14:54 UTC
    Before I would look into caching the HTML, I would look at trying to improve my DB performance. Are you caching connections to your database? If not, then you probably want to have a pool of open DB connections available to your program. Connecting to the database is usually very costly. Also, as was already mentioned, make sure that your tables are properly indexed. I'm not sure if your database has an "explain plan" to show how your query will be executed, but if it does, try to minimize the number of full table scans that are performed.
Re: Improving performance by generating a static html file?
by tmiklas (Hermit) on Apr 12, 2002 at 13:54 UTC
    Update: I forgot about one thing - when necessary, try not to regenerate all the files ;-) but only those that really have to be changed. If you have a logical tree-like site structure with plain (and logical) connections between the content (for example menus or leads for most recent topics - placed on other pages somewhere in your site), then it's going to be a really simple task ;-) Good luck!
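    As a rough illustration of regenerating only what changed, the sketch below assumes a hypothetical layout where each thread belongs to one board and the front page lists recent topics. The `%board_of` data, the page names and `pages_to_rebuild` are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical site layout: thread id => board it belongs to.
my %board_of = (101 => 'perl', 102 => 'perl', 201 => 'mysql');

# Return the pages that must be rebuilt when one thread changes.
sub pages_to_rebuild {
    my ($thread_id) = @_;
    my $board = $board_of{$thread_id};
    die "unknown thread $thread_id\n" unless defined $board;
    return (
        "thread-$thread_id.html",   # the thread itself
        "board-$board.html",        # its board's topic list
        "index.html",               # front page with recent topics
    );
}

print "$_\n" for pages_to_rebuild(102);
```

    Every other thread and board page is left untouched, which is where the saving comes from.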

    Greetz, Tom.
Re: Improving performance by generating a static html file?
by BUU (Prior) on Apr 12, 2002 at 05:16 UTC
    Much thanks for the replies. As for static vs dynamic, what I have would be a list of replies to a topic, so the content of that file would be the same until someone replies/edits/whatnot. I don't think there would be that much overhead, just a bunch of print FILE;'s

    I looked into the Cache::Cache module, and I'm a little unsure of exactly how I would use it to 'cache' an HTML page?

    And as for optimizing my mysql queries, can you point me to a good reference or give me some tips?
      And as for optimizing my mysql queries, can you point me to a good reference or give me some tips?

      First, joealba had mentioned to make sure that your database tables have good indexes and keys - and he is right.

      From experience (of not doing this the first time around) I would add to make certain that your database is properly normalized BEFORE it gets large and painfully slow to query.

      There are plenty of good places to look for info on this. Go to Google and search for 'MySQL database normalization' -- for that matter, look at mysql.org for hints.

      You know that you are in good shape if you can get whatever you are looking for with an absolute minimum number of query statements using table joins. One query is obviously ideal. If the query is slow, you may want to rethink your keys/indexes.

      The performance benefits of a persistent database connection have already been addressed. If you can, I would suggest using one.

      my $.02

Re: Improving performance by generating a static html file?
by BUU (Prior) on Apr 12, 2002 at 02:17 UTC
    But is this a "Good Thing(tm)"? Maybe I should clarify. My real question is not 'how' I can do this (I already have some ideas...) but whether I should.
      This is where you look at a performance analysis of the item. It depends on your caching scheme and the nature of your data. There are a few basic principles behind a cache mechanism that you need to take note of (this is mildly off topic). Keep in mind always that a cache is the epitome of the classical space/time tradeoff. With that in mind:
      • Is the data that you are generating dynamically mostly static or mostly dynamic? This is really important to consider. If you are generating basically the same thing over and over again, then it's fairly cache-safe. If it changes a lot (has the times the poster last logged in, how many posts they've made, or replies come in often), then your stored information has the probability of being "dirty", and thus useless to store somewhere.
      • Is the overhead of the caching worthwhile? Are you typically going to do a lot of mucking with the data to store it? Are you putting the cache on disk or in a database? These are questions you will have to answer for yourself to come up with the right answer, but performance tuning will be involved. Start the caching, and see what hits the cache and what misses it.
      • Can you afford the space/time tradeoff? Do you have the space available to store the cache? Do you have a mechanism in place to clean the cache once it becomes too old (to keep it efficient)?


      Those reflections should be able to answer your questions for yourself. If you're looking into perl modules: Or you can roll your own solution tailored to meet your needs. Have fun with the project.

          --jb

        The Cache::FileCache (part of the Cache::Cache family) module is worth looking into as well. Most modern filesystems have very efficient algorithms for keeping MRU data in memory, making it very speedy and simple to boot. (IME shared memory isn't worth the effort.)

        Chris
        M-x auto-bs-mode

Re: Improving performance by generating a static html file?
by BUU (Prior) on Apr 12, 2002 at 05:55 UTC
    Hmm, yet another thought on the issue. Didn't ikonboard used to do something similar to this, and isn't that why it got banned from a lot of hosts?