http://qs321.pair.com?node_id=319829

waxmop has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a database-backed web application for my office to allow users to list, view details of, and edit a bunch of client records.

Normally, I would write some CGI / perl-mason scripts like this:

view_contact.cgi?id=34&showdata=all

or:

list_matching_contacts.cgi?matchfield=state&matchvalue=OH

Those programs would open the database, run the query, and then return the results formatted as HTML. Simple stuff.
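For illustration, a view_contact.cgi along these lines might look roughly like the sketch below. The DSN, credentials, and the contacts table layout are made-up placeholders, not my real schema:

    #!/usr/bin/perl
    # Rough sketch of a view_contact.cgi-style script; the DSN, credentials,
    # and the contacts table columns are assumptions for illustration only.
    use strict;
    use warnings;
    use CGI;
    use DBI;

    my $q  = CGI->new;
    my $id = $q->param('id');

    my $dbh = DBI->connect( 'dbi:mysql:office', 'user', 'pass',
                            { RaiseError => 1 } );

    my $row = $dbh->selectrow_hashref(
        'SELECT name, state, phone FROM contacts WHERE id = ?', undef, $id );
    $row or die "no contact with id $id";

    print $q->header('text/html'),
          $q->start_html("Contact $id"),
          $q->h1( $q->escapeHTML( $row->{name} ) ),
          $q->p( 'State: ', $q->escapeHTML( $row->{state} ) ),
          $q->p( 'Phone: ', $q->escapeHTML( $row->{phone} ) ),
          $q->end_html;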

Since I expect that users will be viewing data (read-only) much more often than editing it, I am considering instead writing a build_html() script that will go through every contact in the database and write a few different static .html files for each one. I will call this script after anyone uses any of the edit.cgi scripts.
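A build_html() pass over the whole table might look something like this sketch (the output directory, schema, and markup are placeholders for whatever the real application would use):

    #!/usr/bin/perl
    # Sketch of a build_html() pass; output directory, schema, and markup
    # are assumptions for illustration only.
    use strict;
    use warnings;
    use DBI;

    my $out_dir = '/var/www/contacts';    # assumed document-root subdirectory
    my $dbh = DBI->connect( 'dbi:mysql:office', 'user', 'pass',
                            { RaiseError => 1 } );

    my $sth = $dbh->prepare('SELECT id, name, state, phone FROM contacts');
    $sth->execute;

    while ( my $c = $sth->fetchrow_hashref ) {
        my $file = "$out_dir/contact_$c->{id}.html";
        open my $fh, '>', $file or die "can't write $file: $!";
        print $fh "<html><head><title>$c->{name}</title></head><body>\n",
                  "<h1>$c->{name}</h1>\n",
                  "<p>State: $c->{state}</p>\n",
                  "<p>Phone: $c->{phone}</p>\n",
                  "</body></html>\n";
        close $fh or die "can't close $file: $!";
    }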

The advantage is that I will avoid lots and lots of redundant database calls. The disadvantage is that I will have a directory with about 5 - 10 thousand html files. According to df -i, the partition I'll be working on has about 3 million inodes available, so I don't think I'll be doing much damage there, but I wanted to get the wisdom of the monks before I begin.

Is there a third way? All comments are welcomed.

Replies are listed 'Best First'.
•Re: web-app design question
by merlyn (Sage) on Jan 08, 2004 at 17:05 UTC
      Thanks. I'm designing my build_html script to be a bunch of functions, so I can choose to update just small subsets of the static html files rather than all or nothing. Adding in a 404 handler seems like a fun project (and a way to learn handlers), so maybe I'll put that into version 2.0.
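      As a rough outline of that generate-on-404 idea, Apache can be told to hand missing pages to a CGI script with "ErrorDocument 404 /cgi-bin/build_on_demand.cgi", and the script builds the page on the fly. The URL layout and the build_contact_page() helper below are hypothetical stand-ins:

        #!/usr/bin/perl
        # Hypothetical 404 handler sketch; pair it with
        #   ErrorDocument 404 /cgi-bin/build_on_demand.cgi
        # in httpd.conf. build_contact_page() stands in for code that queries
        # the database, writes the static file, and returns the HTML.
        use strict;
        use warnings;
        use CGI;

        my $q   = CGI->new;
        my $url = $ENV{REDIRECT_URL} || '';

        if ( my ($id) = $url =~ m{^/contacts/contact_(\d+)\.html$} ) {
            my $html = build_contact_page($id);
            print $q->header( -status => '200 OK', -type => 'text/html' ), $html;
        }
        else {
            print $q->header( -status => '404 Not Found', -type => 'text/html' ),
                  '<html><body><h1>Not Found</h1></body></html>';
        }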
Re: web-app design question
by duff (Parson) on Jan 08, 2004 at 16:11 UTC

    Or, rather than building all of the static pages at once, you could just build the static page on first use and keep a cache of these pages (of course, invalidating cache entries as the data changes). One advantage is that you might not need to generate all 10000 pages, but just the ones that are used the most. I don't know if that's a win in your situation or not though.

    Oh, as always, check CPAN. I believe there's a module that handles caching for you.
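    Cache::FileCache (from the Cache::Cache distribution) is one such module. A cache-on-first-use routine might look roughly like this; the namespace, key scheme, and the render_contact() helper are assumptions standing in for the existing page-building code:

        # Sketch of cache-on-first-use with Cache::FileCache; namespace,
        # key scheme, and render_contact() are assumptions.
        use strict;
        use warnings;
        use Cache::FileCache;

        my $cache = Cache::FileCache->new({
            namespace          => 'contacts',
            default_expires_in => 600,        # fall-back TTL, in seconds
        });

        sub contact_html {
            my ($id) = @_;
            my $html = $cache->get("contact_$id");
            unless ( defined $html ) {
                $html = render_contact($id);    # assumed: DB query + HTML
                $cache->set( "contact_$id", $html );
            }
            return $html;
        }

        # edit.cgi can invalidate just the record it touched:
        # $cache->remove("contact_$id");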

Re: web-app design question
by waswas-fng (Curate) on Jan 08, 2004 at 15:57 UTC
    One other thing you can do is look at your weblog stats and see which pages get hit the most; you can usually get away with building just a few static pages (the initial site drop-in page and main sections) and still achieve a substantial performance increase. Building all of the pages statically on a schedule can actually decrease performance if you have a lot of dynamic pages that are not used frequently.

    If you must build the entire site statically and there will be 15k HTML files, it may make sense to lay the files out in a logical tree so that stat calls on them do not take as long. If you split those files up into 100 to 1,000 files per directory, you should see a nice performance boost on stat (on most filesystems).
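    One hypothetical way to bucket the files (the web root and the bucket count are arbitrary choices for the example):

        # Spread contact pages across 100 subdirectories so no single
        # directory holds thousands of files; the path and modulus are
        # made up for the sake of the example.
        use strict;
        use warnings;
        use File::Path qw(mkpath);

        sub contact_path {
            my ($id) = @_;
            my $bucket = sprintf '%02d', $id % 100;    # e.g. id 1234 -> "34"
            my $dir    = "/var/www/contacts/$bucket";
            mkpath($dir) unless -d $dir;
            return "$dir/contact_$id.html";
        }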

    After all is said and done, think carefully about doing this: by building pages statically you lose the ability to customize based on login/region/etc. -- static is by definition not dynamic. =)

    You may also want to take a look at mod_perl and some of the DB caching modules out there -- you may be able to squeeze more performance out of your box that way as well.
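    For example, Apache::DBI under mod_perl gives you persistent database connections with almost no code changes; the file path and DSN below are assumptions:

        # startup.pl -- loaded via "PerlRequire /path/to/startup.pl" in
        # httpd.conf. Loading Apache::DBI before DBI makes DBI->connect
        # reuse a persistent per-child connection; DSN and paths are made up.
        use strict;
        use Apache::DBI;    # must be loaded before DBI
        use DBI;

        # Optionally open the handle when each child starts:
        # Apache::DBI->connect_on_init( 'dbi:mysql:office', 'user', 'pass' );

        1;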


    -Waswas
Re: web-app design question
by arden (Curate) on Jan 08, 2004 at 16:14 UTC
    The key here is going to be the ratio of read-only data requests to updates. If the system has to re-write 5-10 thousand HTML files when somebody updates a record in the database, static pages would only be helpful if around half a million or more requests for the static data occur before another update.

    Also, how long do you think it would take for the build_html() script to complete after being called? You need to realize that most users are going to try to verify that their update "took" in the system by immediately checking for it. If the new static page hasn't been re-built, they're going to think it didn't update and either re-submit or call tech-support (you) to complain.

    Ultimately, you need to get some numbers: how many files would be updated, how often updates are performed, how many times the static pages would be requested between updates, how much CPU time the current fetch takes, how much CPU time the build_html() script would take, etc... and decide if it would benefit your situation. I'd wager it might be more hassle than it's worth, and it might even cause some breaks in the business rules...
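    A quick way to get those CPU-time numbers is the core Benchmark module; fetch_contact() and build_html() below are placeholders for the real routines:

        # Rough timing sketch with the core Benchmark module;
        # fetch_contact() and build_html() are placeholders.
        use strict;
        use warnings;
        use Benchmark qw(timethese);

        timethese( 10, {
            'dynamic fetch' => sub { my $html = fetch_contact(34) },
            'full rebuild'  => sub { build_html() },
        } );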

Re: web-app design question
by borisz (Canon) on Jan 08, 2004 at 15:42 UTC
    Your system of writing static pages is fine for static data, but you have a database, and if the data changes there you must rebuild all of your static files.