http://qs321.pair.com?node_id=57340

Masem has asked for the wisdom of the Perl Monks concerning the following question:

I'm in the process of redesigning/reprogramming a web site of mine to take advantage of mod_perl and, basically, to clean up 3yr+ old kludges now that I know what I'm doing. I've got standard DBI through MySQL, and much of the functionality of the site is searching, producing the output listing, and going from there. Certainly close to a no-brainer.

Now, reading through a recent entry at perl.com on the new Template module, I got to thinking how I might be able to take advantage of it: I could certainly use templates to present the various results of the DBI output I'm getting. However, I'm concerned about speed -- with preliminary scripts running under a mod_perl-enabled virtual host of Apache, I seem to be getting no speed benefit yet. Mind you, two possible slowdowns are occurring: I'm using fetchrow_hashref to get the values out of the DBI call, which might have some overhead, and after getting the row data, I do additional processing (more DBI or file reading) for supplementary stuff. Furthermore, some of the Template formats I have planned will not use all this data. The data is simple rows, no complex data structures outside a few join'ed arrays.

In other words, I have a tradeoff between a concise, modular function that does the results display but runs slowly, and several different functions, one per list type, which would most likely be faster. I'd rather avoid the latter, but I'm finding right now that even queries with 5 results take a long time to process. Is Template just not the right tool here, and should I fall back on something like here documents for each row?

Replies are listed 'Best First'.
Re: Speed of Template
by chromatic (Archbishop) on Feb 09, 2001 at 10:18 UTC
    According to the DBI book, fetchrow_hashref() is notably slower than the corresponding array or array_ref functions. Speeding up database access is a good start.
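    For reference, here is a rough sketch of the two faster fetch styles (the table and column names are invented for illustration):

      my $sth = $dbh->prepare("SELECT id, name FROM items");

      # fetchrow_array avoids building a hash for every row
      $sth->execute;
      while ( my ($id, $name) = $sth->fetchrow_array ) {
          # ... use $id and $name ...
      }

      # bind_columns plus fetch is generally the fastest DBI idiom
      $sth->execute;
      $sth->bind_columns( \my ($bid, $bname) );
      while ( $sth->fetch ) {
          # $bid and $bname are updated in place on each fetch
      }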

    Besides that, though I have seen Template Toolkit slow things down, TT2 does a compilation/caching stage. In a persistent environment, you only pay the penalty of compiling a template once; after that, the cost is amortized over the rest of the calls to that template. It may not run quite as fast as a heredoc, but it has the potential to do many more complicated things.

    If that weren't enough, it's much easier to change a template than it is to change a heredoc. An hour's worth of my time would buy extra memory. A new hard drive if I have to wear a tie.

    I would recommend that you use Template for its many benefits and only then worry about speed. It's not *that* slow, if you do it right. The difference between 5 hits a second and 10 hits a second may not be important until you're getting hundreds of thousands of hits per day.

Increasing Performance of mod_perl/CGI/MySQL programs
by dkubb (Deacon) on Feb 12, 2001 at 02:26 UTC

    mod_perl/CGI/MySQL programs are interesting applications to try to speed up: there are so many moving parts that you need to take a holistic approach, and there's generally a combination of several factors affecting performance. Warning: this will be a long post (and possibly slightly OT), because I am fairly interested in this subject =)

    Below is an outline of the areas I believe you need to consider when attempting to increase the performance of your Perl database interface. Some may not be possible depending on the resources/time you have available, but I wanted to outline everything I could think of and let you choose what is possible:

    • Minimize disk IO. Look at the physical hardware you are running the database and/or web server on. IO problems generally contribute the most to system slowdowns. Look at purchasing a faster hard drive, more RAM if your machine is swapping to disk, or - in the extreme - a dedicated server to house the database.

    • Look at the setup of your MySQL database. There is a great FAQ about getting maximum performance out of MySQL, here.

      • Analyze your queries, figure out which take the most time, and then learn how to make indexes (a sketch follows below). Remember, don't just set these and think you're done. I've been using MySQL full-time for just over 2 years now, and I routinely make small tweaks to indexes I thought were perfect a month earlier. Set a monthly or bi-monthly schedule to check your indexes; it is worth it. This is the most important of all the tips I outline in this post. It's not uncommon for indexing to get you a 100% speed increase from a few minutes of tweaking.
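        For example, a hypothetical check from DBI (the table, column, and index names here are made up):

          my $plan = $dbh->selectall_arrayref(
              "EXPLAIN SELECT id, name FROM items WHERE category = ?",
              undef, $category,
          );
          # if the plan shows a full table scan on a big table, an index may help:
          $dbh->do("CREATE INDEX idx_category ON items (category)");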

      • Here's a neat one I've read about but not used (yet): heap tables. A heap table is an in-memory table that validation-type data should be stored in. You put any information in here that is somewhat small and constant, such as a list of valid countries.

        When MySQL starts, it does so with the help of a start-up script. You can change the start-up script so it performs a series of SQL commands after starting the server. This can be used to automatically create a HEAP table and fill it with data from other tables/outside sources. Queries that use the heap table can be greatly sped up. Any time you need a lot of speed from your "validation" tables, give this a shot.
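        As a sketch, the start-up script could run a small Perl helper that does something like this (the table and column names are invented; TYPE=HEAP is the MySQL 3.x syntax):

          $dbh->do("CREATE TABLE country_heap (code CHAR(2), name CHAR(64)) TYPE=HEAP");
          $dbh->do("INSERT INTO country_heap SELECT code, name FROM country");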

    • Analyze all your SQL queries, not only with EXPLAIN but also with DBIx::Profile in your Perl programs. It will give you a nice breakdown of each SQL query that was run and tell you which queries are taking the largest amount of time.

    • Never use SELECT * inside an SQL query. Only fetch the columns & rows you need, nothing more. It's a huge waste of resources to ask for information you throw out and never use.

    • Web server setup. Look at Apache and see if there is anything you can do to speed it up. Try using Apache::DBI, which caches the database handle: when your script runs, it won't need to reconnect to the database, because the connection is kept open for you (a sketch follows below). Also see if there are any modules compiled into Apache that are not necessary to the functioning of your website. Consider recompiling with just what you need - go lean.

      There are also a couple of FAQs on Apache and mod_perl performance worth looking at.
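      A minimal startup.pl sketch for Apache::DBI (the DSN and credentials are placeholders):

        # load Apache::DBI before DBI so it can install its hooks
        use Apache::DBI ();
        use DBI ();

        # optionally open the connection as each httpd child starts
        Apache::DBI->connect_on_init(
            "DBI:mysql:database=mydb;host=localhost",
            "user", "password",
            { RaiseError => 1, AutoCommit => 1 },
        );
        1;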

    • Write the shortest, cleanest Perl code you can to get the job done. The shorter the code, the easier it is to optimize and debug, since you have less to keep in your head all at once.

    • Use Devel::DProf and Apache::DProf to profile the actual Perl code and see where your program is spending the most time. They give you a nice breakdown of each function in your program and tell you how long each one took. Without doing this, you'll just be guessing. I once read somewhere that programmers spend most of their time optimizing the wrong section of code. Don't fall into this trap; learn to profile. The basic recipe is below.
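      Two commands are all it takes (the script name is a placeholder):

        perl -d:DProf myscript.pl   # writes profiling data to tmon.out
        dprofpp tmon.out            # prints a report of the costliest subs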

    • Now, having said all that, do not be afraid to use a templating system. Yes, you will get a slight slowdown compared to embedding the HTML right in the Perl code, but the benefits are numerous. You get cleaner code, a separation of presentation from logic, and more avenues for extra optimizations (which I'll get into next).

      One of my personal favorites is HTML::Template, which allows a complete separation of logic and design. You can't embed Perl code inside it; instead there is a mini templating language you use. There's something about mixing two different syntaxes together (SQL and Perl, HTML and Perl, etc.) that confuses me, which is why I like HTML::Template. Combine it with HTML::Pager for easy paging of database results, and it makes hard things simple.
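      A minimal HTML::Template sketch (the template text and parameter names are invented):

        use HTML::Template;

        my $text = join "\n",
            '<ul>',
            '<TMPL_LOOP NAME=ROWS>',
            '  <li><TMPL_VAR NAME=TITLE></li>',
            '</TMPL_LOOP>',
            '</ul>';

        my $template = HTML::Template->new( scalarref => \$text );
        $template->param( ROWS => [ { TITLE => 'First' }, { TITLE => 'Second' } ] );
        print $template->output;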

    So, your database and web server are running at peak performance. Your Perl code is optimized and profiled. Want more speed? Now you need to start looking at what happens after the information leaves your server.

    • Do some remote load testing. There are many great services that can do this for you, some even free or offering free trials. A quick search on Google turns up hundreds of related sites; two key players in this area are Keynote and Service Metrics. I've used both of these, with good results.

      With this information you can pinpoint where the speed problem lies. These services can tell you whether it's your server or your hosting company. You can use this as ammo when negotiating for faster/better service. Also, consider getting an SLA (Service Level Agreement) from your hosting company to guarantee the speed of your pipe.

    • Don't forget the browser! The browser is an often-forgotten piece of the puzzle. You want to make sure the browser can download and render the HTML in the fastest possible time. How do you do this? A simple answer is to make sure your HTML is XHTML 1.0 compliant, a super-clean version of HTML. Now that your HTML is completely inside HTML template files, it will be relatively painless to process them through HTML-Tidy.

      HTML-Tidy is a utility that takes any sort of HTML and outputs cleaned-up, XHTML-compliant code. The theory is that if the browser doesn't have to "guess" at the sizes of images or close "p" tags itself, it can allocate more resources to parsing the HTML and drawing the screen faster.

    • Look at Apache::GZIP. I've had no experience with this module, but I hear it can increase download speeds for images and HTML (even dynamically generated HTML). Be aware that it may cause performance issues on your web server, since it has to do a lot of extra work compressing things on the fly. It's up to you to decide whether the extra speed for your users is worth the trade-off.

    • Try using HTML::Clean to filter out any extras, such as excess whitespace. You can sometimes compress your output by a further 10-20% with this module. Use it with caution, though; it is quite aggressive with its cleaning. I would suggest a lower level of optimization rather than full, as it's been known to play havoc with JavaScript. A sketch follows below.
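      Per its docs, usage is roughly as follows (level 3 here is a guess at a conservative setting; 9 is the most aggressive):

        use HTML::Clean;

        my $h = HTML::Clean->new( \$html );   # $html holds the page to clean
        $h->compat();    # keep the output friendly to older browsers
        $h->level(3);
        $h->strip();
        print ${ $h->data() };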

    Whew! Sorry for the length of this post. Once I started writing I couldn't stop. Perhaps I should put this into a tutorial.

    Anyone have other performance improvement suggestions?

      Very nice, and all good advice.

      To this I would like to add a couple more relevant links from my home page. First of all, Code Complete has a sample chapter online on optimization that is worth reading. Secondly, those who want to get into great depth on how to design web servers and network protocols for maximum performance may find many things of interest in The C10K Problem. Another good reference is this rant on latency vs. bandwidth. Many slow applications are slow because of heavy-duty interactions across something with bad latency, not because of a lack of bandwidth.

Re: Speed of Template
by PsychoSpunk (Hermit) on Feb 09, 2001 at 10:26 UTC
    Well, have you attempted to determine where the extra overhead seems to be coming from? I don't see why fetchrow_hashref would be killing you, unless your data set is so huge that, during the fetch operations, you effectively slow to a crawl. This is doubtful, however.

    Have you tried comparing the results of the same calls using fetchrow_array?

    Template is a great tool, but don't buy into it if it's simply because you read so-and-so say that it's the best thing he's ever used. Buy into it if it's going to make your life easier.

    That said, I bought. I'm trying it out on a new project at work, and I am pretty pleased with the results. And yes, I'm using the DBI Plugin that comes with it. I found that I was able to wipe out my server when I attempted to do some 250+ SQL commands in one script, but I guess that's to be expected. :) (don't ask, I was just playing around with something to find out if it would)

    Is there a marked improvement? I doubt it, at least if you're currently running mod_perl and your scripts use Apache::DBI or even plain ol' DBI. But coming from straight use CGI; and use DBI; stuff, I see improvement.

    Update: yes, chromatic has a good point from the DBI book, but I'm going to guess that the point of that statement in the DBI book is related to anyone who wonders why the server mysteriously locks while pulling thousands of rows back for data. (Once again, it was just to see what my limits were :) Jeez.)

    ALL HAIL BRAK!!!

      Right now, in regards to fetchrow_array vs. fetchrow_hashref, I've found that in past versions of my code I've been burned by using _array: when I did this part of the db query, I'd want the entire row of data, so I'd just SELECT *, and when the db changed down the road because I added another column or the like, I had to go back and account for it everywhere. (I did need only specific elements for a quick lookup, but I put all of these into a utility module for the CGI.) With how I've got it now, there's only one place where I need to call this key SQL statement, so I might be able to clean up the code enough to use _array vs. _hashref.

      The other reason I'd want _hashref is that I'd like to toss some more things into the hash I get back after the db call, and as such I can just take the hash ref from this call and not worry about creating a new hash for each row.

      But again, it comes down to the fact that running a query with 1-5 results seems to take as long as one with 1000 results, given how my code is called, so I very much doubt the DB is the bottleneck; instead, Template seems to be the problem. I've made sure to put the template calls outside of the loop (that is, make an array of the hashrefs from above, then let Template handle iterating over it). chromatic's post also suggests that caching of the templates should be going on if the environment is persistent -- while I've followed the mod_perl docs and the like to modify the Apache conf, I don't have an easy way to test whether I have created the right persistent environment.

        As far as getting the column titles out correctly goes, I'd suggest putting as much of your DBI stuff as possible in a single module (an OO one if you like). That allows you to 'encapsulate' (love that buzzword) your database code. In that module, or elsewhere, you could use the constant pragma to alias column numbers to memorable tags. You'd have only one place to change, and you'd get the speed advantages of fetchrow_array over fetchrow_hashref.

        e.g. :

        use constant NAME => 2;
        use constant ID   => 0;
        # ...
        my @row = $sth->fetchrow_array;
        print "Hi, Bob!\n" if $row[NAME] eq 'Robert';

        Or some such. Just a thought.

        Philosophy can be made out of anything. Or less -- Jerry A. Fodor

Re: Speed of Template
by perrin (Chancellor) on Aug 15, 2001 at 10:23 UTC
    I know it's late to comment on this thread, but I wanted to point out one thing about using TT in mod_perl and other persistent environments: you must cache the Template object! If you don't, and you create a new instance of Template on each request, you'll be killing the internal template cache. Read the tutorial on the TT site for an example of how to do it.
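    A minimal mod_perl 1.x sketch of that pattern (the package, paths, and template names are placeholders):

      package My::TTHandler;
      use strict;
      use Apache::Constants qw(OK SERVER_ERROR);
      use Template;

      # built once per child process, then reused for every request
      my $tt;

      sub handler {
          my $r = shift;
          $tt ||= Template->new({ INCLUDE_PATH => '/web/templates' });
          $r->content_type('text/html');
          $r->send_http_header;
          $tt->process('page.tt', { title => 'Hello' }, $r)
              or return SERVER_ERROR;
          return OK;
      }
      1;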
Re: Speed of Template
by Anonymous Monk on Feb 10, 2001 at 07:39 UTC
    If the queries I'm working with are going to be fairly static, I'll generally trade in my fetchrow_hashref calls for pseudo-hashes created with fetchrow_array instead.

    It generally ends up looking something like this:

    my $order = { col_one => 1, col_two => 2 };
    my $sth   = $dbh->prepare("SELECT col_one, col_two FROM table");
    $sth->execute();
    my $row = [ $order, $sth->fetchrow_array() ];
    print "<LI>This is the data for col_one: $row->{col_one}</LI>";

    Or something like that. The only downside is that if I change the SQL query in my $dbh->prepare() call, I'll have to make the corresponding changes to the $order hashref above. This saves the DBI overhead of checking the column names for each query, plus you get the benefits (IMO, at least) of pseudo-hashes over plain hashrefs -- whether or not you like them is up to you; here is one small discussion on the matter: Are pseudo-hashes worth the effort?

Re: Speed of Template
by EvilTypeGuy (Initiate) on Feb 10, 2001 at 02:38 UTC
    fetchrow_hashref is pretty slow. Consider building whatever structures you need yourself :) The DBI maintainer hasn't optimized fetchrow_hashref yet, which is one big reason you shouldn't use it if you're concerned about speed. Also, if your site is mostly static (i.e. you only have a few dynamic pages), consider not using mod_perl; use something like FastCGI instead. mod_perl has a big memory/processor overhead because it embeds a Perl interpreter in every new httpd process. FastCGI takes a round-robin approach: it keeps a pool of already-compiled copies of your script and connects each user to one of those copies in a never-ending circle... (Is that confusing? I think so... :)
FOLLOWUP
by Masem (Monsignor) on Feb 14, 2001 at 23:34 UTC
    As I've been using some of the ideas here and on the web for what I'm doing, I'd thought I'd share them here.

    First, I did find that SQL queries were slowing me down: I was trying to be cute with INNER JOINs, but I realized that the key searches I'd want to do would require 2 or 3 inner joins each, and this ate up a lot of time. Since most of the data the frequent searches run against is static except when *I* change it (infrequently, or as necessary), I decided to create a 'cached' column holding the data I'd normally get out of the JOINs, usually as a ;-delimited list, so that I can use REGEXP to get the data I need instead. I can then use secondary SQL calls to get any additional data out of the table. It's arguable that these secondary calls may increase runtime, but they increase the modularity of the CGI/Perl stuff, since I use similar calls in other, non-searching parts of the site. (As another optimization point, I would think that if I kept around a global variable for the $sth for these utility functions -- since the query format doesn't change, only one placeholder -- I'd get a bit more optimization... must try that tonight... :D)
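    For what it's worth, DBI has that last idea built in: prepare_cached stores the statement handle on the $dbh, so repeated calls with the same SQL reuse the already-prepared handle. A sketch (the table and column names are made up):

      sub name_for_id {
          my ($dbh, $id) = @_;
          my $sth = $dbh->prepare_cached("SELECT name FROM things WHERE id = ?");
          $sth->execute($id);
          my ($name) = $sth->fetchrow_array;
          $sth->finish;
          return $name;
      }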

    Template is *wonderful* to work with. It allows me to have minimal HTML code in the Perl, even compared to CGI.pm, and thus provides yet another degree of model-view-controller separation. And yes, once I worked out the SQL problems, it was sufficiently fast. Plus, Template has enough logic possibilities built into it that I can use more general results as outputs from my modular Perl functions (refs to arrays and hashes as opposed to strings) and decide how to deal with them when I tighten down the presentation issues of my site. For example, one of the results of the faux JOIN above is a list of id numbers, each of which, in a separate table, has a name; these can lead to individual pages which can be linked to. However, I may not always want to link to them in my code. I *could* write two Perl functions, one for the non-linked list and one for the linked list, or I could supply an extra parameter to one function and handle it there, but I think the most generic way to go is to return a hash, since I can still grab the list of names via a join on values(). And since Template can iterate through hashes, I'll just pass the hash to the template file and handle it there (see the sketch below). Oh, the possibilities..!
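    As a sketch of that last idea (the hash contents and markup are invented):

      use Template;

      my $tt   = Template->new;
      my $vars = { names => { 1 => 'Alpha', 2 => 'Beta' } };

      my $tmpl = q{
      [% FOREACH id = names.keys.sort %]
      <a href="/item?id=[% id %]">[% names.$id %]</a>
      [% END %]
      };

      $tt->process(\$tmpl, $vars) or die $tt->error;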

    I'm glad the suggestions here made me stick with it -- development right now is much faster than if I had stuck to only Perl and CGI for code generation. I'd be happy to share specific details of my experiences if anyone is in a similar position.