Optimizing a web application for the best performance

by freakingwildchild (Scribe)
on May 03, 2007 at 07:21 UTC

freakingwildchild has asked for the wisdom of the Perl Monks concerning the following question:

In reference to why code can be so slow?, I am creating this separate node about getting the most performance out of your (new) CGI script.

I have received many valuable replies from the Perl Monks about where to start optimizing a slow script. Still, there are a lot of unanswered questions about where code should (or should not) be optimized. This node deals with those questions and with my experience of a script that serves barely 1 hit/sec when it should handle at least 50 times that. I will try to keep this node as up to date as possible to help other monks suffering from the same speed-i-gitis.

I have received the following hints for getting better performance out of a script than before:

Development tools: profile your code, e.g. with Devel::DProf, before guessing (jbert)

Ways to improve speed:
  • Do not use non-persistent CGIs (snowhare)
    They are slow, especially when you use a lot of modules, and can be the big time sink of your system.
    In my case they are currently burning CPU because every request re-loads everything from scratch.
  • Use mod_perl (scorpio17)
This is probably one of the best upgrades you can make for running Perl scripts: modules are loaded into memory once and stay there for the life of the server, so there is no more per-request loading, swapping and the heavy memory/CPU usage that goes with it (see the handler sketch after this list).
  • Optimize your algorithms (halley)
Optimizing your algorithms and functions is the most direct route to a faster script. What I wrote two years ago has already been superseded by faster approaches, thanks to the never-ending learning curve.
  • Search for alternative and faster ways
Many of these faster approaches can be found in a matter of seconds by browsing CPAN for a module that does the job. Most module authors get feedback from their users and put in the development time to make their modules not only bug-free but also faster, snappier and up to date with the newest technologies.
  • Use Apache::Registry (moron)
If you are not running native mod_perl handlers, Apache::Registry lets your existing CGI scripts run persistently and still benefit from the nice features of mod_perl.
  • Use Apache2 (talexb)
Although I'm pretty sceptical about this: I've tested Apache2 myself on single-core, multi-user systems, and even when optimized it was significantly slower, which sent me back to Apache 1.3. I should note that this was more than a year and a half ago. I still wonder whether Apache2 would increase or rather decrease my overall web and script performance. Does anyone have experience with this?
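
To make the mod_perl suggestion above concrete, here is a minimal sketch of a mod_perl 1.x response handler; the package name My::Handler and the output are only illustrative, and it assumes the handler is mapped to a <Location> in httpd.conf (Apache::Registry users keep their plain CGI scripts instead):

    package My::Handler;

    use strict;
    use Apache::Constants qw(OK);

    sub handler {
        my $r = shift;                       # the Apache request object
        $r->send_http_header('text/plain');  # headers go out once per request
        $r->print("Hello from a persistent Perl interpreter\n");
        return OK;                           # tell Apache the request succeeded
    }

    1;

Because the interpreter and the compiled handler stay resident in each Apache child, none of the compilation cost is paid per request.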
Unanswered questions:
  • Are there any references to consult when writing optimized (web) Perl applications?
  • I thought that "use" and "no" inside a subroutine would save memory for the duration of that sub, but I found out the hard way that "no" did nothing at all for memory. Is sprinkling "no" all over my scripts actually a performance hit?
  • Is there any speed difference between if ($blah) { &dothis } and &dothis if $blah;?
  • Keeping security in mind, a lot of parsing has to be done to protect against malicious input. My approach has always been "passive": rather than producing a "wrong input" error message (except for hard-core input errors), I filter out the bad parts and pass the rest on to the script. Are there any good modules for filtering out bad input?
  • Does it help to have a dual-core or quad-core CPU for Perl scripts?
  • Which is fastest: an RDBMS (SQL) via DBI/DBD, or a DBM file? BerkeleyDB 3 or 2? ..
  • Which approach is best when your script has to access many files automatically (e.g. gathering metadata from XML files)? Should a database be used to cache the files on disk for the next time they are needed? Are there established methods for this?
  • Sometimes you only need a single value, for example an MD5 hash. With non-persistent CGIs, as I understand it, once you load a module it cannot be unloaded again and it stays in memory. Is there any way to get such a value without the memory overhead?
  • When loading a module, does it load the -entire- module, or only the code that is needed?
  • Does anyone else have questions to add? Comments? Links? Experiences with the above (or other) problems to watch out for?
Probably a lot more questions, hints and tips will be added to this list. I hope to help the web developers around here with a problem that apparently strikes us all, and that can probably be avoided by knowing what to use, what (not) to do and, especially, what to watch out for.

Many thanks already to the monks for their contributions to the previous SoPW.

Replies are listed 'Best First'.
Re: Optimizing a web application for the best performance
by Joost (Canon) on May 03, 2007 at 08:26 UTC
    * Are there any references to consult when writing optimized (web) Perl applications?

    Take a look at the mod_perl performance tuning document.

    * I thought that "use" and "no" inside a subroutine would save memory for the duration of that sub, but I found out the hard way that "no" did nothing at all for memory. Is sprinkling "no" all over my scripts actually a performance hit?

    use() will load a module (if not loaded already) and then call PackageName->import(). no() will load a module (if not loaded already) and call PackageName->unimport(). You can probably assume no() and use() are the same in terms of CPU and memory use.
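
    As a sketch of what the two statements boil down to at compile time (Some::Module is a made-up name):

        use Some::Module qw(foo);
        # is roughly:  BEGIN { require Some::Module; Some::Module->import(qw(foo)); }

        no Some::Module qw(foo);
        # is roughly:  BEGIN { require Some::Module; Some::Module->unimport(qw(foo)); }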

    * Is there any speed difference between if ($blah) { &dothis } and &dothis if $blah;?

    If there is, it's minuscule. If you need to worry about it, use XS/C instead.

    * Keeping security in mind, a lot of parsing has to be done to protect against malicious input. My approach has always been "passive": rather than producing a "wrong input" error message (except for hard-core input errors), I filter out the bad parts and pass the rest on to the script. Are there any good modules for filtering out bad input?

    Don't filter out bad input. Only accept good input, or make sure that the input doesn't matter (security-wise). Filtering out bad input is usually by far the hardest approach and takes the most time.
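
    For example, a minimal "accept only good input" check might look like this (the parameter name id and its numeric format are just an illustration):

        use strict;
        use CGI;

        my $cgi = CGI->new;

        # Whitelist: accept only a short, purely numeric id and reject
        # everything else, instead of trying to clean it up.
        my $id = $cgi->param('id');
        unless (defined $id && $id =~ /\A\d{1,10}\z/) {
            print $cgi->header(-status => '400 Bad Request'), "invalid id\n";
            exit;
        }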

    * Does it help to have a dual-core or quad-core CPU for Perl scripts?

    If you run more than one process at the same time, yes it helps.

    * Which is fastest: an RDBMS (SQL) via DBI/DBD, or a DBM file? BerkeleyDB 3 or 2? ..

    They're not the same thing. If you need an RDBMS, then an RDBMS is the better option over a simple DB like BerkeleyDB. You can probably assume BerkeleyDB and friends are faster than an RDBMS if you can limit yourself to BerkeleyDB's functionality.
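
    To illustrate what "limiting yourself to the BerkeleyDB functionality" looks like, here is plain key/value access through the core DB_File module (the file name cache.db and the key are arbitrary):

        use strict;
        use DB_File;
        use Fcntl qw(O_RDWR O_CREAT);

        # Tie a hash to an on-disk Berkeley DB file: no SQL, no server,
        # just key/value lookups.
        tie my %cache, 'DB_File', 'cache.db', O_RDWR | O_CREAT, 0644, $DB_HASH
            or die "cannot open cache.db: $!";

        $cache{'user:42'} = 'freakingwildchild';
        print $cache{'user:42'}, "\n";

        untie %cache;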

    * Which approach is best when your script has to access many files automatically (e.g. gathering metadata from XML files)? Should a database be used to cache the files on disk for the next time they are needed? Are there established methods for this?

    Accessing files directly is probably faster than reading them from a DB (assuming the files are reasonably large). Parsing the files might take most of your time. If you can, keep the parsed structure in memory. If not, it might be useful to store the parsed perl structure in a file using Data::Dumper or Storable or something similar.
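
    A minimal sketch of that "cache the parsed structure" idea, assuming Storable and XML::Simple (any XML parser would do, and the file names are illustrative):

        use strict;
        use Storable qw(store retrieve);
        use XML::Simple qw(XMLin);

        my $xml_file   = 'metadata.xml';
        my $cache_file = "$xml_file.stored";

        my $data;
        if (-e $cache_file && -M $cache_file < -M $xml_file) {
            # The cache is newer than the XML source, so skip the expensive parse.
            $data = retrieve($cache_file);
        }
        else {
            $data = XMLin($xml_file);      # slow: parse the XML
            store($data, $cache_file);     # cheap to re-read next time
        }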

    * Sometimes you only need a single value, for example an MD5 hash. With non-persistent CGIs, as I understand it, once you load a module it cannot be unloaded again and it stays in memory. Is there any way to get such a value without the memory overhead?

    You can use Symbol::delete_package(), but I'm not sure that will give you back much memory. And you probably shouldn't bother. For most systems, the memory overhead from the loaded code is dwarfed by the memory overhead from the used data anyway.
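
    If the worry is paying for a module you only need occasionally, one common pattern is to require it at run time, inside the sub that needs it (Digest::MD5 used as the example here):

        sub md5_of {
            my ($data) = @_;
            require Digest::MD5;                 # loaded on the first call only
            return Digest::MD5::md5_hex($data);
        }

    That doesn't unload anything afterwards, but requests that never need the digest never pay for loading the module in the first place.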

    * When loading a module, does it load the -entire- module, or only the code that is needed?

    Depends on the module. Most modules probably load everything, but at least some large standard modules (like POSIX and, I believe, CGI) load parts on demand.
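
    For module authors, the core SelfLoader module is one way to get that on-demand behaviour: subs placed after __DATA__ are only compiled the first time they are called (My::Heavy is a made-up name):

        package My::Heavy;
        use strict;
        use SelfLoader;      # compiles subs below __DATA__ lazily

        sub cheap { return "compiled with the module" }

        1;

        __DATA__

        sub expensive {
            # This body is kept as text and compiled on its first call only.
            return "compiled on demand";
        }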

Re: Optimizing a web application for the best performance
by graq (Curate) on May 03, 2007 at 08:08 UTC
    "Although, I'm pretty sceptical about this, I've been testing with Apache2 myself on a single core CPU, multi user systems, Apache2 is significantly slower even when optimized. I still wonder if this would increase my overal web and script performance or rather decrease? It made me go back to Apache v1.3; I got to note this is more than a year and a half ago. Anyone has experience on this ? "

    18 months is a long time these days. I would suggest that you retry your tests. I was convinced to use Apache2 by an ex-colleague of mine who said much the same as you have, but who followed it with (and I paraphrase) "We now have Apache2 running significantly quicker".

    From my experience (which does not compare like for like, so it is somewhat subjective) Apache2 with mod_perl2 is quicker and more robust than previous versions.

    -=( Graq )=-

Re: Optimizing a web application for the best performance
by ides (Deacon) on May 03, 2007 at 14:51 UTC

    You're thinking about memory use, at least in a mod_perl environment, incorrectly. What you should do is preload all of the modules you use in a startup.pl, in a <Perl> section, or, if you really like typing a lot, with PerlModule directives in your httpd.conf.

    This loads the modules into the server at startup, so the memory they use can be shared by all of the Apache children, and thus by your CGIs and mod_perl handlers.

    So, for example, say the My::Foo module uses 200k of memory. If you simply use My::Foo; in your programs, you will use 200k per Apache child process. However, if you preload it, you will use roughly 200k in total, no matter how many children are currently running.

    NOTE: This isn't a perfect picture: you are only sharing the module's code itself, not any data structures it creates at run time. But this is how you should be thinking about it; see the startup.pl sketch below.
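
    A startup.pl for this kind of preloading can be as simple as the following; it is pulled in with a PerlRequire line in httpd.conf, and the module names are only examples:

        # startup.pl -- loaded once in the Apache parent via:
        #   PerlRequire /path/to/startup.pl
        use strict;

        use CGI ();        # empty import lists: just compile the code,
        use DBI ();        # don't export anything into this package
        use My::Foo ();    # the hypothetical module from the example above

        1;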

    Frank Wiles <frank@revsys.com>
    www.revsys.com

Re: Optimizing a web application for the best performance
by spatterson (Pilgrim) on May 03, 2007 at 15:32 UTC
    I thought that "use" and "no" inside a subroutine would save memory for the duration of that sub, but I found out the hard way that "no" did nothing at all for memory. Is sprinkling "no" all over my scripts actually a performance hit?
    'use' lines are evaluated when your script is compiled, before anything else runs, so even if you place them inside a sub, those modules are loaded at compile time, long before that sub executes.

    You can get around this, loading a module only when the sub actually runs, by using require or a string eval:

    sub foo {
        require Module;        # loaded at run time, on the first call
        Module->import;
    }

    sub bar {
        eval "use Module";     # the string eval defers loading until bar() runs
    }

    just another cpan module author
Re: Optimizing a web application for the best performance
by perrin (Chancellor) on May 03, 2007 at 14:48 UTC
    Rather than asking a lot of somewhat random questions, maybe you'd be better off taking jbert's advice and using Devel::DProf on your script. Then you'll know what to focus on to get the most improvement.
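
    For reference, a typical profiling session looks like this (the commands are run from the shell, shown here as Perl comments; the script name is illustrative):

        # perl -d:DProf ./myscript.cgi    # run once under the profiler; writes tmon.out
        # dprofpp tmon.out                # report the subs where the time actually went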
Re: Optimizing a web application for the best performance
by swares (Monk) on May 03, 2007 at 15:50 UTC
    Switching from Apache 1.3.x to Apache 2.0.47 made my Perl scripts display in the browser at least 5x faster. I never timed the actual run time of the scripts under Apache directly, but from the command line it seemed to take only a second or two for a script to compile and finish. Apache2 may be slower with other content, but my scripts are much happier.
