Premature optimization

by Ovid (Cardinal)
on Jan 03, 2002 at 03:45 UTC ( [id://135852] : perlmeditation )

There's been a lot of recent discussion regarding premature optimization and I thought I should post a node clarifying why premature optimization is a bad thing. Much of this node is from a response I made to the Perl Beginner's mailing list earlier today. Go there, sign up for the mailing lists and help people develop warm, cuddly feelings about Perl.

There's a saying that there's only one thing you need to know about project management: "fast, good, or cheap, pick two". These three variables refer to development time (not speed!), quality, and cost. They assume a fourth variable: scope. In any given development environment, you can usually control about three of these variables. Examining the impact on the fourth allows you to determine what's an appropriate response to what you are doing. Remember those variables:

  1. Scope
  2. Cost
  3. Quality
  4. Development Speed

Now, obviously no one wants to sacrifice quality (though, in practice, due to poor planning, that's one of the first things to get sacrificed), so let's mentally cross that off the list. If your clients are really picky, scope is also difficult to sacrifice. So, the cost of the project and the development speed (cheap and fast) are often what gets left over. Cost is often affected by things other than programmer hours (licenses, travel, hardware, etc.), but it's closely tied to them. As a result, the faster the development speed, the lower the cost tends to be. I guarantee that for any given application, competent Perl programmers will kick the snuff out of competent C programmers in terms of development speed (this was originally a response to someone asking about the speed of C vs. Perl).

Note that nowhere in there did I say anything about "how fast the program runs". If it runs fast enough for your clients, it's fast enough. Period. If I tell a client, "oh, I can do this project for half the cost in a third of the time, but your program will run a bit slower", many, if not most, clients will opt to save the money.

Of course, I wouldn't write device drivers in Perl, and I'm currently working with Inline::C to learn how to optimize certain portions of my programs (Inline::C lets you embed C in Perl), but dollar for dollar, our clients love Perl and would rather fight than switch.

This is not to say that the speed of an application is irrelevant. I've used desktop programs written in Java that were very powerful, fit my needs perfectly, and couldn't outrun a quadriplegic snail (whatever that means). As a result, I don't use those applications. That still doesn't mean that you optimize for speed up front. You optimize for clarity. Clear, understandable code is easier to optimize for speed than is code that has been obfuscated by premature optimizations.

So how do you optimize? You start doing that near the end of a project, when you have a clearer idea of your overall project and how the various pieces interact. Plus, as best as possible, you optimize with real-world conditions. If you're testing against a 100-line dataset, but your actual datasets are millions of lines, you're really not testing. Many Web developers make big, beautiful apps that are lightning fast -- but only when run locally.

Once you have near real-world conditions for your nearly complete code, then you can start using Devel::DProf to find your worst offenders and Devel::SmallProf to profile things line-by-line.

A useful bit of information about this is Re (tilly) 1: Why Use Perl?.


Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Replies are listed 'Best First'.
Re: Premature optimization
by clintp (Curate) on Jan 03, 2002 at 04:10 UTC
    I've acted as project manager (and lead developer) for quite a few projects that were user-oriented. Some were web apps, some were client/server database query apps, etc. All tended to involve multiperson teams with hundreds of thousands of lines of code, and development time spanning months.

    Usually a rough working prototype is constructed -- even if it's just a first pass at one of the screens. It's slapped together in a fraction of the time it takes to complete the app, and I try to get its performance to within half an order of magnitude of what the final application needs.

    If my team can get the app to work in 5 seconds where a 1-second response time is needed -- that's great. We discuss the potential points of optimization (there's always some) and then we start to code using that design without the optimizations. The person assigned to the code where the handwaving took place is given some idea of what will be done ("this will be memoized/cached", "just code this query the way it is, we've got something better later", "we've got a clever hack for this, don't sweat it now") but is told to code it in the most straightforward way possible for debugging and unit testing.
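    As a rough illustration of "code it straightforwardly now, memoize later", here is a minimal sketch using the core Memoize module. The `lookup_price` function and its body are hypothetical stand-ins for whatever slow call a real project would have; only `memoize` itself comes from the module.

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical stand-in for an expensive call (a slow query, say).
my $calls = 0;
sub lookup_price {
    my ($sku) = @_;
    $calls++;                      # count how often the real work runs
    return length($sku) * 10;      # placeholder for the slow part
}

# The function was written plainly first; caching is added later
# with a single line and no change to the callers.
memoize('lookup_price');

my @prices = map { lookup_price($_) } qw(apple pear apple apple pear);
print "prices: @prices\n";
print "real calls: $calls\n";
```

    The point of the sketch is that the straightforward version stays easy to debug and unit test, and the cache can be removed again by deleting one line.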

    Near the end of a project, sure we'll go back and tweak the application to get every ounce of speed. By then the hardware's been upgraded, the libraries have improved, we've found the flaws in the design, and most of the code's been debugged. But in the beginning, this is generally what I do and it hasn't failed yet.

Re: Premature optimization
by dws (Chancellor) on Jan 03, 2002 at 04:53 UTC
    Another way to look at this is to consider the question
    What is a timely (i.e., non-premature) optimization?
    There are several good answers to this. Some answers center around comparative opportunity costs (e.g., is the value per expended time for implementing this optimization greater than I would get from doing something else?). But all answers that I know of relate in some way to value, and that's where we go awry.

    In terms of the criteria that Ovid lays out

    1. An optimization can have negative value if it causes us to decrease scope (e.g., by burning up our time budget before we finish required features).
    2. An optimization can have negative value if it increases project cost in excess of the value of the optimization. (If the customer isn't willing to pay extra for an optimization, it probably isn't worth doing.)
    3. An optimization can have negative value if it decreases product quality (e.g., by making the optimized area harder to maintain).
    4. An optimization can have a negative impact on development speed.
    All of which speak to the need to be judicious when choosing where and when to optimize.

Re: Premature optimization
by gav^ (Curate) on Jan 03, 2002 at 06:47 UTC
    I agree wholeheartedly. The main reason I program in Perl is that it helps me get the job done quickly. My clients love that I can get changes done so quickly. One part of this is the flexibility of the language, but CPAN and the excellent modules available are the real key.

    A case in point, client rings up 'Can you extract the email address of everyone from our POP3 box?'. No problem! 15 mins later with the help of POP3Client I had the answer. Few hours go by and then they ask, 'We downloaded them, can you do it again from this mailbox file?'. Another 15 mins later with the help of MBoxParser I had one happy customer.

    I hate to think how long it would take me in another language. I could have been banging my head against the wall for ages!

    The one bad thing I would have to say about perl is that it lets you get away with a lot. Turn off strict and warnings and you can get away with a lot of sloppy code.

    There are some shocking CGI scripts about. Every time I go to look for something (yesterday it was a simple web log) I poke around the code and I'm shocked. I found Graymatter, which seems to be well received. What do I find?

    • No strict or warnings
    • Not using
    • Function calls with a & in front (one of my pet peeves - I think it looks terrible)
    • A horrid stack of if statements, no thought of using some sort of hash as a dispatch table or anything more maintainable
    • HTML scattered through the script - with excellent template modules such as HTML::Template (my favorite) that are so easy to use, there is no excuse
    And that's just a quick look! I hate to think what is wrong under the hood.
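    For anyone who hasn't seen a hash used as a dispatch table, here is a minimal sketch of the idea. The action names and handler bodies are made up for illustration and are not taken from any real script.

```perl
use strict;
use warnings;

# Hypothetical actions for a small web log script; names are illustrative.
my %dispatch = (
    list   => sub { return "listing entries" },
    view   => sub { my ($id) = @_; return "viewing entry $id" },
    delete => sub { my ($id) = @_; return "deleting entry $id" },
);

# Look the action up once instead of walking a stack of if/elsif tests.
sub handle_action {
    my ($action, @args) = @_;
    my $handler = $dispatch{$action}
        or return "unknown action '$action'";
    return $handler->(@args);
}

print handle_action('view', 42), "\n";
print handle_action('bogus'), "\n";
```

    Adding a new action then means adding one entry to the hash rather than another branch in the middle of the script.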

    This is just one example of bad coding. I think the main problem is that perl makes it so easy, some people don't think to do it right. Maybe I'm being pedantic here but I take pride in neat well written code :)

    One of our clients sent over their designer to look at some of the cgi scripts I worked on for a project. They wanted him to tidy up the HTML output and fit it into their "look". I handed over the templates he'd need to change and he was shocked. It was the first time he'd seen a perl cgi script that didn't have embedded HTML!

    Well that was a bit of a rambly post but I'm sure I have a point somewhere. Sit back, think about what you are doing and try to do it right :)

(tye)Re: Premature optimization
by tye (Sage) on Jan 04, 2002 at 00:41 UTC

    I hear lots of "speed" talk here and it is almost always premature nano-optimization. You describe problems with premature optimization. I think nano-optimization and micro-optimization are perhaps a worse problem for many Perl programmers.

    I marvel at people (including me, sometimes) running benchmarks to estimate how much faster single quotes are than double quotes, whether for is faster before a loop or after a statement, whether && is faster than and, whether arrays are faster than hashes, whether globals are faster than references, etc.

    The most common conclusion is that X is 20% faster than Y (seriously). And that usually means your script can be changed from taking 1.2 seconds to run to an impressive 1.1999 seconds to run!
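    To put rough numbers on that, here is a small back-of-the-envelope sketch. It assumes "20% faster" means the operation's own time drops by 20%, and the 0.05% share of total runtime is an invented figure chosen only to show the shape of the arithmetic; `runtime_after` is a hypothetical helper.

```perl
use strict;
use warnings;

# New total runtime after shaving $reduction (0..1) off an operation
# that accounts for $share (0..1) of the original $total seconds.
sub runtime_after {
    my ($total, $share, $reduction) = @_;
    return $total - $total * $share * $reduction;
}

# An operation that is only 0.05% of a 1.2 second run, made 20% faster.
printf "tiny share: %.4f seconds\n", runtime_after(1.2, 0.0005, 0.20);

# The same 20% gain on something that is half of the run.
printf "half share: %.4f seconds\n", runtime_after(1.2, 0.5, 0.20);
```

    The gain only becomes noticeable once the operation is a large share of the total, which is exactly what a profiler is for finding out.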

    If a script is running "too slow", then optimizing simple things is very unlikely to make it run "fast enough". On rare occasions, you'll find a fairly simple thing that is getting done many thousands of times such that making it significantly faster will make your whole program only moderately faster.

    But don't assume you have such a rare case unless you have some decent evidence. That is what things like Devel::DProf are meant to help you figure out. So work on figuring out what the bottleneck is before you spend effort trying to remove it.

    But, if a script is running "too slow", then your only reasonable hope for fixing that is to make changes to the global structure, to the algorithm. Instead of making one operation 20% faster, make it so you process/store the data in a different format/order so that you can find what you want with one operation instead of 10,000, for example.
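    A minimal sketch of that kind of structural change follows, assuming a made-up list of 10,000 user records. The record layout and the names `find_by_scan` and `%by_id` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records; in practice these would come from a file or database.
my @users = map { +{ id => $_, name => "user$_" } } 1 .. 10_000;

# Slow shape: scan the list on every lookup (up to 10,000 comparisons).
sub find_by_scan {
    my ($id) = @_;
    for my $user (@users) {
        return $user if $user->{id} == $id;
    }
    return;
}

# Faster shape: build an index once, then each lookup is one hash access.
my %by_id = map { ($_->{id} => $_) } @users;

print find_by_scan(9_999)->{name}, "\n";
print $by_id{9_999}{name}, "\n";
```

    Both lookups return the same record; the difference is that the second one does not get slower as the list grows.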

            - tye (but my friends call me "Tye")

      My usual procedure for optimizing something is:

      1. Profile it, to find out which part of the problem is taking the most time. (Sometimes the problem's a single algorithm, in which case this isn't really necessary.)
      2. Estimate the asymptotic bound of that bit of the problem. (Yet another excellent real-world use for those icky awful theory courses you were forced to take in university. :-)
      3. Try to make the algorithm faster. Sometimes this just isn't possible; IIRC, you can't get a general comparison sort on a single-processor machine faster than O(n log n). But you can often simplify the problem: put a few restrictions on what you sort, and you can get down to O(n). Nobody knows how to solve the travelling salesman problem in less than exponential time, but if you don't need a perfect solution, just one no worse than twice as bad as optimal, and your distances obey the triangle inequality, you can do it in polynomial (cubic, IIRC) time.
      4. Repeat until you run out of relevant sub-problems.
      5. If it still isn't fast enough, profile it again; this time, look for lines and functions that get executed most often, and try to slim those down. If your program's data set is particularly small, this might be more important than step 2; then again, if n is small, your program's probably "fast enough".
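      As one example of the "put a few restrictions on what you sort" idea from step 3, here is a minimal counting sort sketch. It assumes every value is a non-negative integer no larger than a known maximum, which is what lets it run in time proportional to the input size plus that maximum; `counting_sort` is an illustrative name, not a library function.

```perl
use strict;
use warnings;

# Counting sort: assumes every value is an integer in 0 .. $max.
sub counting_sort {
    my ($max, @values) = @_;
    my @count = (0) x ($max + 1);
    $count[$_]++ for @values;                     # one pass to tally
    return map { ($_) x $count[$_] } 0 .. $max;   # one pass to emit
}

my @sorted = counting_sort(9, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3);
print "@sorted\n";
```

      No element is ever compared with another, which is how it sidesteps the O(n log n) bound for general sorts.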
Re: Premature optimization
by belg4mit (Prior) on Jan 21, 2002 at 04:53 UTC
    Just as a point of reference a four-variable system for projects seems to usually be:
    1. Schedule (development speed) 2. Performance (quality) 3. Cost (cost) 4. Risk
    The list reminded me of a paper I wrote on NASA's "Faster, Better, Cheaper" (they had their own problem with that, in that they implicitly accepted greater risk ;-)

    I suppose it might be possible (in some convoluted way) to map risk to scope ;-)

    perl -pe "s/\b;([st])/'\1/mg"

Re: Premature optimization
by kwoff (Friar) on Jan 03, 2002 at 23:03 UTC
    Regarding cost: on a large project, if you spend very much time optimizing, you (or the client) may have spent enough on paying developers in that time that you could've bought really fast hardware instead. Also, "near the end of a project" might not be a well-defined time if maintenance is considered. Optimization could make future maintenance harder and cost time down the road.