Code Efficiency: Trial and Error?

by Tanalis (Curate)
on Oct 11, 2002 at 19:40 UTC

I write a large amount of code at work to handle stupidly large amounts of data, and I find that while designing Perl scripts I spend a lot of my time thinking not only about how to do something, but getting bogged down in the quagmire of "how best to do something".

At the moment this largely comes down to trial and error for me - which is *very* frustrating - hence this Meditation: me trying to put thoughts down on paper (scary thought .. :P). I'm sure this is something that comes up a lot, but I thought I'd post it anyway and see what comes of it.

Perl is flexible. Very flexible. To the point that, for any given task, you can all but guarantee that there's a completely different, and probably better, way to do it.

This is one of Perl's best features - but also one of its most frustrating. Even with the most carefully crafted code, I find myself wondering if I could have done it some other way to shave valuable minutes off processing the million-or-so data records coming out of the database.

Bizarrely, to me at least - coming from a C background, where there's rarely more than one "recommended" way to do something - few of the "good" Perl books seem to touch on code efficiency, barring a quick "there's more than one way to do it" comment in the introductory paragraph or chapter of the book.

Surely for something this fundamental - a script's runtime can be multiplied many times over by a less "efficient" method of processing - some guidelines should exist, even if only a quick "tip-sheet" to give you a helping hand when figuring out how *not* to write a script.

I can see that there are really no restrictions on how code is written - and it's largely a matter of personal style, preference and experience ..

What I'm interested to know is ...
- how do other Monks go about optimising their code?
- when faced with more than one obvious way to do something, what's the determining factor?
- surely this has to be more than "trial and error" ..? *grin*

Anyway .. enough of my thoughts/contemplations/moaning .. just thought I'd see what people thought.
--Foxcub

Re: Code Efficiency: Trial and Error?
by dws (Chancellor) on Oct 11, 2002 at 20:32 UTC
    What I'm interested to know is ...
    - how do other Monks go about optimising their code?

    Until I know that there's really a problem, I don't.

    If there is a problem, then measure before making any code changes. Sometimes the problem is where you think it is, but much of the time it won't be. And after the change, measure again.
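
    A minimal sketch of "measure first" using the core Benchmark module (the two subs here are invented stand-ins for whatever alternatives you're weighing):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @fields = ('a' .. 'z');

    # Compare two candidate implementations of the same task;
    # a negative count means "run each for about 3 CPU seconds".
    cmpthese( -3, {
        join_once => sub { my $rec = join ',', @fields },
        concat    => sub {
            my $rec = '';
            $rec .= "$_," for @fields;
            chop $rec;    # drop the trailing comma
        },
    });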

    - when faced with more than one obvious way to do something, what's the determining factor?

    Pick the way that's clearer. Code has multiple audiences: one audience is the computer, another is the person who picks up the code next. I try to make sure things are correct for the former, and clear and concise for the latter. If you write all of your code with the assumption that it might be tacked up on a wall for passers-by to comment on, you won't go too wrong.

Re: Code Efficiency: Trial and Error?
by runrig (Abbot) on Oct 11, 2002 at 20:54 UTC

    - how do other Monks go about optimising their code?
    - when faced with more than one obvious way to do something, what's the determining factor?
    - surely this has to be more than "trial and error" ..? *grin*

    When optimizing, you have to ask yourself "How much time will this really save?", and weigh that against "How maintainable is each method?".

    And then there's "How much time do I have to play with this?" :-)

    I was recently trying to optimize some homegrown Korn shell menu scripts of ours where at one point the user had to wait for about 10 seconds for a prompt (I did end up getting it down to 2-3 seconds). There was one function which was looking for the alias (filename) for a certain customer code (last field on a certain line). The function was doing this:

    grep " $code$" * | awk -F: '{print $1}'
    So it was grep'ing through all the files in a directory even though it could quit when it hit the first matching line. So I rewrote it as this (also realizing that the target line always started with 'Customer Code'):
    awk '/^Customer Code.* '$code'$/{print FILENAME; exit}' *
    But when you hit the 'Customer Code' line in one file, and it's not the one you want, you'd like to close the file right there and move on to the next file, especially because the 'Customer Code' line was always the second or third line in a 40-100 line file. gawk has a 'nextfile' function which does this, but I'm stuck with awk for now. So let's try perl:
    perl -e '
        $site = shift;
        opendir DIR, ".";
        @ARGV = readdir DIR;    # process every file in the directory
        closedir DIR;
        while (<>) {
            if (/^Customer Code.* (\w+)$/) {
                print("$ARGV\n"), exit if $site eq $1;
                close ARGV;     # wrong file - skip straight to the next one
            }
        }
    ' $code
    This, on average, goes twice as fast as the original, but at the cost of readability (especially since no one else here knows perl all that well). And then it turns out that this function was not even called during that particularly slow prompt, and was only being called once per execution (in another place), so I'd be saving a whole 0.03 seconds (which the user wouldn't even notice) by doing this with perl. But I'm leaving the perl in for now, along with the old line commented out, with a comment to the effect of "it's a lot more fun doing it this way" :-)

    Update: As a (hopefully) final note, even though the above code wasn't slowing down the especially slow prompt, I did finally speed up that slow prompt to being almost instantaneous by replacing the offending section with perl. The problem was that there were about 90 'customer' files, and it was fork/exec'ing grep/awk/etc for each file, so I just read each file, saved what I needed in a hash array, and printed the thing out at the end:

    site_list=$( perl -e '
        while (<>) {
            if ($ARGV eq "/etc/uucp/Systems") {
                $system{$1} = undef if /^(\w+)/;    # remember every known system name
                next;
            }
            close ARGV, next unless exists $system{$ARGV};    # skip files that are not systems
            $system{$ARGV} = $1, close ARGV
                if /^Customer Code.*\s(\w+)\s*$/;
        }
        $, = " ";
        print sort(values %system), "\n";
    ' /etc/uucp/Systems *)
    So some things are worth optimizing. It saves only about 3 seconds in actual time (10, counting from the original), but the annoyance it saves is priceless :-)
      I wouldn't have used perl or awk, but stayed with grep: grep -l would have done the same as your Perl script, and is likely to be more efficient.

      Abigail

        I like the simplicity of grep -l, though as Aristotle points out, it still scans all files (though it's probably what I'll end up with just for the sake of maintenance, and '-l' short-circuiting the match within each file is 'good enough'). If I just look for /_$code$/ then it is about as fast as the perl script when all the files need to be scanned anyway (and perl isn't all that much quicker even when the match occurs within the first few files). But when I change it to "^Customer Code.* $code$" then it is (~3x) slower. grep and sed are good at very simple regexes, but perl seems to outperform them when they become even mildly complex.

      Update { should have benchmarked properly, apparently.. rest of this post largely invalidated by runrig's reply. }

      Abigail-II's proposition of grep -l does not fit your specs as it will still scan all files, but your Perl can be simplified.

      perl -e '
          $site = shift;
          for $f (@ARGV) {
              local @ARGV = $f;
              /^Customer Code/ && last while <>;
              / \Q$site\E$/ && (print("$f\n"), last);
          }
      ' $code *
      But why all that? A short-circuiting sed script to find a customer's code:
      sed 's/^Customer Code[^ ]* //; t done; d; : done; q' FILE
      Wrap some sh around it and it does the job:
      for ALIAS in * ; do
          [ "`sed 's/^Customer Code[^ ]* //; t done; d; : done; q' "$ALIAS"`" = "$code" ] && break
      done
      echo "$ALIAS"
      (Wrote this in bash, not sure if it works 1:1 in Korn, but it should be easy to port anyway.)

      Update: Btw: when you're passing {print $1} to awk, it's a sign you really wanted to use cut - in your case, that would be cut -d: -f1

      Makeshifts last the longest.

        I didn't do the file globbing outside of the perl script because that seemed to be slower, and a readdir solution seemed to be faster than a glob inside of the script. I corrected and rewrote your shell solution for Korn:
        for ALIAS in *
        do
            [[ $(sed -n 's/^Customer Code.* //
        t done
        d
        : done
        p
        q' "$ALIAS") = $code ]] && break
        done
        print $ALIAS
        But this seems to be the slowest solution of all. Probably due to having to fire up a sed process so many times, and maybe also due to the specific regex. But like I say in my reply to Abigail-II's post above, I'll probably end up with grep -l just for the simplicity of it.

        Update: In addition to Aristotle's updated notes, another sign is that if you pipe grep to or from sed, awk, and maybe even cut, you are probably doing too much work, and may be better off just using one of the aforementioned commands instead.

Re: Code Efficiency: Trial and Error?
by JaWi (Hermit) on Oct 11, 2002 at 20:38 UTC
    I asked a similar question a while ago: the thread resulted in rather interesting and useful information. I've imprinted most answers in my mind, but I think jeffa's answer was the most useful: "do not optimize until you really, really need to".

    Well, hope it helps...

    -- JaWi

    "A chicken is an egg's way of producing more eggs."

Re: Code Efficiency: Trial and Error?
by Aristotle (Chancellor) on Oct 12, 2002 at 01:32 UTC
    Three rules govern the way I write my code:
    1. Pick the best / most appropriate algorithm
    2. Come up with a natural representation using Perl's data types
    3. Write code that's as clear and self-explanatory as possible

    Note that "clear and self-explanatory" doesn't mean "clear and self-explanatory to someone who's never seen Perl before".

    I make extensive use of common "magic" features like $_ (which is only the simplest example). I use next / last to avoid deep indentation. I use statement modifiers a ton. I use the ternary operator quite frequently.

    But not to shave a couple of characters off: to shave a couple of red-tape tokens off, and thus bare the real goings-on to the eye. All my indentation and other style rules follow this maxim. I try to lay out my code such that I can absorb the structure while scrolling, without even really reading.
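
    For instance, a hypothetical filter written in that style, with the red tape dispatched up front by statement modifiers and next:

    # Print the names of active records from colon-delimited input,
    # skipping comments and blank lines without nesting any deeper.
    while (<>) {
        next if /^#/ or /^\s*$/;    # red tape out of the way first
        chomp;
        my ($name, $status) = split /:/;
        print "$name\n" if defined $status and $status eq 'active';
    }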

    Makeshifts last the longest.

Re: Code Efficiency: Trial and Error?
by oakbox (Chaplain) on Oct 11, 2002 at 20:57 UTC
    I find that I stick to a few simple constructs: foreach over for or while, if-else in brackets over the ?: notation, etc. I write code faster when I'm using tools that I am very familiar with and only jump to other coding styles when there is a real need for optimization.
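
    For illustration, here's an invented example of one such choice, the bracketed if-else against the ?: notation:

    my $count = 1;    # sample value

    # if-else in brackets:
    my $label;
    if ( $count == 1 ) {
        $label = 'item';
    }
    else {
        $label = 'items';
    }

    # The same decision with the ?: notation:
    my $other = $count == 1 ? 'item' : 'items';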

    The main criterion is consistency. If you are consistent in your style, you can go back to your own code and immediately grok what's going on. Consistency also helps other people reading your code.

    The question wasn't 'correct vs. bad' coding. If you are consistently writing bad code, then you need to change :) But if the choice is between 'good' and 'also good', consistency wins.

    oakbox

      Well... I often ask myself the same question. I often do things in a very readable way that works well, no bugs, etc. And then my boss will come along, hack the code, make it half the length, and end up with a faster, more efficient version. Often not as readable, though.
      I also often wonder, when building regexes, if I'm using the fastest way. It would be nice to have examples of two regexes doing the same thing but one faster than the other, with an explanation of why it is faster, so that we can understand how to write more efficient code.
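
      As a small invented example of such a pair: both patterns below look for "foo:" at the start of a string, but only one says so. The anchored version can fail after a single look at position 0, while the unanchored one has to consider every position before giving up. (A hedged sketch using the core Benchmark module; the data is made up.)

      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      my $line = 'x' x 10_000;    # a long line that matches neither pattern

      cmpthese( -3, {
          anchored   => sub { $line =~ /^foo:/ },    # fails immediately at position 0
          unanchored => sub { $line =~ /foo:/  },    # scans the whole string before failing
      });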
        You might want to check Mastering Regular Expressions by Jeffrey E. F. Friedl for a deep coverage of regular expressions, with a whole chapter dedicated to Perl's regexes.

        Btw, that book review is for the first edition, so you might want to know there is a second edition published in July 2002.

        hope this helps,

Re: Code Efficiency: Trial and Error?
by ignatz (Vicar) on Oct 11, 2002 at 20:34 UTC
    How do you get to Carnegie Hall?
    ()-()
     \"/
      `                                                     
    
Re: Code Efficiency: Trial and Error?
by trs80 (Priest) on Oct 12, 2002 at 16:04 UTC
    Q1) how do other Monks go about optimizing their code?
    Q2) when faced with more than one obvious way to do something, what's the determining factor?
    Q3) surely this has to be more than "trial and error" ..? *grin*

    A1) Since, reading between the lines, it seems you are working with a database most of the time, there is much more to getting efficient code than just the Perl part of it. You need to be concerned with the network (if it is a remote database) and the indexing of the tables; these can make a big difference in the speed of your code. In one recent project I got a tenfold difference in speed on a report by adding the appropriate indexes to the tables. As far as Perl efficiency is concerned, this is also very dependent on the type of data you are dealing with. I recommend you use a profiler to find out where the most time is being spent in your code. You might want to check out Devel::AutoProfiler.
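
    For example, the Devel::DProf profiler that ships with perl can be run without changing the code at all (the script name below is just a placeholder):

    perl -d:DProf report.pl    # run the script, writing profile data to tmon.out
    dprofpp tmon.out           # summarize which subroutines the time went to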

    A2) For me, unless there is a known efficiency issue, I code to my level as much as possible. I occasionally will use code outside of my comfort zone in the hope of adding it at some point to my bag of tricks, but some code just doesn't "sound" right and becomes more of a maintenance issue than anything else. Another thing to consider is the lifespan of the code in question. If this is a one-time or seldom-run piece of code, you most likely won't recover the time "wasted" in making it more efficient. Optimize only when it is beneficial in the grand scheme of things.

    A3) Using different benchmarking tools and code profiling, you can determine what is causing the bottleneck and code those sections differently.
