PerlMonks  

What is faster?

by hotshot (Prior)
on Jun 02, 2003 at 07:32 UTC ( [id://262324]=perlquestion )

hotshot has asked for the wisdom of the Perl Monks concerning the following question:

Hi all!

A quick question: I run a script from my Perl program using open, and I need to grep that script's output. Which is faster, calling the script as follows:
open(OUTPUT, "$script | grep EXPRESSION |"); my @output = <OUTPUT>;
or doing:
open(OUTPUT, "$script |"); my @output = <OUTPUT>; @output = grep { EXPRESSION } @output;
More generally, my question is which is faster: performing operations in Perl or via the system? I know a lot of issues are involved (the system call opens a shell, etc.).

Thanks

Hotshot

Replies are listed 'Best First'.
Re: What is faster?
by BrowserUk (Patriarch) on Jun 02, 2003 at 08:43 UTC

    One situation where I might use an external grep utility in preference to the internal one is if the script produces large volumes of output, and the selection process filters out a large proportion of it. The difference in pure speed terms is likely to be minimal, but the reduction in memory usage by not loading data just to discard it might be worth having.

    That said, the amount of memory used by the internal version could be minimised by applying the grep at input rather than afterwards, i.e.:

    Update: DO NOT USE THE CODE BELOW!! Good idea, bad implementation as pointed out below by tilly

    open(OUTPUT, "$script |"); my @output = grep { EXPRESSION } <OUTPUT>;

    You'd probably need to be discarding a significant amount of input for this to make any great difference, but it probably wouldn't do any harm in any case, so why not do it anyway?


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


      Hear, hear.. ;)

      Seriously though, I have found the internal grep to suit my needs very nicely when parsing files. I haven't really done much in terms of post processing of other commands. I was honestly amazed at the difference in speed between say
      open(IN, "/some/file") || die "Cant open /some/file: $!\n";
      while (<IN>) {
          next unless (m/^$some_match/);
          chomp($capture = $_);
      }
      close(IN);
      as opposed to
      open(IN, "/some/file") || die "Cant open /some/file: $!\n";
      chomp( ($capture) = grep(/^$some_match/, <IN>) );
      close(IN);
      Especially as the size of the file being processed increases. I'm not sure of the why of it, as I haven't gone poking around Perl's internals, but it certainly increased my regular usage of grep.

      MMMMM... Chocolaty Perl Goodness.....

        The obvious guess as to why grep is faster is that it's implemented in C. Any Perl built-in is going to be faster than the equivalent Perl code because it's compiled down to native machine code (and probably better optimized, to boot). It's the same reason why Perl's built-in lexical sort is faster than giving an explicit comparison routine (and why you're often better off munging the data beforehand so that you can use the built-in lexical sort; update: sometimes known as the Guttman Rosler Transform).

        That said, given your example use of while vs. grep and assuming you're interested in the first or only match (as in the while example), I would like to recommend the oft-overlooked List::Util function first. It's implemented in C, so it should be as fast as grep (or nearly so), and has the advantage of stopping when a match is found. This could potentially save a lot of file IO in your example. It also has the advantage of not building a return list which is just thrown away after getting the first item. List::Util is part of the standard distribution for 5.8.0 and should be an easy install on previous versions, as well. And there are several other useful functions in there as well, not to mention the companion module Scalar::Util.
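A minimal sketch of the first() suggestion above, assuming made-up sample data: the @lines list, the in-memory $data string and the 'zy' pattern are hypothetical stand-ins, not taken from the thread. It also shows a while loop with last for comparison.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sample data standing in for the lines of a file.
my @lines      = ("apple\n", "zygote\n", "zymurgy\n");
my $some_match = 'zy';

# first() returns the first element for which the block is true
# and does not test the remaining elements.
my $capture = first { /^$some_match/ } @lines;
chomp $capture if defined $capture;

# Reading line by line and leaving the loop at the first match.
# An in-memory filehandle stands in for a real file here.
my $data = join '', @lines;
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";
my $capture_loop;
while (my $line = <$in>) {
    next unless $line =~ /^$some_match/;
    chomp($capture_loop = $line);
    last;    # stop reading as soon as a match is found
}
close($in);

print "first: $capture, loop: $capture_loop\n";
```

One caveat worth keeping in mind: writing first { ... } <IN> still evaluates <IN> in list context, so the whole file is read before first() sees any of it. The search stops early, but the reading does not; only the loop with last actually avoids the extra IO.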

        bbfu
        Black flowers blossom
        Fearless on my breath

        l2kashe:

        Your two snippets don't seem to be equivalent. The first (while loop), when run on my dictionary file with the pattern /^zy/, sets $capture to 'zymurgy' (the last match), while the second (grep) sets it to 'zygote' (the first match).

        I understand why the while loop works how it does; perhaps someone could explain why grep works differently?

        TIA, dave
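The difference dave describes can be reproduced with a short sketch. The two-word list below is a hypothetical stand-in for the matching dictionary lines. The loop overwrites its variable on every match, whereas a list assignment with a single variable on the left keeps only the first element grep returns.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the matching dictionary lines.
my @words = ("zygote\n", "zymurgy\n");

# Loop version: $last is overwritten on every match,
# so it ends up holding the last matching line.
my $last;
for (@words) {
    next unless /^zy/;
    chomp($last = $_);
}

# grep version: grep returns every match, and the list
# assignment to ($first) keeps only the first of them.
my ($first) = grep { /^zy/ } @words;
chomp $first;

print "loop: $last, grep: $first\n";    # prints "loop: zymurgy, grep: zygote"
```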

      The theory is right, the application less so.

      In list context, the <OUTPUT> construct immediately sucks the whole file into memory. To get the full space savings that you describe (not using variables will get you some...), you need to do something like this:

      open(OUTPUT, "$script |") or die "Cannot run '$script': $!";
      my @output;
      while (<OUTPUT>) {
          push @output, $_ if /EXPRESSION/;
      }
      With the emphasis on creative laziness that they keep on talking about for Perl 6, it is possible that Perl 6 will automatically save memory for you with your construct. It is definite that Ruby does. But with Perl 5, you need to write things out longhand.

      UPDATE Zaxo pointed out that I didn't close the diamond properly. Fixed.
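As a self-contained variation on the loop above, here is a sketch that also touches the original poster's concern about the shell. It assumes a Unix-like system and uses a hypothetical child command (the running perl itself, via $^X, printing three lines) in place of $script. The list form of pipe open runs the command directly instead of going through a shell.

```perl
use strict;
use warnings;

# Hypothetical child command: this same perl printing three lines.
my @cmd = ($^X, '-e', 'print "foo 1\nbar 2\nfoo 3\n"');

# List-form pipe open: the command is run directly, without a shell.
open(my $out, '-|', @cmd) or die "Cannot run '@cmd': $!";

my @output;
while (my $line = <$out>) {
    # Keep only matching lines; everything else is discarded as it is read.
    push @output, $line if $line =~ /^foo/;
}
close($out) or warn "child exited with status $?";

print @output;
```

The list form may not be available on every platform or on older Perl versions, so treat this as an illustration rather than a drop-in replacement.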

        Good point, crap implementation. Thanks for pointing it out.




Re: What is faster?
by edoc (Chaplain) on Jun 02, 2003 at 07:49 UTC

    You want to write a benchmarking script to test it..

    *Untested, off the top of my head, code follows..

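    #!/usr/bin/perl
    use Benchmark;

    my $script = shift;    # command to benchmark, taken from the command line (assumption)

    timethese(1000, {
        'perl'   => \&perl_grep,
        'system' => \&system_grep,
    });

    # filter in Perl with the built-in grep
    sub perl_grep {
        open(OUTPUT, "$script |") or die "Cannot run '$script': $!";
        my @output = <OUTPUT>;
        @output = grep { /EXPRESSION/ } @output;
        close(OUTPUT);
    }

    # filter with the external grep utility
    sub system_grep {
        open(OUTPUT, "$script | grep EXPRESSION |") or die "Cannot run '$script': $!";
        my @output = <OUTPUT>;
        close(OUTPUT);
    }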

    This will run each sub 1000 times and give you stats on their speed
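For trying the Benchmark module without an external script at hand, a self-contained sketch along the same lines follows. It compares the built-in grep with an explicit loop on made-up in-memory data; the @lines list, the /7$/ pattern and the sub names are hypothetical. The timings it prints are only meaningful for the machine it runs on.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical in-memory data standing in for the script's output.
my @lines = map { "line $_\n" } 1 .. 10_000;

# Filter with the built-in grep.
sub with_grep {
    my @output = grep { /7$/ } @lines;
    return scalar @output;
}

# Filter with an explicit loop and push.
sub with_loop {
    my @output;
    for my $line (@lines) {
        push @output, $line if $line =~ /7$/;
    }
    return scalar @output;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    'grep' => \&with_grep,
    'loop' => \&with_loop,
});
```

cmpthese prints a small comparison table instead of the raw timings that timethese reports, which makes relative differences easier to read.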

    cheers,

    J

Re: What is faster?
by Skeeve (Parson) on Jun 02, 2003 at 07:46 UTC
    As Juerd already said: It depends.

    But then there is also the possibility of benchmarking your script in order to find out what's faster on YOUR system. You will have to benchmark again if you change the system.

    In general: I would go the second way and not rely on any external grep-command to exist. The problems with them are, as far as I can see:

    1. Is it in the PATH?
    2. If not, what's its path?
    3. If it is, is it safe to assume that it is always the same grep you call? (some other earlier in the path at some later time)
    4. How does that grep evaluate its parameters?
    5. How to escape meta characters?
    6. What are the meta characters?
    7. ...
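
On the Perl side, the metacharacter questions have a well-defined answer: quotemeta, or \Q...\E inside a pattern, escapes every regex metacharacter in a string. A minimal sketch, with a hypothetical search string and sample lines:

```perl
use strict;
use warnings;

# Hypothetical search string containing regex metacharacters.
my $needle = 'a.b*c';
my @lines  = ("a.b*c\n", "axbbbc\n", "abc\n");

# Unescaped, '.' and '*' act as regex operators, so the pattern
# matches "axbbbc" and "abc" but not the literal string itself.
my @loose = grep { /$needle/ } @lines;

# \Q...\E (equivalent to quotemeta) makes Perl match the string literally.
my @literal = grep { /\Q$needle\E/ } @lines;

print scalar(@loose), " loose, ", scalar(@literal), " literal\n";    # prints "2 loose, 1 literal"
```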
Re: What is faster?
by Juerd (Abbot) on Jun 02, 2003 at 07:37 UTC

    There is only one answer to performance related questions: it depends.

    If there is any noticeable difference, you would have noticed already, but since you have not, you can safely assume that it doesn't matter much.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: What is faster?
by naChoZ (Curate) on Jun 02, 2003 at 18:36 UTC
Re: What is faster? (does it matter)
by Aristotle (Chancellor) on Jun 02, 2003 at 21:45 UTC

    The first snippet makes your script faster at creating a dependency on an external utility. Unix folk will have no problem, others probably will.

    You mean performance? Well that depends.. but if you put a little more effort into it than your second snippet (see the while loop version posted elsewhere), you should be able to reduce the performance difference enough to only matter in pathological cases. Which means that unless you are working on a pathological case, you shouldn't think about performance. Other factors are more important.

    Particularly if you have access to the $script's code, it would very probably be far better to add a command-line option or something of the sort so that it produces only the desired output.

    Don't microoptimize single operations - particularly if you don't even have a real performance problem yet.

    Makeshifts last the longest.
