What is faster?

hotshot has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: What is faster? by BrowserUk (Patriarch) on Jun 02, 2003 at 08:43 UTC
One situation where I might use an external grep utility in preference to the internal one is if the script produces large volumes of output and the selection process filters out a large proportion of it. The difference in pure speed terms is likely to be minimal, but the reduction in memory usage from not loading data just to discard it might be worth having. That said, the amount of memory used by the internal version could be minimised by applying the grep at input rather than afterwards, i.e.

Update: DO NOT USE THE CODE BELOW!! Good idea, bad implementation, as pointed out below by tilly.

`open(OUTPUT, "$script |"); my @output = grep { EXPRESSION } <OUTPUT>;`

You'd probably need to be discarding a significant amount of input for this to make any great difference, but it probably wouldn't do any harm in any case, so why not do it anyway.

Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
Re: Re: What is faster? by l2kashe (Deacon) on Jun 02, 2003 at 14:20 UTC
Hear, hear.. ;) Seriously though, I have found the internal grep to suit my needs very nicely when parsing files. I haven't really done much in terms of post-processing the output of other commands. I was honestly amazed at the difference in speed between, say,

`open(IN, "/some/file") || die "Can't open /some/file: $!\n"; while (<IN>) { next unless (m/^$some_match/); chomp($capture = $_); } close(IN);`

as opposed to

`open(IN, "/some/file") || die "Can't open /some/file: $!\n"; chomp( ($capture) = grep(/^$some_match/, <IN>) ); close(IN);`

especially as the size of the file being processed increases. I'm not sure of the why of it, as I haven't gone poking around Perl's internals, but it certainly increased my regular usage of grep.

MMMMM... Chocolaty Perl Goodness.....
Re3: What is faster? by bbfu (Curate) on Jun 03, 2003 at 01:44 UTC
The obvious guess as to why grep is faster is that it's implemented in C. Any Perl built-in is going to be faster than the equivalent Perl code because it's compiled down to native machine code (and probably better optimized, to boot). It's the same reason why Perl's built-in lexical sort is faster than giving an explicit comparison routine (and why you're often better off munging the data beforehand so that you can use the built-in lexical sort; update: sometimes known as the Guttman-Rosler Transform).

That said, given your example use of `while` vs. `grep`, and assuming you're interested in the first or only match (as in the `while` example), I would like to recommend the oft-overlooked List::Util function `first`. It's implemented in C, so it should be as fast as `grep` (or nearly so), and has the advantage of stopping when a match is found. This could potentially save a lot of file IO in your example. It also has the advantage of not building a return list which is just thrown away after getting the first item.

List::Util is part of the standard distribution for 5.8.0 and should be an easy install on previous versions as well. There are several other useful functions in there too, not to mention the companion module Scalar::Util.

bbfu Black flowers blossom Fearless on my breath
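A minimal sketch of the `first` suggestion, assuming a hypothetical word list held in an in-memory filehandle rather than a real file. One caveat: writing `first { ... } <IN>` evaluates `<IN>` in list context, so every line is read before `first` starts searching. The early exit saves pattern tests and the throwaway result list, not the reads themselves. To stop reading at the first match, a `while` loop with `last` is needed, which is the second variant here.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sample data; a real script would open a file instead.
my $data       = "apple\nzygote\nzebra\nzymurgy\n";
my $some_match = 'zy';

# Variant 1: first() stops testing at the first hit, but <$in> in list
# context has already read every line by then.
open(my $in, '<', \$data) or die "Can't open in-memory file: $!\n";
my $capture = first { /^$some_match/ } <$in>;
close($in);
chomp($capture) if defined $capture;

# Variant 2: read line by line and stop reading at the first hit.
open($in, '<', \$data) or die "Can't open in-memory file: $!\n";
my $streamed;
while (my $line = <$in>) {
    next unless $line =~ /^$some_match/;
    chomp($streamed = $line);
    last;
}
close($in);

print "$capture $streamed\n";    # both hold the first match
```

Which variant is preferable probably depends on file size: for small files the difference is unlikely to be measurable.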
Re: Re: Re: What is faster? (grep vs while) by Not_a_Number (Prior) on Jun 02, 2003 at 18:16 UTC
l2kashe: Your two snippets don't seem to be equivalent. The first (`while` loop), when run on my dictionary file with the pattern /^zy/, sets `$capture` to 'zymurgy' (the last match), while the second (`grep`) sets it to 'zygote' (the first match). I understand why the `while` loop works the way it does; perhaps someone could explain why `grep` works differently? TIA, dave
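The difference comes from how each snippet assigns. The `while` loop overwrites `$capture` on every matching line, so the last match survives. `($capture) = grep(...)` is a list assignment, which puts the first element of grep's result into `$capture` and discards the rest. A small sketch, with a hypothetical word list standing in for the dictionary file:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the dictionary file.
my @words = ("zebra\n", "zygote\n", "zymase\n", "zymurgy\n");

# while-style loop: each match overwrites the previous one.
my $last_match;
for (@words) {
    next unless /^zy/;
    chomp($last_match = $_);
}

# List assignment keeps only the first element grep returns.
my ($first_match) = grep { /^zy/ } @words;
chomp($first_match);

print "loop: $last_match, grep: $first_match\n";
```

Adding `last` after the assignment in the loop would make both forms return the first match.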
Re: Re: Re: Re: What is faster? (grep vs while) by l2kashe (Deacon) on Jun 02, 2003 at 19:00 UTC
Re: Re: What is faster? by tilly (Archbishop) on Jun 03, 2003 at 04:55 UTC
The theory is right, the application less so. In list context, the `<OUTPUT>` construct immediately sucks the whole file into memory. To get the full space savings that you describe (not using variables will get you some...), you need to do something like this:

`open(OUTPUT, "$script |") or die "Cannot run '$script': $!"; my @output; while (<OUTPUT>) { push @output, $_ if /EXPRESSION/; }`

With the emphasis on creative laziness that they keep on talking about for Perl 6, it is possible that Perl 6 will automatically save memory for you with your construct. It is definite that Ruby does. But with Perl 5, you need to write things out longhand.

UPDATE: Zaxo pointed out that I didn't close the diamond properly. Fixed.
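The same loop can be sketched with a lexical filehandle and the list form of a piped `open`, which bypasses the shell. The producer command here is a hypothetical stand-in for `$script` (it just has Perl print the numbers 1 to 20), and the pattern is likewise made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical producer standing in for $script: prints 1..20, one per line.
my @cmd = ($^X, '-e', 'print "$_\n" for 1 .. 20');

# List-form pipe open: no shell is involved, so no shell quoting is needed.
open(my $out, '-|', @cmd) or die "Cannot run '@cmd': $!";

my @output;
while (my $line = <$out>) {
    # Only matching lines are kept; the rest are discarded as they are read.
    push @output, $line if $line =~ /5/;
}
close($out) or warn "Producer exited abnormally: $! $?";

print @output;
```

Only one input line is held at a time besides the matches, so memory use grows with the number of matches rather than with the size of the producer's output.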
Re: Re: Re: What is faster? by BrowserUk (Patriarch) on Jun 03, 2003 at 06:00 UTC
Good point, crap implementation. Thanks for pointing it out.
Re: What is faster? by edoc (Chaplain) on Jun 02, 2003 at 07:49 UTC
You want to write a benchmarking script to test it. Untested, off the top of my head, code follows:

`#!/usr/bin/perl use Benchmark; timethese(1000, { 'perl' => \&perl_grep, 'system' => \&system_grep, }); sub system_grep { open(OUTPUT, "$script | grep EXPRESSION |"); my @output = <OUTPUT>; close(OUTPUT); } sub perl_grep { open(OUTPUT, "$script |"); my @output = <OUTPUT>; close(OUTPUT); @output = grep { EXPRESSION } @output; }`

This will run each sub 1000 times and give you stats on their speed.

cheers, J
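A self-contained variant of the idea that runs without `$script` or an external grep: it compares Perl's built-in `grep` against an explicit loop on generated data, using `cmpthese` from the same Benchmark module. The data, pattern, and sub names are invented for illustration, and the figures it prints will differ from system to system. It does not measure the external-grep case the question is about.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data standing in for the script's output.
my @lines   = map { "line $_\n" } 1 .. 10_000;
my $pattern = qr/7$/;

# Built-in grep over the whole list.
sub with_grep {
    my @hits = grep { /$pattern/ } @lines;
    return scalar @hits;
}

# Explicit loop doing the same filtering by hand.
sub with_loop {
    my @hits;
    for (@lines) {
        push @hits, $_ if /$pattern/;
    }
    return scalar @hits;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, { grep => \&with_grep, loop => \&with_loop });
```

Having each sub return its match count makes it easy to check that both do the same work before trusting the timings.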
Re: What is faster? by Skeeve (Parson) on Jun 02, 2003 at 07:46 UTC
As Juerd already said: it depends. But there is also the possibility of benchmarking your script in order to find out what's faster on YOUR system. You will have to benchmark again if you change the system. In general, I would go the second way and not rely on any external grep command existing. The problems with external commands are, as far as I can see: Is it in the PATH? If not, what's its path? If it is, is it safe to assume that it is always the same grep you call (some other one might appear earlier in the path at some later time)? How does that grep evaluate its parameters? How do you escape metacharacters? What are the metacharacters? ...
Re: What is faster? by Juerd (Abbot) on Jun 02, 2003 at 07:37 UTC
There is only one answer to performance-related questions: it depends. If there were any noticeable difference, you would have noticed already, but since you have not, you can safely assume that it doesn't matter much. Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
Re: What is faster? by naChoZ (Curate) on Jun 02, 2003 at 18:36 UTC
brian d foy wrote a piece for The Perl Journal about Benchmarking Perl that's worth a look. ~~ naChoZ
Re: What is faster? (does it matter) by Aristotle (Chancellor) on Jun 02, 2003 at 21:45 UTC
The first snippet makes your script faster at creating a dependency on an external utility. Unix folk will have no problem, others probably will. You mean performance? Well, that depends.. but if you put a little more effort into it than your second snippet (see the `while` loop version posted elsewhere), you should be able to reduce the performance difference enough for it to matter only in pathological cases. Which means that unless you are working on a pathological case, you shouldn't think about performance. Other factors are more important. Particularly if you have access to the `$script`'s code, it would very probably be far better to add a command-line option or something of the sort so that it produces only the desired output. Don't micro-optimize single operations, particularly if you don't even have a real performance problem yet. Makeshifts last the longest.