Re: Re: What is faster?

Hear, hear.. ;)

Seriously though, I have found the internal grep to suit my needs very nicely when parsing files. I haven't really done much in terms of post processing of other commands. I was honestly amazed at the difference in speed between say

open(IN, "/some/file") || die "Can't open /some/file: $!\n";
while (<IN>) {
   next unless (m/^$some_match/);
   chomp($capture = $_);
}
close(IN);

as opposed to

open(IN, "/some/file") || die "Can't open /some/file: $!\n";
chomp( ($capture) = grep(/^$some_match/, <IN>) );
close(IN);

The difference grows as the size of the file being processed increases. I'm not sure why, as I haven't gone poking around Perl's internals, but it has certainly increased my regular usage of grep.
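For anyone who wants to measure this rather than take it on faith, here is a minimal sketch using the core Benchmark module. It assumes an in-memory list standing in for the file contents, and the names `@lines`, `$match`, `with_loop` and `with_grep` are made up for illustration. It compares the two approaches on equal terms, with both collecting every match, and makes no promise about which one wins on your data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical in-memory data standing in for the lines of a file.
my @lines = map { "word$_\n" } 1 .. 10_000;
my $match = 'word99';

# Explicit loop: push every matching line onto a result list.
sub with_loop {
    my @found;
    for (@lines) {
        push @found, $_ if /^$match/;
    }
    return @found;
}

# Built-in grep: the same filter without a Perl-level loop body.
sub with_grep {
    return grep { /^$match/ } @lines;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    loop => sub { my @r = with_loop() },
    grep => sub { my @r = with_grep() },
});
```

The numbers will vary with the machine, the Perl version and the size of the data, so treat the output as a rough guide only.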

MMMMM... Chocolaty Perl Goodness.....


Re3: What is faster? by bbfu (Curate) on Jun 03, 2003 at 01:44 UTC
The obvious guess as to why grep is faster is that it's implemented in C. A Perl built-in is generally going to be faster than the equivalent Perl code because it's compiled down to native machine code (and probably better optimized, to boot). It's the same reason why Perl's built-in lexical sort is faster than giving an explicit comparison routine (and why you're often better off munging the data beforehand so that you can use the built-in lexical sort; update: sometimes known as the Guttman-Rosler Transform).

That said, given your example use of `while` vs. `grep` and assuming you're interested in the first or only match (as in the while example), I would like to recommend the oft-overlooked List::Util function `first`. It's implemented in C, so it should be as fast as `grep` (or nearly so), and has the advantage of stopping when a match is found. This could potentially save a lot of pattern matching in your example, although passing `<IN>` as the list still reads the whole file in list context, so the file IO itself is not reduced. It also has the advantage of not building a return list which is just thrown away after getting the first item.

List::Util is part of the standard distribution as of 5.8.0 and should be an easy install on previous versions as well. There are several other useful functions in there too, not to mention the companion module Scalar::Util.

bbfu
Black flowers blossom
Fearless on my breath
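A minimal sketch of the `first` suggestion, assuming a tiny hypothetical word list in place of a real dictionary file. The variable names are invented for illustration. The second half shows a plain `while` loop with `last`, which is one way to stop reading input once a match is found; an in-memory filehandle stands in for a real file there.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical word list standing in for the lines of a dictionary file.
my @words = ("apple\n", "zygote\n", "zymurgy\n");

# grep tests every element and returns all matches;
# the list assignment keeps only the first one.
my ($via_grep) = grep { /^zy/ } @words;

# first stops testing as soon as the block returns true.
my $via_first = first { /^zy/ } @words;

# Reading line by line and leaving the loop with `last` avoids reading
# the rest of the input.
my $data = "apple\nzygote\nzymurgy\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!\n";
my $via_loop;
while (my $line = <$fh>) {
    next unless $line =~ /^zy/;
    chomp($via_loop = $line);
    last;
}
close($fh);

chomp($via_grep, $via_first);
print "$via_grep $via_first $via_loop\n";    # prints "zygote zygote zygote"
```

All three approaches find the first match here. They differ in how much work is done after that match is seen.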
Re: Re: Re: What is faster? (grep vs while) by Not_a_Number (Prior) on Jun 02, 2003 at 18:16 UTC
l2kashe: Your two snippets don't seem to be equivalent. The first (`while` loop), when run on my dictionary file with the pattern /^zy/, sets `$capture` to 'zymurgy' (the last match), while the second (`grep`) sets it to 'zygote' (the first match).

I understand why the `while` loop works the way it does. Perhaps someone could explain why `grep` works differently?

TIA, dave
Re: Re: Re: Re: What is faster? (grep vs while) by l2kashe (Deacon) on Jun 02, 2003 at 19:00 UTC
Because the 2 code snippets aren't quite the same. A better comparison would be

open(IN, "/some/file") || die "Can't open /some/file: $!\n";
while (<IN>) {
   push(@foo, $_) if ( m/^$match/ );
}
close(IN);

# as opposed to

open(IN, "/some/file") || die "Can't access /some/file: $!\n";
@foo = grep(m/^$match/, <IN>);
close(IN);

The differences between the loops in the first post are as follows:

In the while loop, it continues to iterate over the input list, each time storing the value if it matches the regex. So if you have 3 values which match, then the last value matched will be placed in the variable.

In the grep version, due to my excessive use of parentheses, I forced grep to return a list but only captured the first element. It works along the same lines as, say,

$f = 'foo:bar:baz';
($blah) = split(/:/, $f);

$blah now contains 'foo'; bar and baz are silently discarded. So grep finds all the matches and returns them as a list, but I only grab the first value and toss the rest away. Use the first snippet in this post as a better example of the differences.

Also remember that I was talking about files themselves, not directories, as grep really begins to outperform the loop as the data set gets larger.

The loop gives you far finer-grained control over what to do with the contents of what you are iterating over. It will be more efficient to go over a data set once in a loop if you are planning on doing different things with different pieces of the data. If you already know you want a specific piece of data, and only need to process the data set once, then grep is your friend.

MMMMM... Chocolaty Perl Goodness.....
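The context behaviour described above can be sketched in a few lines. This assumes a hypothetical three-word list rather than a real dictionary file, and the variable names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lines of a dictionary file.
my @words = qw(apple zygote zymurgy);

# List assignment: grep returns every match, and the first one
# lands in $first_match while the rest are discarded.
my ($first_match) = grep { /^zy/ } @words;

# Scalar assignment: grep returns the number of matches instead.
my $count = grep { /^zy/ } @words;

# A loop that overwrites its variable on each match ends up
# holding the last match.
my $last_match;
for my $word (@words) {
    $last_match = $word if $word =~ /^zy/;
}

print "$first_match $count $last_match\n";    # prints "zygote 2 zymurgy"
```

Dropping the parentheses around the variable on the left-hand side is enough to switch grep from returning the matches to returning their count.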


	PerlMonks