PerlMonks  

Re: Re: What is faster?

by l2kashe (Deacon)
on Jun 02, 2003 at 14:20 UTC ( [id://262380]=note )


in reply to Re: What is faster?
in thread What is faster?

Hear, hear.. ;)

Seriously though, I have found the built-in grep to suit my needs very nicely when parsing files. I haven't really done much in terms of post-processing the output of other commands. I was honestly amazed at the difference in speed between, say,

    open(IN, "/some/file") || die "Can't open /some/file: $!\n";
    while (<IN>) {
        next unless (m/^$some_match/);
        chomp($capture = $_);
    }
    close(IN);

as opposed to

    open(IN, "/some/file") || die "Can't open /some/file: $!\n";
    chomp( ($capture) = grep(/^$some_match/, <IN>) );
    close(IN);

especially as the size of the file being processed increases. I'm not sure why, as I haven't gone poking around Perl's internals, but it certainly increased my regular usage of grep.
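
One way to check a claim like this on your own data is the core Benchmark module. What follows is only a minimal sketch, not the measurement behind this post: the generated lines, the in-memory filehandle, the pattern, and the sub names with_while and with_grep are all assumptions made for illustration, and timings against a real file on disk may look quite different.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 20,000 lines held in memory instead of /some/file.
my $data    = join '', map { sprintf("word%05d\n", $_) } 1 .. 20_000;
my $pattern = 'word0001';    # assumed pattern; matches word00010 .. word00019

# Collect matching lines one at a time with a while loop.
sub with_while {
    open(my $in, '<', \$data) or die "Can't open in-memory file: $!\n";
    my @found;
    while (my $line = <$in>) {
        push @found, $line if $line =~ /^$pattern/;
    }
    close($in);
    return scalar @found;
}

# Collect matching lines by handing the whole file to grep.
sub with_grep {
    open(my $in, '<', \$data) or die "Can't open in-memory file: $!\n";
    my @found = grep { /^$pattern/ } <$in>;
    close($in);
    return scalar @found;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    while_loop => \&with_while,
    grep_list  => \&with_grep,
});
```

Both subs collect every match, so the comparison is like for like; swap in your own file and pattern before drawing any conclusion.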

MMMMM... Chocolaty Perl Goodness.....

Replies are listed 'Best First'.
Re3: What is faster?
by bbfu (Curate) on Jun 03, 2003 at 01:44 UTC

    The obvious guess as to why grep is faster is that it's implemented in C. A Perl built-in is generally going to be faster than the equivalent Perl code because the built-in itself is compiled down to native machine code (and probably better optimized, to boot). It's the same reason why Perl's built-in lexical sort is faster than giving an explicit comparison routine (and why you're often better off munging the data beforehand so that you can use the built-in lexical sort; update: sometimes known as the Guttman-Rosler Transform).
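
To make the "munge the data beforehand" idea concrete, here is a small sketch of sorting by a numeric field both ways. The "name:count" record format and the variable names are assumptions for illustration only, and the packed key assumes non-negative integers that fit in 32 bits.

```perl
use strict;
use warnings;

# Hypothetical records of the form "name:count".
my @records = ('carol:30', 'alice:4', 'bob:200');

# Explicit comparison routine: Perl code runs for every comparison.
my @by_sub = sort { (split /:/, $a)[1] <=> (split /:/, $b)[1] } @records;

# Pre-munged keys: prefix each record with its count packed as a 4-byte
# big-endian integer, so plain string order equals numeric order.
my @keyed  = map { pack('N', (split /:/, $_)[1]) . $_ } @records;
my @sorted = sort @keyed;                      # built-in lexical sort
my @by_grt = map { substr($_, 4) } @sorted;    # strip the 4-byte key

print "$_\n" for @by_grt;    # alice:4, carol:30, bob:200 (one per line)
```

The key is computed once per record instead of once per comparison, and the sort itself runs without calling back into Perl code.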

    That said, given your example use of while vs. grep and assuming you're interested in the first or only match (as in the while example), I would like to recommend the oft-overlooked List::Util function first. It's implemented in C, so it should be as fast as grep (or nearly so), and has the advantage of stopping when a match is found. One caveat: handing <IN> to first still reads the whole file, because the list is built before first sees any of it, so the saving is in regex matches rather than file IO; to cut the IO as well, use a while loop that exits with last. first also has the advantage of not building a return list which is just thrown away after getting the first item. List::Util is part of the standard distribution for 5.8.0 and should be an easy install on previous versions, as well. And there are several other useful functions in there as well, not to mention the companion module Scalar::Util.
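
A minimal sketch of both approaches, for anyone who wants to try them. The in-memory file contents, the pattern, and the variable names are assumptions for illustration, not anything from the original post.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical file contents, held in memory instead of /some/file.
my $data    = "apple\nzygote\nbanana\nzymurgy\n";
my $pattern = 'zy';    # assumed pattern

# first() stops testing at the first match, but <$in> in list context
# has already read every line before first() sees any of them.
open(my $in, '<', \$data) or die "Can't open in-memory file: $!\n";
my $capture = first { /^$pattern/ } <$in>;
close($in);
chomp($capture) if defined $capture;

# A while loop with last stops reading as soon as a line matches.
open($in, '<', \$data) or die "Can't open in-memory file: $!\n";
my ($early, $lines_read);
while (my $line = <$in>) {
    next unless $line =~ /^$pattern/;
    chomp($early = $line);
    $lines_read = $.;    # line number of the match
    last;
}
close($in);

print "first: $capture\n";
print "while/last: $early after $lines_read lines\n";
```

Both find the same line; only the while/last version leaves the rest of the file unread.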

    bbfu
    Black flowers blossom
    Fearless on my breath

Re: Re: Re: What is faster? (grep vs while)
by Not_a_Number (Prior) on Jun 02, 2003 at 18:16 UTC
    l2kashe:

    Your two snippets don't seem to be equivalent. The first (while loop), when run on my dictionary file with the pattern /^zy/, sets $capture to 'zymurgy' (the last match), while the second (grep) sets it to 'zygote' (the first match).

    I understand why the while loop works how it does; perhaps someone could explain why grep works differently?

    TIA, dave

      Because the two code snippets aren't quite the same. A better comparison would be

          open(IN, "/some/file") || die "Can't open /some/file: $!\n";
          while (<IN>) {
              push(@foo, $_) if ( m/^$match/ );
          }
          close(IN);

          # as opposed to

          open(IN, "/some/file") || die "Can't access /some/file: $!\n";
          @foo = grep(m/^$match/, <IN>);
          close(IN);

      The differences in the first post between the loops are as follows:

      In the while loop, it continues to iterate over the input list, each time storing the value if it matches the regex. So if you have 3 values which match, then the last value matched will be placed in the variable.

      In the grep loop, due to my excessive use of parentheses I forced grep to return a list, but only captured the first element. Along the same lines as say

          $f = 'foo:bar:baz';
          ($blah) = split(/:/, $f);

      $blah now contains 'foo'; 'bar' and 'baz' are silently discarded. So grep finds all the matches and returns them as a list, but I only grab the first value and toss the rest away. Use the first snippet in this post as a better example of the differences.

      Also remember that I was talking about files themselves, not directories, as grep really begins to outperform the loop as the data set gets larger. The loop gives you far greater granularity of control over what to do with the contents of what you are iterating over. It will be more efficient to go over a data set once in a loop if you are planning on doing different things with different pieces of the data. If you already know you want a specific piece of data, and only need to process the data set once, then grep is your friend.
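
The difference described above can be reproduced without a file. This is just a sketch: the word list is a made-up stand-in for the dictionary file, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical word list standing in for the dictionary file.
my @words = qw(apple zygote zymase zymurgy);

# The loop overwrites its variable on every match: the last match wins.
my $from_loop;
for my $word (@words) {
    next unless $word =~ /^zy/;
    $from_loop = $word;
}

# List assignment keeps only the first element grep returns.
my ($from_grep) = grep { /^zy/ } @words;

# Without the parentheses grep is in scalar context and returns a count.
my $count = grep { /^zy/ } @words;

print "loop: $from_loop, grep: $from_grep, count: $count\n";
# prints "loop: zymurgy, grep: zygote, count: 3"
```

Dropping the parentheses around the variable is the easy mistake here, since it silently turns the captured line into a match count.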

      MMMMM... Chocolaty Perl Goodness.....
