I've been looking through a fairly large code base, updating things and looking for possible performance improvements. I came across a few bits of code that use:

my(@list) = glob('/path/d*/d*');

to get lists of files to process. In the past I have tended to use File::Find for more complicated searches, or just opendir for simpler ones. Being a good scientist, I put together some benchmarks to determine which would be faster, and the opendir method consistently ran substantially faster (the subroutines I tested with are included below).

I'm glad I did the tests, but the results surprised me, so I dug a little deeper to figure out why. I ran the tests a few more times under strace to see what was actually going on. With glob, perl performed an lstat on every single item it returned from the glob; the opendir method clearly did not.
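The post doesn't show the benchmark harness itself. As a rough sketch of how such a comparison might be set up, the following uses the core Benchmark module's cmpthese against a throwaway directory tree. The fixture (20 top-level directories of 50 subdirectories each, created with File::Temp and File::Path) and the reworked subroutines, which read `$path` from file scope and use the core `glob` operator (implemented via File::Glob since perl 5.6), are assumptions for illustration; no particular timing result is implied.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical fixture: $path/d1..d20, each holding subdirectories d1..d50.
my $path = tempdir(CLEANUP => 1);
for my $i (1 .. 20) {
    for my $j (1 .. 50) {
        make_path("$path/d$i/d$j");
    }
}

# glob-based listing: the core glob operator, backed by File::Glob.
sub get_by_glob {
    my @dids = map { /\/d([^\/]+)$/ ? $1 : () } glob("$path/d[0-9]*/d[0-9]*");
    return \@dids;
}

# readdir-based listing: two levels of directory reads, no explicit stat calls.
sub get_by_open {
    opendir(my $dh, $path) or die "Cannot open $path: $!";
    my @top = grep { /^d/ } readdir($dh);
    closedir($dh);
    my @dids;
    for my $sd (@top) {
        # Skip entries that can't be opened as directories.
        opendir(my $sh, "$path/$sd") or next;
        push @dids, map { /^d(\d+)$/ ? $1 : () } readdir($sh);
        closedir($sh);
    }
    return \@dids;
}

# Confirm both approaches find the same IDs before comparing their speed.
my $by_glob = join ',', sort { $a <=> $b } @{ get_by_glob() };
my $by_open = join ',', sort { $a <=> $b } @{ get_by_open() };
die "Result sets differ\n" unless $by_glob eq $by_open;

# A negative count means "run each sub for at least this many CPU seconds".
cmpthese(-1, {
    glob    => \&get_by_glob,
    opendir => \&get_by_open,
});
```

Checking that both subs return the same set first matters, because the two approaches filter names slightly differently; a faster sub that finds fewer entries would make the comparison meaningless.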

I've looked here, and here, but nothing there explains why glob runs lstat on every item it returns. Even the flags accepted by the bsd_glob function that File::Glob exports do not appear to make use of the data lstat would provide... so why is perl spending so many cycles gathering that information?
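For anyone who wants to reproduce the strace observation against the bsd_glob interface directly, here is a minimal sketch of calling it with flags. The temporary fixture and the particular flag combination (GLOB_ERR to stop on directory read errors, GLOB_NOSORT to skip sorting) are arbitrary choices for illustration; this sketch makes no claim about whether any flag changes the lstat behaviour, which is what running it under strace would show.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_ERR GLOB_NOSORT GLOB_ERROR);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical fixture: three directories and one plain file, all matching d*.
my $root = tempdir(CLEANUP => 1);
make_path("$root/d$_") for 1 .. 3;
open(my $fh, '>', "$root/d_file") or die "Cannot create file: $!";
close($fh);

# Flags are OR-ed together and passed as the optional second argument.
my @entries = bsd_glob("$root/d*", GLOB_ERR | GLOB_NOSORT);

# GLOB_ERROR is non-zero if the last bsd_glob call hit an error ($! is set too).
die "bsd_glob failed: $!\n" if GLOB_ERROR;

# GLOB_NOSORT leaves the order unspecified, so sort before printing.
print "$_\n" for sort @entries;
```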

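use File::Glob qw(csh_glob);    # csh_glob must be imported explicitly

# $path is the top-level directory being scanned (defined elsewhere)

sub get_by_glob {
    my @dids = map { /d([^\/]+)$/ ? $1 : () } csh_glob("$path/d[0-9]*/d[0-9]*");
    return \@dids;
}

sub get_by_open {
    opendir(my $dh, $path) or die "Cannot open $path: $!";
    my @top = grep { /^d/ } readdir($dh);
    closedir($dh);
    my @dids = ();
    foreach my $sd (@top) {
        # skip entries that can't be opened as directories
        opendir(my $sh, "$path/$sd/") or next;
        push @dids, map { m/d(\d+)$/ } readdir($sh);
        closedir($sh);
    }
    return \@dids;
}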

Any insight would be appreciated.


They say that time changes things, but you actually have to change them yourself.

—Andy Warhol


In reply to Glob and lstat by JediWizard
