
Re: Threads slurping a directory and processing before conclusion

by Clarendon4 (Acolyte)
on Aug 22, 2011 at 10:26 UTC


in reply to Threads slurping a directory and processing before conclusion

 > 3. previous attempts have hit major stability and
 > time snags, even at the prototyping stage due to the
 > sheer volume of files that make up a comprehensive sample

I notice (based on the "F:/" pathname) that you're on Win32.

Your code has a recursive file-processing part, much like File::Find::find. On Win32 this is always going to be slower than necessary when coded in Perl.

Consider using or writing some C/XS that generates the file list with FindNextFile() and so avoids all the unnecessary stat calls (the -d tests!).

Also consider using forks over threads. They're easier on Win32 than you might think.
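To make the fork suggestion concrete, here is a minimal sketch, assuming "forks" means Perl's built-in fork() (emulated with threads on Win32, see perlfork). The names process_file, partition and run_workers are made up for illustration, and the per-file work is only a placeholder:

```perl
use strict;
use warnings;

# Hypothetical per-file work; replace with the real processing.
sub process_file {
    my ($path) = @_;
    return -s $path;    # placeholder: just look up the file size
}

# Split a list into $n chunks, dealing items out round-robin.
sub partition {
    my ($n, @items) = @_;
    my @chunks = map { [] } 1 .. $n;
    push @{ $chunks[ $_ % $n ] }, $items[$_] for 0 .. $#items;
    return @chunks;
}

# Fork one child per chunk, wait for all of them, and
# return the number of children that exited non-zero.
sub run_workers {
    my ($n, @files) = @_;
    my @pids;
    for my $chunk (partition($n, @files)) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {            # child
            process_file($_) for @$chunk;
            exit 0;
        }
        push @pids, $pid;           # parent remembers the child
    }
    my $failed = 0;
    for my $pid (@pids) {
        waitpid($pid, 0);
        $failed++ if $? != 0;
    }
    return $failed;
}
```

Something like run_workers(4, @files) would then spread the list over four children. Each child gets its own copy of the data, so results have to come back through files, pipes or exit codes rather than shared variables.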

Take a look at qfind.c and peg in my CPAN directory for ideas:

http://cpan.mirrors.uk2.net/authors/id/A/AD/ADAVIES/

Try comparing the time qfind takes to generate a file list with the time taken by a pure Perl solution, e.g.:


c:\> perl -e "${^WIN32_SLOPPY_STAT}=1; use Time::HiRes; $start = Time::HiRes::time; open Q, 'qfind.exe |'; while (<Q>) {}; close Q; print 'Took ', (Time::HiRes::time - $start)"

c:\> perl -e "${^WIN32_SLOPPY_STAT}=1; use Time::HiRes; use File::Find; $start = Time::HiRes::time; File::Find::find(sub { }, '.'); print 'Took ', (Time::HiRes::time - $start)"

On my Perl source directory of ~10_000 files, qfind takes <0.3 sec versus 1.7 sec for File::Find. I suspect that on your 1.2 million files this would give a *considerable* speedup.

Oh, and make sure you put BEGIN { ${^WIN32_SLOPPY_STAT} = 1 } at the top of your code!
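In script form that might look roughly like the following. This is only a sketch: count_entries is a made-up helper name, and the BEGIN line should be harmless on platforms or perl versions where the variable has no effect.

```perl
# Set this before anything else gets a chance to stat a file.
BEGIN { ${^WIN32_SLOPPY_STAT} = 1 }

use strict;
use warnings;
use File::Find ();
use Time::HiRes ();

# Walk $root with File::Find and return (entries seen, seconds taken).
# The count includes directories and $root itself.
sub count_entries {
    my ($root) = @_;
    my $count  = 0;
    my $start  = Time::HiRes::time();
    File::Find::find(sub { $count++ }, $root);
    return ($count, Time::HiRes::time() - $start);
}
```

Calling my ($n, $secs) = count_entries('.'); gives a number to compare against the qfind timing on the same tree.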

Good luck.

Replies are listed 'Best First'.
Re^2: Threads slurping a directory and processing before conclusion
by BrowserUk (Patriarch) on Aug 22, 2011 at 16:58 UTC

    qfind is interesting and quite fast, but given that the OP is talking about slurping the contents of millions of image files, the time taken to produce the list of those files is likely to be completely insignificant.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
