The problem is that finding over a large directory could take hours.

How large are we talking? Does it take hours to run ls -RU (a recursive, unsorted listing, at least with GNU ls) over that directory? If so, then there's nothing you can do in Perl to make it faster, because that's how long it takes for the disk to retrieve the directory entries. A quick test on my laptop suggests that 1 hour may correspond to about a million directory entries on this machine, but your hardware may vary. Wildly.

Also, if you're on a *nix box, I'd be willing to bet that the OS's find binary is pretty well optimized. Generating a list of candidates with find $STARTING_DIR -name secret.file, then using Perl to run down that list and drop any whose directory also holds a .ignore file, would probably be a pretty effective way to do this. It would be less effective as an exercise in using/learning more Perl, if that's your primary objective. There may even be a way to get find to filter out the directories with .ignore files in the first pass, so that you don't have to go back a second time to look for them, but my find-fu isn't up to that task.
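
For anyone who wants to try the single-pass route, something along these lines might work. Treat it as a sketch, not a tuned solution. It assumes a POSIX sh and find, that the starting point is a directory, and that "has a .ignore file" means a .ignore entry sitting in the same directory as secret.file. The function name list_candidates is made up for illustration.

```shell
# Hypothetical helper: print each directory under $1 that contains
# secret.file but has no .ignore entry alongside it.
list_candidates() {
    # -exec ... {} + hands find's matches to the inline sh script in
    # batches, so there is no separate second pass over a saved list.
    find "$1" -type f -name secret.file -exec sh -c '
        for f in "$@"; do
            d=${f%/*}    # strip the trailing /secret.file
            # Print the directory only when it is not marked with .ignore
            [ -e "$d/.ignore" ] || printf "%s\n" "$d"
        done
    ' sh {} +
}
```

Called as list_candidates "$STARTING_DIR", it prints one directory per line, which a Perl script could then read from a pipe. Directory names containing newlines would confuse that line-based output.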

Even if you're ultimately going to write a Perl solution regardless, generating a list of all the secret.file entries with find is a good sanity check: it gives you an estimate of the fastest time the task could possibly be done in.
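
A rough way to get that baseline number is sketched below. The function name baseline_scan is invented here, and date +%s (seconds since the epoch) is assumed to be available; it is widely supported, but I wouldn't count on it everywhere.

```shell
# Hypothetical helper: count secret.file entries under $1 and report
# how many whole seconds the find run took.
baseline_scan() {
    start=$(date +%s)
    # tr strips the padding that some wc implementations add
    count=$(find "$1" -type f -name secret.file | wc -l | tr -d ' ')
    end=$(date +%s)
    printf '%s matches in %s seconds\n' "$count" "$((end - start))"
}
```

Run it as baseline_scan "$STARTING_DIR". Keep in mind that a repeat run may be served largely from the OS cache and look much faster than a cold one, so the first run is the more honest number.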

My first idea was to split the directories into the process so they will perform a parallel search but I'm not sure if that a good idea.

If your bottleneck is on disk I/O rather than on processing, then parallelization won't help (if it's already waiting on the disk, having more CPU cores waiting isn't going to make the disk any faster) and may make things significantly worse (by making the disk spend more time jumping from one directory to another, and less time actually reading the data you want).

In reply to Re: Finding files recursively by dsheroh
in thread Finding files recursively by ovedpo15



	PerlMonks