http://qs321.pair.com?node_id=120339


in reply to -s takes too long on 15,000 files

Welcome to a real-world demonstration of why algorithm efficiency matters. :-)

The likely reason why stat is so slow is that each stat has to scan the entire directory linearly to find the file you are interested in. That means you are going back to the directory structure 10,000 times, each time scanning on average about 5,000 entries to find the file of interest. The resulting 50,000,000 entry fetches are what take the time.
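To make that concrete, here is a minimal sketch of the kind of loop that triggers this behaviour; the directory path is hypothetical, since the original code is not shown:

    use strict;
    use warnings;

    my $dir = '/some/big/dir';    # hypothetical path

    opendir my $dh, $dir or die "Cannot open $dir: $!";
    while (defined(my $name = readdir $dh)) {
        next if $name eq '.' or $name eq '..';
        # Each -s is a separate stat of "$dir/$name"; on a filesystem
        # with linear directory lookups, that stat rescans the whole
        # directory to find the name all over again.
        my $size = -s "$dir/$name";
        print "$name $size\n" if defined $size;
    }
    closedir $dh;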

You can verify this by printing every 100th filename as you go. You should see a progressive slowdown, because each successive stat has to scan past more and more directory entries before it finds its file.
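A minimal sketch of that check, assuming the files are statted in readdir order and using Time::HiRes to time each batch of 100:

    use strict;
    use warnings;
    use Time::HiRes qw(time);

    my $dir = shift || '.';
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' and $_ ne '..' } readdir $dh;
    closedir $dh;

    my ($count, $last) = (0, time);
    for my $name (@names) {
        my $size = -s "$dir/$name";
        next if ++$count % 100;
        printf "%6d %-30s %.4f seconds for the last 100 stats\n",
            $count, $name, time - $last;
        $last = time;
    }

If the linear-scan theory is right, the reported times should grow steadily as the count rises.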

You would speed this up by roughly a factor of 10 if you arranged to have 10 directories of 1,000 files each. You would speed up by a much larger factor with a filesystem that is designed to handle directories with many small files. (Think ReiserFS on Linux.)
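One way to arrange that, sketched here with a hypothetical source directory: bucket the files into subdirectories keyed on the first character of each name, so later lookups only have to scan one small bucket.

    use strict;
    use warnings;
    use File::Copy qw(move);

    my $dir = '/some/big/dir';    # hypothetical path again

    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @files = grep { !/^\./ and -f "$dir/$_" } readdir $dh;
    closedir $dh;

    for my $name (@files) {
        # First-character buckets spread the files across a few dozen
        # small directories instead of one huge one.
        my $bucket = substr $name, 0, 1;
        mkdir "$dir/$bucket" unless -d "$dir/$bucket";
        move "$dir/$name", "$dir/$bucket/$name"
            or warn "Could not move $name: $!";
    }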

The ideal answer, of course, would be for your single pass through the directory structure to pull back not just each name but also its associated metadata. That is what ls and dir do, and it is why they are so much faster. Unfortunately, Perl's API does not give you direct access to that information.

An incidental note: using Perl does not guarantee portable code. For instance, your use of lc will cause porting problems on operating systems with case-sensitive filesystems (like Unix); you will be statting a different file from the one you saw, most likely one that does not exist.
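A short illustration of that failure mode, with a hypothetical filename:

    my $name  = 'Report.TXT';    # the name as it actually appears on disk
    my $lower = lc $name;        # 'report.txt'
    my $size  = -s $lower;       # undef on Unix: no such file
    print defined $size ? "$size\n" : "no such file\n";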