Re^4: greater efficiency required (ls, glob, or readdir?)

... make a big difference on distribution of iowait load (which, in sum, will be the same of course).

Not necessarily.

Directory entries are stored in blocks, and they are read from disk block by block (assuming a disk-based file system; if it's memory based, I/O wait will be far, far less). Using 'ls' means all the blocks for the directory (one for a small directory, more for a large one) need to be fetched and read, but it also means each block is processed quickly and, once processed, is not needed again. It's quite likely that a block will remain in the buffer cache for the entire time it takes to process it. OTOH, when doing readdir and processing each file right after reading its entry from the directory, there's a probability (which increases with the size of the file being fetched) that the block will disappear from the cache before the readdir loop is done with it, requiring a second fetch of the same block.

Whether this is actually measurable is a different matter.
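
For illustration, the two access patterns contrasted above might be sketched in Perl as follows. This is only a rough sketch: the sub names and the callback argument are made up for the example, and Perl's readdir sits on top of the C library's own buffering, so the cache behaviour on a real system may well differ from the simple picture.

```perl
use strict;
use warnings;

# Pattern 1: read every entry first, then process. The directory is
# walked in one quick pass, much as `ls` would do it.
sub slurp_then_process {
    my ($dir, $callback) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    $callback->("$dir/$_") for @names;
    return scalar @names;
}

# Pattern 2: handle each entry as soon as readdir returns it. The
# directory handle stays open for as long as the processing takes.
sub process_while_reading {
    my ($dir, $callback) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my $count = 0;
    while (defined(my $name = readdir($dh))) {
        next if $name eq '.' || $name eq '..';
        $callback->("$dir/$name");
        $count++;
    }
    closedir($dh);
    return $count;
}
```

Timing both subs with the same callback against a large directory (with a cold cache) would be one way to find out whether the difference is measurable at all.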

Re^5: greater efficiency required (ls, glob, or readdir?)
by shmem (Chancellor) on Aug 27, 2008 at 20:55 UTC

Recently, I had to remove a directory containing 2.75 million files (some php debug blunder) in a vserver sub-directory of a machine which already ran under heavy I/O load. None of

rm -r $dir
find $dir -exec rm {} \;
ls $dir | xargs rm

was an option, since each would hog I/O, and the resulting delay for productive tasks was unacceptable. The buffer cache was not an issue: plenty of memory was always available, so each large chain of multiple indirect blocks could be held in memory while each batch of entries returned by getdents(2) was processed as it was delivered. Not so with ls, find et al., since those hogged memory too and invalidated parts of the file system buffer cache while reading all the entries.
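
The code actually used is not shown here, so what follows is only a guess at the general shape of such a cleanup: unlink each entry as readdir delivers it and pause now and then to leave room for other I/O. The sub name, the batch size and the pause length are assumptions made up for this sketch, not values from the job described above.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical helper: unlink the entries of $dir one at a time as
# readdir returns them, pausing after every $batch successful
# deletions so that other I/O gets a chance to proceed.
sub throttled_unlink {
    my ($dir, $batch, $pause) = @_;
    $batch ||= 1000;    # assumed default: deletions per pause
    $pause //= 0.1;     # assumed default: pause length in seconds
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my $removed = 0;
    while (defined(my $name = readdir($dh))) {
        next if $name eq '.' || $name eq '..';
        # Entries that cannot be unlinked (subdirectories, for
        # instance) are skipped silently in this sketch.
        unlink("$dir/$name") or next;
        $removed++;
        sleep($pause) if $pause && $removed % $batch == 0;
    }
    closedir($dh);
    return $removed;
}
```

A call such as throttled_unlink($dir, 1000, 0.1) never holds the whole list of names in memory; whether it is gentle enough on a loaded machine depends on tuning the two numbers.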
