Re^2: greater efficiency required (ls, glob, or readdir?)

by JavaFan (Canon)
on Aug 27, 2008 at 18:22 UTC


in reply to Re: greater efficiency required (ls, glob, or readdir?)
in thread greater efficiency required (ls, glob, or readdir?)

Actually, using ls has some advantages. For instance, the opendir/readdir solution presented by jwkrahn below will try to open the current directory (.) and the parent directory (..) as if they were files, because readdir returns them along with every other entry. A plain 'ls' will not return any names starting with a dot. The equivalent of
my @files = `ls $dir`;
is
my @files;
{
    opendir my $dh, $dir or die;
    @files = grep { !/^\./ } readdir $dh;
    closedir $dh;
}
As for using cat to read a file, it's something I do often. It's simple. It's a short one-liner. Doing it in pure Perl requires several lines, or something cryptic.
my $info = do { local (@ARGV, $/) = $file; <> };   # Cryptic

# 7 lines:
my $info;
{
    open my $fh, "<", $file or die;
    local $/;
    $info = <$fh>;
    close $fh;
}

Replies are listed 'Best First'.
Re^3: greater efficiency required (ls, glob, or readdir?)
by ikegami (Pope) on Aug 27, 2008 at 18:33 UTC

    using ls has some advantages.

    That's why some use glob.
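    A minimal sketch of the glob version (note it returns paths prefixed with $dir, unlike the bare names you get from ls):

    my @files = glob "$dir/*";   # skips dot files, no shell spawned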

    Doing it in pure Perl requires several lines

    It would take more than 7 lines to do the equivalent of or die $! when using cat. It's so complex you probably don't even bother doing it.

    The OP's code is the perfect example. By using cat,

    • he used three lines instead of two,
    • he removed the error checking he'd do with open,
    • he introduced a lot of overhead in a loop,
    • he introduced a bug that deletes trailing blank lines, and
    • he introduced a bug for files with spaces and other special characters in their names.

    Update: Added OP as an example.
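    As a sketch of the open-based alternative for one file (the surrounding loop and variable names are assumptions, since the OP's code isn't quoted here):

    for my $file (@files) {
        open my $fh, '<', $file or die "Can't open $file: $!";   # error checking kept
        local $/;                                                # slurp; trailing blank lines intact
        my $info = <$fh>;
        close $fh;
        # ... process $info ...
    }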

      It would take more than 7 lines to do the equivalent of or die $! when using cat.
      die if $?;
      only takes one short line.
        Do you deem "Died at x.pl line y." acceptable for something that could be an I/O error or a spawning error? (It doesn't say whether one occurred, or which it was.)
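        A slightly more informative middle ground might look like this sketch (the message text is an assumption, and it still can't distinguish failure modes the way or die $! on open can):

        my $info = `cat $file`;
        die "cat $file failed with exit status " . ($? >> 8) . "\n" if $?;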
Re^3: greater efficiency required (ls, glob, or readdir?)
by shmem (Chancellor) on Aug 27, 2008 at 19:53 UTC
    The equivalent of
    ...
    is
    ...

    No, it is not. ls(1) will call getdents(2) until it returns no more directory entries, and only then output the list, while perl's readdir returns control after each call to getdents(2). Depending on the size of the directory to be read and the tasks to be done on each entry, it can make a big difference in the distribution of iowait load (which, in sum, will be the same of course). I'm talking about Linux here.
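    A sketch of the incremental style this allows (the per-entry work is just a placeholder):

    opendir my $dh, $dir or die "Can't opendir $dir: $!";
    while (defined(my $entry = readdir $dh)) {   # one entry at a time, scalar context
        next if $entry =~ /^\./;                 # skip dot files, like plain ls
        # ... do the per-entry work here, interleaved with directory reads ...
    }
    closedir $dh;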

      ... make a big difference on distribution of iowait load (which, in sum, will be the same of course).

      Not necessarily.

      Directory entries are stored in blocks, and they will be read from disk (assuming a disk-based file system; if it's memory-based, I/O wait will be far, far less) block by block. While using 'ls' means all the blocks for the directory (one for a small directory, more for a large directory) need to be fetched and read, it also means each block will be processed quickly, and once processed, it's not needed again. It's quite likely that the block will remain in the buffer cache the entire time it takes to process it. OTOH, when doing readdir and processing the file after reading each entry from the directory, there's a probability (which increases with the size of the fetched file) that the block will disappear from the cache before the readdir loop is done with it, requiring a second fetch of the same block.

      Whether this is actually measurable is a different matter.

        Recently, I had to remove a directory containing 2.75 million files (some php debug blunder) in a vserver sub-directory of a machine which already ran under heavy I/O load. None of

        rm -r $dir
        find $dir -exec rm {} \;
        ls $dir | xargs rm

        was an option, since each would hog I/O and the delay for productive tasks was unacceptable. Buffer cache was not an issue: plenty of memory was always available, so each large chain of multiple indirect blocks could be held in memory while each return from getdents(2) was processed as it was delivered. Not so with ls, find et al., since those hog memory too and invalidate parts of the file system buffer cache while reading all the entries.

        Using perl, a readdir() loop, select() and unlink() solved that. My point is that readdir() gives you finer control than shelling out to ls.
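        A rough sketch of that kind of loop, not the original code: here select() is read as a sub-second sleep, and the filename filter and pause length are assumptions.

        opendir my $dh, $dir or die "Can't opendir $dir: $!";
        while (defined(my $entry = readdir $dh)) {
            next if $entry =~ /^\./;                      # skip . and ..
            next unless $entry =~ /^debug/;               # assumed selection criterion
            unlink "$dir/$entry" or warn "unlink $dir/$entry: $!";
            select undef, undef, undef, 0.01;             # brief pause to keep I/O load down
        }
        closedir $dh;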
