http://qs321.pair.com?node_id=707241


in reply to greater efficiency required (ls, glob, or readdir?)

In theory, all solutions boil down to a readdir (the system call), so calling readdir directly is likely to be the fastest.

In practice, the speed difference between the different methods is probably minor, so you should use the method you find easiest to use and maintain. If you find that too slow, *then* come to us.
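
If you want actual numbers first, a minimal sketch with the core Benchmark module will produce them (this assumes $dir already names the directory you care about):

use Benchmark qw(cmpthese);

# Run each approach for at least 2 CPU seconds and compare rates.
cmpthese(-2, {
    ls      => sub { my @f = `ls $dir` },
    glob    => sub { my @f = glob "$dir/*" },
    readdir => sub {
        opendir my $dh, $dir or die;
        my @f = readdir $dh;
        closedir $dh;
    },
});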

My personal opinion is that running a child process to get a directory listing is a rather silly thing to do. I wouldn't use ls. Doubly so for using cat for reading a file!!!

Re^2: greater efficiency required (ls, glob, or readdir?)
by betterworld (Curate) on Aug 27, 2008 at 18:24 UTC
    My personal opinion is that running a child process to get a directory listing is a rather silly thing to do. I wouldn't use ls. Doubly so for using cat for reading a file!!!

    I second that, especially because $dir and $_ will be interpreted by the shell, so you will get problems if a directory name or entry has special characters in it.

    Even if you don't think that this is important in your case, it's better to make the code more maintainable and re-usable for security-aware scenarios.

    While you can avoid these problems by using open my $pipe, '-|', 'ls', $dir, it's really not worth the trouble; readdir (or IO::Dir) has fewer problems. And for reading the file, use open or File::Slurp.
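
    A minimal sketch of both shell-free approaches, assuming $dir holds the directory name:

    # List-form pipe open: the arguments go straight to ls with no
    # shell in between, so special characters in $dir are harmless.
    open my $pipe, '-|', 'ls', $dir or die "can't run ls: $!";
    chomp(my @files = <$pipe>);
    close $pipe;

    # The pure-Perl route: no child process at all.
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    my @entries = grep { !/^\./ } readdir $dh;   # skip dotfiles, like ls
    closedir $dh;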

      Actually, $dir and $_ will only be interpreted by the shell if they contain "funny" characters. If a string passed as an argument to qx or one-arg system contains just alphanumerics, underscores and whitespace, no shell gets involved; perl will call execv directly.

      But obviously, if you don't know what $dir contains, you shouldn't use `ls $dir`, and if you aren't in control of the content of the directory, you shouldn't use `cat $_` either.
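
      A tiny illustration of that rule (the directory name is made up):

      # No shell metacharacters here, so perl execs ls directly.
      my @safe = `ls /tmp/logs`;

      # The ';' is a shell metacharacter, so /bin/sh gets involved and
      # happily runs both commands: the classic injection problem.
      my @oops = `ls /tmp/logs; echo gotcha`;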

      Even if you don't think that this is important in your case, it's better to make the code more maintainable and re-usable for security-aware scenarios.

      I cannot agree with that, if only because the two are often not simultaneously achievable. More maintainable code usually means simpler code, while code that needs to run in a possibly hostile environment tends to be more complex than code that doesn't have to. "More maintainable" and "re-usable for security-aware scenarios" are, most of the time, conflicting requirements.

Re^2: greater efficiency required (ls, glob, or readdir?)
by JavaFan (Canon) on Aug 27, 2008 at 18:22 UTC
    Actually, using ls has some advantages. For instance, the opendir/readdir solution presented by jwkrahn below will try to open the current and parent directory as if they were files. A plain 'ls' will not return any names starting with a dot. The equivalent of
    my @files = `ls $dir`;
    is
    my @files;
    {
        opendir my $dh, $dir or die;
        @files = grep { !/^\./ } readdir $dh;
        closedir $dh;
    }
    As for using cat to read a file, it's something I do often. It's simple. It's a short one-liner. Doing it in pure Perl requires several lines, or something cryptic.
    my $info = do { local (@ARGV, $/) = $file; <> };   # Cryptic

    # 7 lines.
    my $info;
    {
        open my $fh, "<", $file or die;
        local $/;
        $info = <$fh>;
        close $fh;
    }

      using ls has some advantages.

      That's why some use glob.
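
      A sketch of the glob route; bsd_glob from the core File::Glob module treats its argument as a single pattern even if $dir contains spaces:

      use File::Glob qw(bsd_glob);

      # Skips dotfiles by default, like ls, and runs no child process.
      my @files = bsd_glob("$dir/*");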

      Doing it in pure Perl requires several lines

      It would take more than 7 lines to do the equivalent of or die $! when using cat. It's so complex you probably don't even bother doing it.

      The OP's code is the perfect example (a corrected sketch follows this list). By using cat,

      • he used three lines instead of two,
      • he removed the error checking he'd do with open,
      • he introduced a lot of overhead in a loop,
      • he introduced a bug that deletes trailing blank lines, and
      • he introduced a bug for files with spaces and other special characters in their names.

      Update: Added OP as an example.
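
      To make that concrete, a minimal sketch of the open version (the loop and @files are made up):

      for my $file (@files) {
          # Two lines replace `cat $file`: the error checking stays,
          # trailing blank lines survive, and no shell ever sees $file.
          open my $fh, '<', $file or die "can't open $file: $!";
          my @lines = <$fh>;
          close $fh;
          # ... process @lines ...
      }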

        It would take more than 7 lines to do the equivalent of or die $! when using cat.
        die if $?;
        only takes one short line.
      The equivalent of
      ...
      is
      ...

      No, it is not. ls(1) will call getdents(2) until it returns no more directory entries and only then output the list, while perl's readdir returns control after each call to getdents(2). Depending on the size of the directory and the work to be done on each entry, that can make a big difference in the distribution of the iowait load (which, in sum, will be the same, of course). I'm talking about Linux here.
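
      A sketch of the streaming pattern described above (process() stands in for whatever per-entry work is being done):

      # ls collects the whole listing before printing anything, while a
      # readdir loop interleaves directory reads with the actual work.
      opendir my $dh, $dir or die "opendir $dir: $!";
      while (defined(my $entry = readdir $dh)) {
          next if $entry =~ /^\./;
          process($entry);    # hypothetical per-entry work
      }
      closedir $dh;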

        ... make a big difference on distribution of iowait load (which, in sum, will be the same of course).

        Not necessarily.

        Directory entries are stored in blocks, and they will be read from disk (assuming a disk-based file system; if it's memory-based, I/O wait will be far, far less) block by block. While using 'ls' means all the blocks for the directory (one for a small directory, more for a large one) need to be fetched and read, it also means each block will be processed quickly, and once processed, it's not needed again. It's quite likely that the block will remain in the buffer cache the entire time it takes to process it.

        OTOH, when doing readdir and processing each file after reading its entry from the directory, there's a probability (which increases the larger the fetched file is) that the block will disappear from the cache before the readdir loop is done with it, requiring a second fetch of the same block.

        Whether this is actually measurable is a different matter.