http://qs321.pair.com?node_id=1148448

glasswalk3r has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I'm looking for a fast way to list the contents of a directory (with thousands of files) on Linux by using Perl.

I did some research on that and found a sample C code that uses the getdents system call for that. By using it, one can avoid calling stat on each file inside the directory (basically what ls command does).

I did some tests with readdir, but performance speed compared to the already mentioned C code is not as good. That said, I'm inclined to try to use Perl syscall to do the same. Below is the C code (for those inclined to read it):

#define _GNU_SOURCE #include <dirent.h> /* Defines DT_* constants */ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <sys/stat.h> #include <sys/syscall.h> #define handle_error(msg) \ do { perror(msg); exit(EXIT_FAILURE); } while (0) struct linux_dirent { long d_ino; off_t d_off; unsigned short d_reclen; char d_name[]; }; #define BUF_SIZE 1024*1024*5 int main(int argc, char *argv[]) { int fd, nread; char buf[BUF_SIZE]; struct linux_dirent *d; int bpos; char d_type; fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY); if (fd == -1) handle_error("open"); for ( ; ; ) { nread = syscall(SYS_getdents, fd, buf, BUF_SIZE); if (nread == -1) handle_error("getdents"); if (nread == 0) break; for (bpos = 0; bpos < nread;) { d = (struct linux_dirent *) (buf + bpos); if (d->d_ino != 0) printf("%s\n", (char *) d->d_name); bpos += d->d_reclen; } } exit(EXIT_SUCCESS); }

This is how the C struct should look like:

struct linux_dirent { unsigned long d_ino; /* Inode number 32*/ unsigned long d_off; /* Offset to next linux_dirent 32*/ unsigned short d_reclen; /* Length of this linux_dirent 16*/ char d_name[]; /* Filename (null-terminated) */ /* length is actually (d_reclen - 2 - offsetof(struct linux_dirent, d_name)) */ }

Since I'm not a C programmer, I struggling to achieve that. I found that I need to use unpack to retrieve the information from the related C struct, but I'm lost about:

Is it even possible to do that without having to use XS (or any of it's alternatives)? I found Convert::Binary::C to give a hand, but probably I'm not using it correctly due the 2 issues above. If I use Data::Dumper on the buffer, I can see the file names, but got only garbage from Convert::Binary::C.

Here is my (not working) Perl code implementation:

#!/usr/bin/env perl use warnings; use strict; use Cwd; use File::Spec; use Data::Dumper; use Fcntl; use Convert::Binary::C; use constant BUF_SIZE => 4096; use lib '/home/myself/perl5/perls/perl-5.20.1/lib/site_perl/5.20.3/i686-linux/ +sys'; require 'syscall.ph'; my $dir = File::Spec->catdir( getcwd(), 'test' ); sysopen( my $fd, $dir, O_RDONLY | O_DIRECTORY ); my $buf = "\0" x 128; $! = 0; my $converter = Convert::Binary::C->new(); my $struct = <<CODE; struct foo { long d_ino; unsigned long d_off; unsigned short d_reclen; char d_name[]; }; CODE $converter->parse($struct); my $read = syscall( &SYS_getdents, fileno($fd), $buf, BUF_SIZE ); if ( ( $read == -1 ) and ( $! != 0 ) ) { die "failed to syscal getdents: $!"; } #print Dumper($read), "\n"; #print Dumper($buf), "\n"; close($fd); my $data = $converter->unpack( 'foo', $buf ); print Dumper($data);

Thanks!

UPDATED

For the sake of others that may want to research about, I made available the module Linux::NFS::BigDir for that at CPAN, and here is the complete working code I got after all inputs from andal:

#!/usr/bin/env perl use warnings; use strict; use File::Spec; use Getopt::Std; use Fcntl; use constant BUF_SIZE => 4096; use lib '/home/myself/perl5/perls/perl-5.20.1/lib/site_perl/5.20.3/i686-linux/ +sys'; require 'syscall.ph'; my %opts; getopts( 'd:', \%opts ); die 'option -d <DIRECTORY> is required' unless ( ( exists( $opts{d} ) ) and ( defined( $opts{d} ) ) ); sysopen( my $fd, $opts{d}, O_RDONLY | O_DIRECTORY ); while (1) { my $buf = "\0" x BUF_SIZE; my $read = syscall( &SYS_getdents, fileno($fd), $buf, BUF_SIZE ); if ( ( $read == -1 ) and ( $! != 0 ) ) { die "failed to syscal getdents: $!"; } last if ( $read == 0 ); while ( $read != 0 ) { my ( $ino, $off, $len, $name ) = unpack( "LLSZ*", $buf ); #print $name, "\n" if ( $ino != 0 ); unless ( ( $name eq '.' ) or ( $name eq '..' ) ) { my $path = File::Spec->catfile( $opts{d}, $name ); unlink $path or die "Cannot remove $path: $!"; } substr( $buf, 0, $len ) = ''; $read -= $len; } } close($fd);
Alceu Rodrigues de Freitas Junior
---------------------------------
"You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill