note
kyle
<p>For fun, I did a rewrite. I haven't tested it, but it compiles cleanly.
<readmore>
<c>
use strict;
use warnings;

my $dir = "/Path/to/a/data/directory";

my %hash;
$hash{$_} = hashify_file( $_ ) for ls_dir( $dir );

sub ls_dir {
    my $dir = shift;

    opendir my $dh, $dir
        or die "Can't opendir '$dir': $!\n";

    return map  { "$dir/$_" }
           grep { !/^\./ && !/~$/ } readdir $dh;
}

sub hashify_file {
    my $file = shift;

    open my $fh, '<', $file
        or die "Can't read '$file': $!\n";

    my $out = {};    # start with a hashref so the deref below never sees undef
    %{$out} = ( %{$out}, %{ hashify_line( $_ ) } ) for <$fh>;

    close $fh or die "Can't close '$file': $!\n";
    return $out;
}

sub hashify_line {
    my $line = shift;

    chomp $line;
    $line =~ s/^\s+//;
    return {} if $line =~ /^#/ || $line =~ /^\s*$/;

    my ( $key, @values ) = split /\t/, $line;
    return { $key => \@values };
}
</c>
<p>Some things to be aware of...
<ul>
<li>This reads the whole file list before doing anything (just like yours).</li>
<li>Writing it was more fun than absolutely necessary.</li>
</ul>
</readmore>
<p>As to your question, I think a [doc://grep] on [doc://readdir] would be the best way to go to get your list of files. The files themselves, you could process line-by-line instead of reading every line at once, and that might be better.
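<p>A minimal sketch of that line-by-line approach (untested; it assumes the same tab-separated format, and repeats <c>hashify_line</c> from above so it stands alone):
<c>
use strict;
use warnings;

# Same helper as above: turn one tab-separated line into a
# one-pair hashref, skipping comments and blank lines.
sub hashify_line {
    my $line = shift;

    chomp $line;
    $line =~ s/^\s+//;
    return {} if $line =~ /^#/ || $line =~ /^\s*$/;

    my ( $key, @values ) = split /\t/, $line;
    return { $key => \@values };
}

# Read one line at a time instead of slurping the whole file.
sub hashify_file_by_line {
    my $file = shift;

    open my $fh, '<', $file
        or die "Can't read '$file': $!\n";

    my %out;
    while ( my $line = <$fh> ) {
        my $pair = hashify_line( $line );
        @out{ keys %$pair } = values %$pair;    # merge in place
    }

    close $fh or die "Can't close '$file': $!\n";
    return \%out;
}
</c>
<p>The hash slice merges each line's pair in place, so you don't rebuild the whole hash on every line the way the <c>( %{$out}, ... )</c> version does.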
<p>You might be able to get the shell to do even more of the work for you, though.
<c>
my $dir = "/Path/to/a/data/directory";

my %hash;
open my $grep_fh, '-|', "grep '^' $dir/* /dev/null"
    or die "Can't grep: $!\n";

while ( my $line = <$grep_fh> ) {
    $line =~ s/^([^:]+)://;    # strip the "filename:" prefix grep adds
    my $file = $1;
    next if $file =~ /~$/;
    $hash{$file} ||= {};       # so the deref below never sees undef
    %{ $hash{$file} } = ( %{ $hash{$file} }, %{ hashify_line( $line ) } );
}

close $grep_fh or die "Error closing grep pipe: $!\n";
</c>
<p>This way, you get grep and the shell to do all the I/O.
<p>Notes:
<ul>
<li>I hope you don't have any filenames with a colon in them.</li>
<li>This uses sub <c>hashify_line</c> as defined in the readmore section above. (Hey, refactoring pays off!)</li>
<li>We also assume that <c>$dir</c> does not contain any shell metacharacters. If yours isn't really a literal as in your example, you may have to sanitize it.</li>
<li>You could probably get grep to do some of your line filtering for you, but I'd just as soon do that in Perl.</li>
<li>Likewise, you could use find and xargs to choose the file list to pass to grep, and I'd <em>really</em> rather do that in Perl.</li>
<li>Both of those "rather do that in Perl" statements may need to be reevaluated in light of performance problems. (For example, if you waste a lot of time ignoring lines.)</li>
</ul>
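<p>For the record, the "line filtering in grep" variant might look like this (untested; assumes a POSIX-ish grep with <c>-E</c>, and that your keys never start with <c>#</c>):
<c>
my $dir = "/Path/to/a/data/directory";

# Ask grep only for lines whose first non-blank character is not '#',
# which drops blank lines and full-line comments in one pass.
my $cmd = "grep -E '^[[:space:]]*[^#[:space:]]' $dir/* /dev/null";
open my $grep_fh, '-|', $cmd
    or die "Can't grep: $!\n";
</c>
<p>The rest of the loop stays the same; grep just hands Perl fewer lines to ignore.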
<p><strong>Update:</strong> [broomduster] also makes a good point in [id://707269]. If you have too many files, passing "<c>$dir/*</c>" to the shell is going to bomb (the expanded command line gets too long). Time for xargs, then. Something like:
<c>
my $cmd = "find $dir -type f -print | xargs grep '^' /dev/null";
open my $grep_fh, '-|', $cmd
    or die "Can't grep: $!\n";
</c>
<p>Then you may have to filter out dot files somewhere.
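<p>One way to do that filtering (a hypothetical helper; it just checks whether the last path component starts with a dot) is to test the filename grep hands you:
<c>
# True if the path's basename starts with a dot.
sub is_dot_file {
    my $path = shift;
    return $path =~ m{(?:^|/)\.[^/]*\z} ? 1 : 0;
}

# ...then, inside the while loop above:
#     next if is_dot_file( $file );
</c>
<p>Or you could let find do it for you by adding <c>! -name '.*'</c> to its arguments.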
707234