Historically I wrote a program which also read a large number of files into memory by setting $/ to undef. At the time, that program took 45 minutes to complete. After changing the reading mechanism to File::Slurp, the runtime went down to 3 minutes; I did not change anything else. The speedup will of course depend on a number of factors, but maybe you could give it a try. Your example adapted to File::Slurp:
use File::Copy qw(move);
use File::Slurp; # Update: or use File::Slurper, which Athanasius mentioned.

my $cnt = 0;
opendir(DIR, $dir) or die "$!\n";
while ( defined( my $txtFile = readdir DIR ) ) {
    next if $txtFile !~ /\.txt$/;
    $cnt++;
    my $data = read_file("$dir/$txtFile");
    my ($channel) = $data =~ /A\|CHNL_ID\|(\d+)/i;
    move("$dir/$txtFile", "$outDir/$channel") or die $!, $/;
}
closedir(DIR);
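For completeness, the same loop with File::Slurper (a minimal, untested sketch; it assumes the same $dir and $outDir variables and uses read_text, which File::Slurper exports):

use File::Copy qw(move);
use File::Slurper qw(read_text);

opendir(my $DH, $dir) or die "$!\n";
while ( defined( my $txtFile = readdir $DH ) ) {
    next if $txtFile !~ /\.txt$/;
    # read_text() decodes as UTF-8 by default; use read_binary()
    # instead if the files are not valid UTF-8.
    my $data = read_text("$dir/$txtFile");
    my ($channel) = $data =~ /A\|CHNL_ID\|(\d+)/i;
    move("$dir/$txtFile", "$outDir/$channel") or die $!, $/;
}
closedir($DH);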
Since your program blocks while reading and moving each file, you might also want to parallelize it, e.g. with Parallel::ForkManager or with MCE. Then the reading and the moving of the files happen in parallel. To some extent you are of course I/O bound, but I think it should still give you some improvement, if implemented correctly.
Update: I whipped up a quick (untested) example for Parallel::ForkManager:
use strict;
use warnings;
use File::Copy qw(move);
use File::Slurp;
use Parallel::ForkManager;

sub read_next_batch_of_filenames {
    my ($DH, $MAX_FILES) = @_;
    my @files = ();
    while (my $fn = readdir $DH) {
        next if $fn !~ m/\.txt\z/;
        push @files, $fn;
        last if scalar(@files) >= $MAX_FILES;
    }
    if (@files) {
        return \@files;
    }
    else {
        return;
    }
}

sub move_files {
    my ($dir, $outDir, $files) = @_;
    foreach my $f (@$files) {
        my $data = read_file("$dir/$f");
        my ($channel) = $data =~ /A\|CHNL_ID\|(\d+)/i;
        move("$dir/$f", "$outDir/$channel")
            or die "Failed to move '$f' to '$outDir/$channel' ($!)\n";
    }
}

sub parallelized_move {
    my $dir    = 'FIXME';
    my $outDir = 'FIXME';
    my $MAX_PROCESSES     = 4;    # tweak this to find the best number
    my $FILES_PER_PROCESS = 1000; # process in batches of 1000, to limit forking

    my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

    opendir my $DH, $dir or die "Failed to open '$dir' for reading ($!)\n";
    DATA_LOOP:
    while (my $files = read_next_batch_of_filenames($DH, $FILES_PER_PROCESS)) {
        # Forks and returns the pid for the child:
        my $pid = $pm->start and next DATA_LOOP;
        move_files($dir, $outDir, $files);
        $pm->finish; # Terminates the child process
    }
    $pm->wait_all_children;
    closedir $DH or die "Failed to close directory handle for '$dir' ($!)\n";
}
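For comparison, here is a minimal, untested sketch of the MCE alternative mentioned above, using MCE::Loop to spread batches of filenames over a pool of workers. The $dir and $outDir placeholders are again FIXMEs, and the worker/chunk numbers are just starting points:

use strict;
use warnings;
use File::Copy qw(move);
use File::Slurp;
use MCE::Loop;

my $dir    = 'FIXME';
my $outDir = 'FIXME';

opendir my $DH, $dir or die "Failed to open '$dir' for reading ($!)\n";
my @files = grep { m/\.txt\z/ } readdir $DH;
closedir $DH;

MCE::Loop::init {
    max_workers => 4,      # like $MAX_PROCESSES above
    chunk_size  => 1000,   # like $FILES_PER_PROCESS above
};

mce_loop {
    # With chunk_size > 1 each worker receives a batch of filenames.
    my ($mce, $chunk_ref, $chunk_id) = @_;
    foreach my $f (@$chunk_ref) {
        my $data = read_file("$dir/$f");
        my ($channel) = $data =~ /A\|CHNL_ID\|(\d+)/i;
        move("$dir/$f", "$outDir/$channel")
            or die "Failed to move '$f' to '$outDir/$channel' ($!)\n";
    }
} @files;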

In reply to Re: Perl Program to efficiently process 500000 small files in a Directory (AIX) by rminner
in thread Perl Program to efficiently process 500000 small files in a Directory (AIX) by DenairPete
