I ran your code on my Windows machine; it took 1 minute 34 seconds.
My implementation is shown below.
I don't think the huge BLOCKSIZE and the read() call gained you anything, because you immediately copy all of the data back out of the buffer to build a very large array of lines, and then go over each line again in a loop. A 128 MB buffer won't have much effect on how long the disk read takes. Data on disk is typically organized in 4 KB chunks, and on a physical drive there is often a mechanically induced delay after each chunk is read. I have a physical drive, and even so the total time to read the whole 75 MB file line by line is well under 1 second. An SSD will of course be faster, but raw I/O speed doesn't appear to be the limiting factor.
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;
my $out='out-perl.dat';
open my $OUT, '>', $out or die "unable to open $out: $!";
my $start = time();    # start the clock once so the total covers every file
my $finish;
foreach my $text_file (<*.txt>) {
print STDOUT "working on file $text_file\n";
open(my $IN, '<', $text_file) or die "invalid file: $text_file: $!";
# reading entire file line by line << 1 second overhead
while (<$IN>)
{
tr/-!"#%&'()*,.\/:;?@\[\\\]_{}0123456789//d;
s/w(as|ere)/be/gi;
s/\sneed.*/ need /gi;
s/\s.*meant.*/ mean /gi;
s/\s.*work.*/ work /gi;
s/\s.*read.*/ read /gi;
s/\s.*allow.*/ allow /gi;
s/\s.*gave.*/ give /gi;
s/\s.*bought.*/ buy /gi;
s/\s.*want.*/ want /gi;
s/\s.*hear.*/ hear /gi;
s/\s.*came.*/ come /gi;
s/\s.*destr.*/ destroy /gi;
s/\s.*paid.*/ pay /gi;
s/\s.*selve.*/ self /gi;
s/\s.*self.*/ self /gi;
s/\s.*cities.*/ city /gi;
s/\s.*fight.*/ fight /gi;
s/\s.*creat.*/ create /gi;
s/\s.*makin.*/ make /gi;
s/\s.*includ.*/ include /gi;
s/\s.*mean.*/ mean /gi;
s/\stalk.*/ talk /gi;
s/\sgoing / go /gi;
s/\sgetting / get /gi;
s/\sstart.*/ start /gi;
s/\sgoes / go /gi;
s/\sknew / know /gi;
s/\strying / try /gi;
s/\stried / try /gi;
s/\stold / tell /gi;
s/\scoming / come /gi;
s/\ssaying / say /gi;
s/\smen / man /gi;
s/\swomen / woman /gi;
s/\stook / take /gi;
s/\stak.*/ take /gi;
s/\slying / lie /gi;
s/\sdying / die /gi;
s/\smade / make /gi;
s/\sused.*/ use /gi;
s/\susing.*/ use /gi;
print $OUT $_;
}
}
$finish = time();
my $total_seconds = $finish-$start;
my $minutes = int ($total_seconds/60);
my $seconds = $total_seconds - ($minutes*60);
print "minutes: $minutes seconds: $seconds\n";
__END__
working on file nightfall.txt
minutes: 1 seconds: 34
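
If you want to check the I/O claim on your own machine, here is a rough sketch; it is not part of the timed run above. It makes one pass over a file line by line and reports the elapsed time, optionally running a callback on each line so you can compare "read only" against "read plus one substitution". The helper name time_pass and the choice of the "meant" substitution are my own picks for illustration. Time::HiRes is used because the builtin time() only has one-second resolution.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second replacement for the builtin time()

# Hypothetical helper: read $path line by line, call $per_line (if given)
# on each line, and return (line count, elapsed seconds).
sub time_pass {
    my ($path, $per_line) = @_;
    open(my $in, '<', $path) or die "unable to open $path: $!";
    my $t0    = time();
    my $lines = 0;
    while (my $line = <$in>) {
        $per_line->($line) if $per_line;
        $lines++;
    }
    close $in;
    return ($lines, time() - $t0);
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    my ($n, $read_only)  = time_pass($file);
    my (undef, $with_re) = time_pass($file, sub { $_[0] =~ s/\s.*meant.*/ mean /gi });
    printf "%d lines, read only: %.3f s, read + one s///: %.3f s\n",
        $n, $read_only, $with_re;
}
```

Run it as "perl timepass.pl nightfall.txt" (the script name is arbitrary). If the read-only figure is a small fraction of the total, the time is going into the substitutions rather than the disk.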