I have a single file that ranges daily from 45-50 GB on a Solaris 8 server with 16 GB of RAM and eight 900 MHz CPUs.
How can you not be taking advantage of that horsepower?
I would seriously look into ROMIO, the MPI-IO implementation that ships with MPICH:
http://www-unix.mcs.anl.gov/romio/ (MPI Standard 2.0)
http://www.mpi-forum.org/docs/mpi-20-html/node171.htm#Node171
If it has to be Perl, then I would certainly look into parallelizing this application. As a brute-force approach: split the file eight ways, run a process to take care of each piece, then join the darn things back together. Even with the splitting and rejoining, I am sure it would be faster than what is happening right now.
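The split/process/rejoin idea can be sketched in plain Bourne shell. This is only an illustration with a tiny demo input and `tr` standing in for the real Perl worker (which is not shown in the original post); on the actual box you would size the pieces so each of the eight CPUs gets one chunk:

```shell
#!/bin/sh
# Sketch of the brute-force approach: split, process in parallel, rejoin.
# The real input would be the 45-50 GB file; this demo uses 4 lines.
printf 'alpha\nbeta\ngamma\ndelta\n' > input.txt

# 1. Split into pieces. Splitting on line count (-l) keeps records
#    intact; here 1 line per piece, in practice total_lines / 8.
split -l 1 input.txt piece.

# 2. Launch one worker per piece in the background. 'tr' is a
#    stand-in for something like: perl worker.pl "$p" > "$p.out"
for p in piece.*; do
    tr 'a-z' 'A-Z' < "$p" > "$p.out" &
done
wait    # block until every background worker has finished

# 3. Rejoin in order (split's suffixes aa, ab, ... sort lexically,
#    so a plain glob preserves the original record order).
cat piece.*.out > output.txt
cat output.txt
```

The `wait` builtin is what makes this safe: nothing is concatenated until all eight workers have exited, and because `split` generates lexically ordered suffixes, the final `cat` restores the original line order.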