Re: cat vs. file handle speed?

Well, someone had to do it, right?
I haven't played with benchmarking much, but here is my contribution.

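use strict;
use warnings;
use Benchmark qw(timethese);

# Compare reading mbox through a pipe from cat with opening it directly.
timethese(50000, {
    'OPENCAT' => sub {
        # Piped open: starts a cat process on every iteration.
        open(my $in, '-|', 'cat', 'mbox') or die "Cannot run cat: $!";
        while (<$in>) { }    # read every line, do nothing
        close $in;
    },
    'OPENPERL' => sub {
        # Plain open of the same file.
        open(my $in, '<', 'mbox') or die "Cannot open mbox: $!";
        while (<$in>) { }    # read every line, do nothing
        close $in;
    },
});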

Results:

Benchmark: timing 50000 iterations of OPENCAT, OPENPERL...
OPENCAT: 287 wallclock secs (175.33 usr 19.89 sys + 46.32 cusr 44.78 csys = 286.32 CPU) @ 256.12/s (n=50000)
OPENPERL: 171 wallclock secs (168.59 usr + 2.50 sys = 171.09 CPU) @ 292.24/s (n=50000)

I dropped the "system" variant early on, because it had already reached about the above timings after only 1000 iterations :-)


Re (tilly) 2: cat vs. file handle speed?
by tilly (Archbishop) on Mar 30, 2001 at 18:53 UTC

When I answered before, I knew full well that any of the three could win, depending on OS, installed versions, hardware, files, etc. When cat wins, the reason is latency. In doing IO, every so often you may wind up waiting for your request to get sorted. With the pipe, you can let cat do that waiting while Perl goes on its merry way.

This has to be weighed against the fact that it takes more work to launch cat than it does to open a filehandle. Plus operating systems take some pains to do for every process what cat does for one. So the tradeoff is highly system specific.

The third option, slowest for you by a country mile, can win on very large files. Why? It turns out that Perl is faster at reading STDIN than arbitrary filehandles, and the third option arranges for Perl to be using STDIN. This has to be weighed against the fact that it takes a lot more work to launch Perl than to launch cat.

Therefore in the right time and place, any of the three can win on raw speed.

But you should definitely go with the second. No doubt about it.

Why, you ask?

Well, it is the most portable answer, and with the second you can check for failures and $! is populated correctly. This key information has been lost for the other two. Besides which, if you really ran out of performance, using the second and then naively parallelizing by running a fixed number of copies on different files would get you the best overall throughput.
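
To make the error-checking point concrete, here is a minimal sketch. The file name and the two helper subs are hypothetical, invented for this illustration, and it assumes a Unix-like system with cat on the PATH. With the native open, the failure shows up at the open and $! says why. With the pipe, the open normally succeeds because the fork worked, and cat's trouble only appears as a non-zero exit status at close.

```perl
use strict;
use warnings;

# Hypothetical file name, chosen so that it should not exist.
my $missing = "no-such-mbox.$$";

# Native open: failure is reported immediately and $! holds the reason.
sub try_native {
    my ($file) = @_;
    open(my $in, '<', $file) or return "open failed: $!";
    while (<$in>) { }    # read every line, do nothing
    close $in;
    return "ok";
}

# Pipe from cat: the open itself normally succeeds; cat's failure
# is only visible as its exit status when the pipe is closed.
sub try_cat {
    my ($file) = @_;
    open(my $in, '-|', 'cat', $file) or return "could not start cat: $!";
    while (<$in>) { }    # read every line, do nothing
    close $in or return "cat exited with status " . ($? >> 8);
    return "ok";
}

print try_native($missing), "\n";
print try_cat($missing), "\n";
```

In the pipe case the reason for the failure goes to cat's STDERR rather than into $!, so the script only learns that something went wrong, not what.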

There is exactly one circumstance where I have, or would, recommend something different. If you are on a system where Perl does not have large file support but cat does (this is now a compile-time option for Perl, but some systems may still fit that description) then the first option will allow Perl to work on files of size over 2 GB.
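
A rough sketch of how a script might make that choice for itself, assuming the build setting recorded in the standard Config module (the uselargefiles entry) reflects what the running perl can do. The sub names perl_has_large_files and open_input are made up for this example.

```perl
use strict;
use warnings;
use Config;

# True when this perl was built with large file support.
sub perl_has_large_files {
    return defined $Config{uselargefiles}
        && $Config{uselargefiles} eq 'define';
}

# Prefer the native open; fall back to a pipe from cat only when
# perl itself cannot handle files over 2 GB.
sub open_input {
    my ($file) = @_;
    my $in;
    if (perl_has_large_files()) {
        open($in, '<', $file) or die "Cannot open $file: $!";
    }
    else {
        open($in, '-|', 'cat', $file) or die "Cannot run cat on $file: $!";
    }
    return $in;
}
```

Running perl -V:uselargefiles on the command line prints the same setting.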

So the summary is that any of the three can win on raw performance, but for portability and error checking you really want to use the native method. (Which is the prioritization that I hinted at above. But you should not need to know all of this; that prioritization is usually right in the end.)

Any questions?
