dorpus has asked for the wisdom of the Perl Monks concerning the following question:
open(INFILE, "cat textfile |"); while (<INFILE>) {...}
open(INFILE, "textfile"); while (<INFILE>) {...}
system("cat textfile | filter.pl");
Replies are listed 'Best First'.
Re: cat vs. file handle speed?
by Adam (Vicar) on Mar 30, 2001 at 06:06 UTC
open(INFILE,"cat textfile |"): This opens a type of file handle commonly known as a pipe. It spawns an additional process, complete with a duplicate set of environment variables and memory management requirements. The OS must now swap memory back and forth between Perl and cat.
open(INFILE,"textfile"): Perl opens a file handle directly to the file. No other processes are started.
system("cat textfile | filter.pl"): Perl invokes the shell, which invokes cat and another instance of Perl! Plus the shell still has to open a file handle for the output of cat / the input to filter.pl.
Result: All three methods require a filehandle (aka a fileno, or a file descriptor), and two of the methods have the additional overhead of multiple processes. Use the second method and avoid all that.
Re: cat vs. file handle speed?
by the_slycer (Chaplain) on Mar 30, 2001 at 09:47 UTC
I haven't played with benchmarking much, but here is my contribution.
Results:
Benchmark: timing 50000 iterations of OPENCAT, OPENPERL...
OPENCAT: 287 wallclock secs (175.33 usr 19.89 sys + 46.32 cusr 44.78 csys = 286.32 CPU) @ 256.12/s (n=50000)
OPENPERL: 171 wallclock secs (168.59 usr + 2.50 sys = 171.09 CPU) @ 292.24/s (n=50000)
I dropped "system" out of it early on, because it was at about the above levels after only 1000 iterations :-)
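The benchmark code itself is not shown with these results. A comparison of this kind could be set up roughly as in the sketch below, using the core Benchmark module. The scratch file name, its 1000 lines, the iteration count, and the sub names are all assumptions for illustration, not the_slycer's actual script; only the OPENCAT and OPENPERL labels come from the results above.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical scratch file; its name and size are assumptions.
my $file = "bench-textfile-$$.txt";
open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} "line $_\n" for 1 .. 1000;
close($out) or die "Cannot close $file: $!";

# Method 1: read through a pipe from cat (one extra process per call).
sub count_via_cat {
    open(my $in, '-|', 'cat', $file) or die "Cannot start cat: $!";
    my $n = 0;
    while (my $line = <$in>) { $n++ }
    close($in) or die "cat exited with status $?";
    return $n;
}

# Method 2: open the file directly from Perl.
sub count_direct {
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    my $n = 0;
    while (my $line = <$in>) { $n++ }
    close($in) or die "Cannot close $file: $!";
    return $n;
}

# Sanity check: both methods must see the same data before being timed.
my $cat_lines    = count_via_cat();
my $direct_lines = count_direct();
die "Line counts differ" unless $cat_lines == $direct_lines;

# Time both methods; timethese prints one report line per label.
my $results = timethese(200, {
    OPENCAT  => \&count_via_cat,
    OPENPERL => \&count_direct,
});

unlink $file or warn "Could not remove $file: $!";
```

The absolute numbers will depend heavily on the OS, the file size, and whether the file is already cached, so a run of this sketch says little about any other system.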
by tilly (Archbishop) on Mar 30, 2001 at 18:53 UTC
When I answered before I knew full well that any of the three could win, depending on OS, installed versions, hardware, files, etc.
The reason why cat wins here is latency. In doing IO, every so often you may wind up waiting for your request to get sorted. Well, with the pipe you can let cat do that waiting, and Perl can go on its merry way. This has to be weighed against the fact that it takes more work to launch cat than it does to open a filehandle. Plus operating systems take some pains to do for every process what cat does for one. So the tradeoff is highly system specific.
The third option, slowest for you by a country mile, can win on very large files. Why? Well, it turns out that Perl is faster to read STDIN than arbitrary filehandles. The third option arranges for Perl to be using STDIN. This has to be weighed against the fact that it takes a lot more work for Perl to be launched than cat.
Therefore in the right time and place, any of the three can win on raw speed. But you should definitely go with the second. No doubt about it. Why, you ask? Well, it is the most portable answer, and with the second you can check failures and $! is populated correctly. This key information has been lost for the other 2. Besides which, if you really ran out of performance, by using the second and then naively parallelizing by running a fixed number of copies on different files, you would get the best overall throughput.
There is exactly one circumstance where I have, or would, recommend something different. If you are on a system where Perl does not have large file support but cat does (this is now a compile-time option for Perl, but some systems may still fit that description), then the first option will allow Perl to work on files of size over 2 GB.
So the summary is that any of the three can win on raw performance, but for portability and error checking you really want to use the native method. (Which is the prioritization that I hinted at above. But you should not need to know all of this; that prioritization is usually right in the end.)
Any questions?
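The point about error checking can be illustrated with a short sketch. It assumes a Unix-like system with cat on the PATH, and the file name is a hypothetical one chosen so that the file does not exist. With a direct open, the failure and its reason in $! are available immediately; with the pipe, the open normally succeeds because cat was started, and the problem surfaces only as cat's exit status when the handle is closed.

```perl
use strict;
use warnings;

# Hypothetical file name, chosen so that it should not exist.
my $missing = "no-such-file-$$.txt";

# Second method: the open itself fails, and $! explains why.
my $direct_ok  = open(my $in, '<', $missing);
my $direct_err = $direct_ok ? '' : "$!";
print "direct open failed: $direct_err\n" unless $direct_ok;

# First method: the open normally succeeds because cat was started.
# cat's own complaint goes to STDERR; Perl learns of the failure only
# at close, through the child's exit status in $?.
my $pipe_ok     = open(my $pipe, '-|', 'cat', $missing);
my @lines       = $pipe_ok ? <$pipe> : ();
my $close_ok    = $pipe_ok ? close($pipe) : 0;
my $exit_status = $? >> 8;

print "pipe open succeeded: ", ($pipe_ok ? "yes" : "no"), "\n";
print "lines read: ", scalar(@lines), "; cat exit status: $exit_status\n";
```

In real code the direct open would usually be written as open(...) or die "Cannot open $missing: $!"; the version above stores the result only so that both outcomes can be printed side by side.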
Re: cat vs. file handle speed?
by extremely (Priest) on Mar 30, 2001 at 05:48 UTC
--
Re: cat vs. file handle speed?
by petral (Curate) on Mar 30, 2001 at 06:01 UTC
(since we're on the subject of file-reading speed)
p
Re (tilly) 1: cat vs. file handle speed?
by tilly (Archbishop) on Mar 30, 2001 at 06:02 UTC
None of them have error checks. Those issues are more important than the minuscule speed differences...
UPDATE
by Adam (Vicar) on Mar 30, 2001 at 06:08 UTC
Update
by tilly (Archbishop) on Mar 30, 2001 at 22:58 UTC
As for whether $! was a later argument, well, I don't think so. You see, I am in the habit of giving answers where you are unlikely to see the point of the answer unless you try it. If you try it you will discover that for yourself, and I believe that makes it stick better. If you do not try it, well, my typing it wouldn't have helped because you would have just forgotten that as well.
So yes, I was attempting to make it clear that dorpus was asking the wrong question. That was not an accident, that was the point. And I think that what I said does help our fellow monks. Why? Because it tells them what I think is important. I believe that if they value what I think is important here, that will be helpful. It may not be the help that was requested, but I am (in case you had not noticed) someone who tries to give the help that I think does the most good, even if it is not the help that was asked for.
Furthermore, when I first answered I gave conscious thought to the question of whether I should answer the question as posed. You see, I knew from the start that any of the three could beat the other two in practice. I sincerely thought about saying that up front, but I decided that it would obscure the critical point. And the critical point is that 99.9% of the time this is the wrong question to ask. The remaining 0.1% of the time, if you ask it and think carefully, you will come to the same answer that you would have come to if you had asked the right question in the first place.
Therefore I thought it justified to only say what I considered to be key. Which is that until you reflexively get the syntax right and reach for the error check, it is more important to focus on those things than on worrying about raw performance. Now if this undermines my credibility, then so be it.
And continuing on, you may be different, but I used to claim that I never opened without a die in real code. But first I realized that if I showed my pseudo-code to others I wanted to put the die in so that they would not accidentally copy that. Then one day I caught myself missing that detail when converting my own pseudo-code into production code. I then sat back, thought, and made the conscious decision to always use it, even in pseudo-code, because I didn't want to accidentally pick up and use bad habits when moving to production code.
So YMMV, but what I do in pseudo-code I tend to do in production as well. So habits I want to have in my production code I try to stick to in pseudo-code.
by Adam (Vicar) on Mar 30, 2001 at 23:05 UTC
Re: cat vs. file handle speed?
by Malkavian (Friar) on Mar 30, 2001 at 15:49 UTC
In an ideal world (and in most cases), Perl will beat cat in a file read. Caveat: if you're running on Linux, this isn't the case, and cat is actually faster. See the enlightened node by tye on this subject here. A minor workaround (read: ugly hack) to get Linux to work faster was to use a read statement and break the block down into lines using a reader object. It seems to work for Linux, but will seriously slow down other OSes.
Malk
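The workaround described here, reading fixed-size blocks with read and splitting them into lines, might look roughly like the sketch below. The reader object from the original workaround is not shown in this thread, so the function name read_lines_blockwise, the block size, and the sample file are assumptions made for illustration.

```perl
use strict;
use warnings;

# Read a file in fixed-size blocks and return a reference to its lines.
# A line that straddles two blocks is carried over in $tail.
sub read_lines_blockwise {
    my ($path, $block_size) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my @lines;
    my $tail = '';    # partial line left over from earlier blocks
    while (1) {
        my $got = read($in, my $buf, $block_size);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;    # end of file
        $tail .= $buf;
        # Peel off every complete line currently held in the buffer.
        while ($tail =~ s/\A([^\n]*\n)//) {
            push @lines, $1;
        }
    }
    push @lines, $tail if length $tail;    # last line without a newline
    close($in) or die "Cannot close $path: $!";
    return \@lines;
}

# Small demonstration with a hypothetical scratch file and a tiny block
# size, so that lines are forced to span block boundaries.
my $path    = "blockread-demo-$$.txt";
my $content = "alpha\nbeta\ngamma";
open(my $out, '>', $path) or die "Cannot write $path: $!";
print {$out} $content;
close($out) or die "Cannot close $path: $!";

my $lines = read_lines_blockwise($path, 4);
print scalar(@$lines), " lines read\n";

unlink $path or warn "Could not remove $path: $!";
```

A realistic block size would be several kilobytes; whether this beats a plain while (<INFILE>) loop depends on the platform, as the post says.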
by tye (Sage) on Mar 30, 2001 at 22:16 UTC
Well, my analysis applies if cat uses "stdio.h" to read the file (which probably depends on the breed of cat that you have). But that doesn't matter in this case because even if cat is faster than Perl, Perl would still have to read the output from cat. So X+Y is always bigger than just Y (since a process can't consume negative resources), whether X<Y or Y<X.
- tye (but my friends call me "Tye")
Re: cat vs. file handle speed?
by indigo (Scribe) on Mar 30, 2001 at 05:55 UTC