dino has asked for the wisdom of the Perl Monks concerning the following question:
What are the issues with Perl file I/O on Linux that make
multiple readlines so much slower than the equivalent read? I've heard
from an associate that Perl file I/O is broken under Linux,
and wondered if it was true. Are there any known fixes
apart from reinventing the wheel and replacing readline
with the equivalent reads? Is the Linux kernel version relevant, and
would using sfio make much difference?
Re: perl io and linux issues
by jeroenes (Priest) on Mar 30, 2001 at 15:20 UTC
For one thing, readline must search for a newline in the
file. Of course that is slower than a read, which just
takes a predefined number of bytes from the file.
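To make the difference concrete, here is a small sketch of what each call hands back. It uses an in-memory filehandle (open on a scalar reference), which needs Perl 5.8 or later, so it is not something the 2001 posters could run as-is. The data string is made up.

```perl
use strict;
use warnings;

# Made-up data in an in-memory "file" so the example is self-contained.
my $data = "alpha\nbeta\ngamma\n";

# readline (<$fh>) scans the buffer for $/ ("\n" by default)
# and returns one record, however long it is.
open my $fh, '<', \$data or die "open: $!";
my $line = <$fh>;                 # "alpha\n"
close $fh;

# read() copies up to a fixed number of bytes and does no scanning.
open $fh, '<', \$data or die "open: $!";
my $n = read($fh, my $buf, 8);    # $buf is "alpha\nbe", $n is 8
close $fh;

print "readline: ", length($line), " bytes\n";
print "read:     $n bytes\n";
```

The read() result can end in the middle of a line, which is why anything built on read() has to handle broken lines itself.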
Others will have more to say about Linux file I/O.
As far as I know, it is fast.
"We are not alone"(FZ)
I think the issue in question is that, on Linux, doing the same work readline should do (running over a file, finding newlines, and forming lines) is faster as a block read followed by line formation in Perl than as a readline.
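A rough way to check this claim on a given machine is a timing harness like the sketch below, using the core Benchmark and File::Temp modules. The file contents, line count, and 64 KB block size are arbitrary assumptions, and the block version only counts newlines rather than building line strings, so it is not a full replacement for readline. No particular result is claimed: it will depend on the perl build, the C library, and the kernel.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Assumed test data: 100_000 short lines in a temporary file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line number $_ of the test file\n" for 1 .. 100_000;
close $tmp or die "close: $!";

# Count lines with the ordinary readline loop.
sub count_readline {
    open my $fh, '<', $path or die "open $path: $!";
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

# Count newlines in fixed-size blocks fetched with read().
sub count_blocks {
    open my $fh, '<', $path or die "open $path: $!";
    my $count = 0;
    while (read($fh, my $buf, 64 * 1024)) {
        $count += ($buf =~ tr/\n//);
    }
    close $fh;
    return $count;
}

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    readline => \&count_readline,
    blocks   => \&count_blocks,
});
```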
An initial answer to this has already been made by tye in the note here.
I believe the question is asking more about why Perl I/O is broken on Linux, and what can be done to work around it in the meantime.
After a little digging, it appears not to be a problem with
Linux per se but with glibc 2 (please correct me if I'm wrong).
Perl's fast stdio I/O uses the _ptr and _cnt fields of the
FILE structure, as declared in the traditional stdio.h. Unfortunately,
_cnt is not visible in the glibc 2 version. I've seen mentions of a
glue.c file in the glibc source that may allow a workaround.
If this is a glibc 2 issue, then there may be future problems on
other platforms too. Or do other platforms that use glibc 2
and still have fast Perl I/O ship a modified stdio.h?
Perhaps sfio allows such direct hooks. Has anybody else
made progress on this issue?
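Whether a particular perl was built to poke at the stdio buffer directly, or against sfio, is recorded in its build configuration. A sketch that prints the relevant keys from the core Config module follows; the same values are available from the command line with, for example, `perl -V:d_stdstdio`. On some builds a key may be undefined, which the script reports as "(undef)".

```perl
use strict;
use warnings;
use Config;

# Build-time settings related to the stdio fast path:
#   d_stdstdio        FILE has usable _ptr/_cnt-style fields perl can use
#   d_stdio_ptr_lval  the buffer-pointer macro can be assigned to
#   d_stdio_cnt_lval  the buffer-count macro can be assigned to
#   usesfio           perl was built against sfio instead of stdio
#   useperlio         perl uses its own PerlIO layer
for my $key (qw(d_stdstdio d_stdio_ptr_lval d_stdio_cnt_lval usesfio useperlio)) {
    my $val = defined $Config{$key} ? $Config{$key} : '(undef)';
    printf "%-18s %s\n", $key, $val;
}
```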
I can do a read, run a split over the buffer to extract lines,
and it will still be faster than the equivalent number of readlines
under Linux 2.2.x. Of course I have to do a bit of work to
catch lines broken across blocks, but that's a small performance penalty.
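The read-and-split approach, including the carry-over for broken lines, might look something like the sketch below. The helper name `for_each_line`, the callback interface, and the 64 KB default block size are assumptions for illustration, not the poster's actual code. It uses `split /^/`, which Perl treats as `/^/m`, so each piece keeps its trailing newline just as readline would return it.

```perl
use strict;
use warnings;

# Hypothetical helper: call $handler once per line of $fh, using large
# read() calls instead of readline. Lines keep their trailing "\n".
sub for_each_line {
    my ($fh, $handler, $blksize) = @_;
    $blksize ||= 64 * 1024;            # assumed block size; tune as needed
    my $tail = '';                     # broken line carried over from the previous block
    while (1) {
        my $got = read($fh, my $buf, $blksize);
        die "read failed: $!" unless defined $got;
        last if $got == 0;             # EOF
        $buf = $tail . $buf;
        my @lines = split /^/, $buf;   # /^/ acts as /^/m: split after each "\n"
        # Unless the block ended exactly on "\n", the last piece is incomplete.
        $tail = substr($buf, -1) eq "\n" ? '' : pop @lines;
        $handler->($_) for @lines;
    }
    $handler->($tail) if length $tail; # last line had no trailing newline
}

# Usage with a tiny block size so that lines get broken across reads.
my $text = "one\ntwo\nthree";
open my $fh, '<', \$text or die "open: $!";
my @seen;
for_each_line($fh, sub { push @seen, $_[0] }, 4);
close $fh;
# @seen is ("one\n", "two\n", "three")
```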
If I don't split at all and use s/// over the whole string instead,
it's a lot faster.
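The post doesn't say exactly what the s/// version does. One possible reading is that the per-line work is itself a substitution, which can then be applied to a whole block at once with the /m modifier instead of to each line. The sketch below assumes that, uses a hypothetical "strip trailing spaces and tabs" edit as the example, and still holds back the broken line at the end of each block so a substitution never sees half a line. No speed claim is made for it.

```perl
use strict;
use warnings;

# Hypothetical example edit: strip trailing spaces/tabs from every line
# of $in and write the result to $out, working on whole blocks.
sub strip_trailing_ws {
    my ($in, $out, $blksize) = @_;
    $blksize ||= 64 * 1024;                 # assumed block size
    my $tail = '';                          # broken line held back from the last block
    while (1) {
        my $got = read($in, my $buf, $blksize);
        die "read failed: $!" unless defined $got;
        last if $got == 0;                  # EOF
        $buf = $tail . $buf;
        # Cut off everything after the last newline; it is an incomplete line.
        my $cut = rindex($buf, "\n") + 1;   # 0 if there is no newline at all
        $tail = substr($buf, $cut, length($buf) - $cut, '');
        # With /m, $ matches before every "\n", so this edits all lines at once.
        $buf =~ s/[ \t]+$//mg;
        print {$out} $buf;
    }
    $tail =~ s/[ \t]+$//;                   # final line without a newline
    print {$out} $tail;
}

# Usage with in-memory handles and a tiny block size.
my $input = "a  \nb\t\nc ";
open my $in, '<', \$input or die "open: $!";
my $output = '';
open my $out, '>', \$output or die "open: $!";
strip_trailing_ws($in, $out, 3);
close $in;
close $out;
# $output is "a\nb\nc"
```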