dino has asked for the wisdom of the Perl Monks concerning the following question:

Hi, what are the issues with Perl file I/O on Linux that make multiple readlines so much slower than the equivalent read? An associate told me that Perl file I/O is broken under Linux, and I wondered if that was true. Are there any known fixes apart from reinventing the wheel and replacing readline with equivalent reads? Is the Linux kernel version relevant, and would using sfio make much difference? Regards, dino

Re: perl io and linux issues
by jeroenes (Priest) on Mar 30, 2001 at 15:20 UTC
    For one thing, a readline must search the file for a newline. Of course that is slower than a read, which just takes a predefined number of bytes from the file.
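A minimal C sketch of the two access patterns at the stdio level. It uses C rather than Perl because the question is about the C library underneath, and the function names are hypothetical. Both functions count newlines in a stream. The first uses fgets, which, like readline, has to locate a line boundary on every call. The second asks fread for fixed 64 KB blocks and scans them itself with memchr. The scanning work is the same in both cases. The block version simply makes far fewer library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* readline-style: one fgets call per line; the library finds each '\n' */
static long count_newlines_fgets(FILE *fp)
{
    char line[4096];
    long n = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        /* a line longer than the buffer arrives in pieces; only the
           piece that ends the line contains the '\n' */
        if (strchr(line, '\n') != NULL)
            n++;
    }
    return n;
}

/* read-style: fixed-size blocks, newlines located by our own memchr scan */
static long count_newlines_fread(FILE *fp)
{
    char buf[65536];
    size_t got;
    long n = 0;

    assert(fp != NULL);
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        const char *p = buf;
        const char *end = buf + got;

        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            n++;
            p++; /* continue just past the newline we found */
        }
    }
    return n;
}
```

Call either function on a FILE * positioned at the start of the data, for example after rewind. Neither one counts a final line that lacks a trailing newline.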

    Others will have more to say about Linux file I/O. As far as I know, it is fast.

    "We are not alone"(FZ)

      I think the issue in question is that on Linux, reading a block of data and then finding the newlines and forming the lines yourself (the same work readline should be doing) is faster than calling readline on the same file.
      tye has already given an initial answer to this in the note here.

      I believe the question is really asking why Perl I/O is broken on Linux, and what can be done to work around it in the meantime.

        After a little digging, it appears not to be a problem with Linux per se but with glibc 2 (please correct me if I'm wrong). Perl's fast stdio I/O uses the _ptr and _cnt hooks of the FILE structure defined in a traditional stdio.h. Unfortunately, _cnt is not visible in the glibc 2 version. I've seen mentions of a glue.c file in the glibc source that may allow a workaround.
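Some context on the hooks. A traditional Unix FILE has a _ptr member that points at the next unread byte in the buffer and a _cnt member that holds how many bytes are left. A reader that can see both can scan the buffer directly instead of making a library call per character. glibc's FILE has no _cnt member. Here is a hedged sketch that assumes the glibc (libio) layout, in which struct _IO_FILE exposes _IO_read_ptr and _IO_read_end. Under that assumption, the same two values can be derived as shown. buffer_ptr and buffer_cnt are hypothetical names. The code relies on the library's internal layout rather than a documented API. On any other C library the helpers just report that the layout is unknown.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers that peek at the read side of a stdio buffer.
   ASSUMPTION: glibc's libio FILE layout (struct _IO_FILE). Other C
   libraries use different or opaque FILE structures, so there the
   helpers return "unknown" values instead. */

/* Next unread byte in the buffer: the role _ptr plays in a traditional FILE. */
static const char *buffer_ptr(FILE *fp)
{
    assert(fp != NULL);
#ifdef __GLIBC__
    return fp->_IO_read_ptr;
#else
    (void)fp;
    return NULL;
#endif
}

/* Bytes still buffered and unread: the role _cnt plays in a traditional FILE. */
static long buffer_cnt(FILE *fp)
{
    assert(fp != NULL);
#ifdef __GLIBC__
    return (long)(fp->_IO_read_end - fp->_IO_read_ptr);
#else
    (void)fp;
    return -1;
#endif
}
```

Call fgetc once on a freshly rewound stream. After that, buffer_cnt reports how much of the block that stdio read ahead is still waiting in the buffer. That remaining data is what a _ptr/_cnt style reader can scan without another library call.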

        If this is a glibc 2 problem, there may be future problems on other platforms too. Or do other platforms that use glibc 2 and have fast Perl I/O ship a modified stdio.h?

        Perhaps sfio allows such direct hooks. Has anybody else made progress on this issue?

        regards dino

      I can do a read, run a split over it to extract the lines, and it will still be faster than the equivalent number of readlines under Linux 2.2.x. Of course I have to do a bit of work to catch broken lines, but that's a small performance penalty. If I don't do a split and use s/// over the string instead, it's a lot faster.
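The workaround in this reply works in three steps. It reads a large block, splits the block into lines, and patches up the one line that is broken across the block boundary. The C sketch below follows those steps. It illustrates the technique and is not the poster's Perl code. for_each_line, the line_fn callback, the tally example callback, and the 64 KB block size are all hypothetical choices.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Read fp in fixed-size blocks and call fn once per line (without '\n').
   A line split across blocks is accumulated in `carry`.
   Returns 0 on success, -1 on allocation or read error. */
static int for_each_line(FILE *fp, line_fn fn, void *ctx)
{
    enum { BLOCK = 65536 };
    char *buf;
    char *carry = NULL;     /* unfinished line left over from earlier blocks */
    size_t carry_len = 0;
    size_t got;
    int rc = 0;

    assert(fp != NULL && fn != NULL);
    buf = malloc(BLOCK);
    if (buf == NULL) return -1;

    while ((got = fread(buf, 1, BLOCK, fp)) > 0) {
        const char *p = buf, *end = buf + got;
        const char *nl;

        while ((nl = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            size_t len = (size_t)(nl - p);
            if (carry_len > 0) {
                /* finish the line broken across the previous block boundary */
                char *tmp = realloc(carry, carry_len + len);
                if (tmp == NULL) { rc = -1; goto done; }
                carry = tmp;
                memcpy(carry + carry_len, p, len);
                fn(carry, carry_len + len, ctx);
                carry_len = 0;
            } else {
                fn(p, len, ctx);   /* whole line lies inside this block */
            }
            p = nl + 1;
        }
        if (p < end) {
            /* no newline left in this block: keep the tail for the next read */
            size_t tail = (size_t)(end - p);
            char *tmp = realloc(carry, carry_len + tail);
            if (tmp == NULL) { rc = -1; goto done; }
            carry = tmp;
            memcpy(carry + carry_len, p, tail);
            carry_len += tail;
        }
    }
    if (ferror(fp))
        rc = -1;
    else if (carry_len > 0)
        fn(carry, carry_len, ctx);   /* last line had no trailing '\n' */
done:
    free(carry);
    free(buf);
    return rc;
}

/* Example callback: count lines and total bytes (hypothetical) */
struct line_stats { long lines; size_t bytes; };

static void tally(const char *line, size_t len, void *ctx)
{
    struct line_stats *s = ctx;
    (void)line;
    s->lines++;
    s->bytes += len;
}
```

Only the single line per block that straddles a boundary gets copied into the carry buffer. Every other line goes to the callback straight out of the block. Skipping line formation and working on the whole block, as in the s/// variant, avoids both the per-line calls and the carry handling.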