PerlMonks  

Re^3: Perl Read-Ahead I/O Buffering

by graff (Chancellor)
on Oct 27, 2006 at 01:58 UTC ( [id://580855] )


in reply to Re^2: Perl Read-Ahead I/O Buffering
in thread Perl Read-Ahead I/O Buffering

4K (or whatever buffer size perl is using) is likely to be a pretty good size for an input buffer, based on lots of experience and tweaking among perl maintainers. It strikes a nice balance between competing resource demands -- a larger or smaller size might improve some things, but hinder others.

Your processing is going to be line-oriented anyway, and perl's internal buffering is already optimized (in C) to deliver lines while managing the underlying block-oriented buffering.
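The standard idiom that this optimized buffering sits underneath is the ordinary line-reading loop. A minimal, self-contained sketch (the in-memory filehandle is just so the example runs on its own; normally $fh would come from open on a real file):

```perl
use strict;
use warnings;

# Idiomatic line-oriented reading: perl's I/O layer does the
# block buffering in C and hands back one line at a time.
my $text = "first\nsecond\nthird\n";
open my $fh, '<', \$text or die "open: $!";

my $count = 0;
while ( my $line = <$fh> ) {
    chomp $line;    # strip the trailing newline
    $count++;       # ... real per-line processing goes here ...
}
close $fh;
print "read $count lines\n";    # prints "read 3 lines"
```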

If you try doing the buffering yourself (e.g. using read as suggested in another reply), you'll end up slowing things down, because you have to write your own code to figure out the line boundaries, retain a buffer-final line fragment so that you can append the next buffer to that, and so on. It's not only slower to run; it's also slower and harder to code, test, and maintain.

If the runtime speed of the standard while (<>) loop in perl is a serious issue for your task, maybe you just need to use C. But then you'll spend even more time on coding, testing and maintenance. It's a question of whose time is more important and expensive: the programmer's, or the CPU's.

Replies are listed 'Best First'.
Re^4: Perl Read-Ahead I/O Buffering (I/O speed)
by tye (Sage) on Oct 27, 2006 at 16:15 UTC

    I agree that increasing the internal buffer size is unlikely to make much performance difference.

    However, although I agree that this next statement should be true...

    If you try doing the buffering yourself (e.g. using read as suggested in another reply), you'll end up slowing things down

    in benchmarks on many systems, it can be twice as fast to read blocks with sysread and split them into lines using Perl code rather than letting perl's C code do the same work.
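    The approach tye is benchmarking can be sketched as follows. This is only an illustration of the technique, not tye's actual benchmark code; the temporary file and the 64 KiB block size are assumptions, not something specified in the thread:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sketch: read large blocks with sysread, bypassing perl's buffered
# line layer, and carve out the lines in Perl code with split.
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} "line $_\n" for 1 .. 1000;
close $out;

open my $fh, '<', $path or die "open: $!";
my $tail  = '';
my $count = 0;
while ( sysread( $fh, my $block, 65536 ) ) {
    # limit -1 keeps a trailing empty field, so the element popped
    # off is always the fragment after the last newline (maybe '').
    my @pieces = split /\n/, $tail . $block, -1;
    $tail = pop @pieces;
    $count += @pieces;
    # ... process the complete lines in @pieces here ...
}
$count++ if length $tail;    # count an unterminated final line
close $fh;
print "$count lines\n";      # prints "1000 lines"
```

    Note that the handle is only ever touched with sysread here; mixing sysread with buffered reads (<$fh> or read) on the same handle is a known way to lose data.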

    The culprit appears to be Perl's very old optimization based on peeking at the internal details about how stdio.h buffering is done. This optimization meant that on some systems, Perl code was sometimes faster at I/O than the equivalent C code.

    However, the days of AT&T SVR4 have mostly passed, so most systems, even Unix ones, no longer meet Perl's definition of "STDSTDIO". On those systems the optimization has to be worked around, and the resulting I/O code is (as near as I can tell) at least 4 times slower than it really should be. Replacing that working-around-an-old-optimization C code with similar Perl code makes I/O about twice as fast on Linux (last time I checked).

    I suspect and hope that newer perls and the "PERLIO" layer stuff have resulted in this old cruft no longer causing such a slow-down, but it'd be interesting to see I/O benchmarks for recent versions of Perl.

    - tye        
