http://qs321.pair.com?node_id=692431


in reply to Re: 4k read buffer is too small
in thread 4k read buffer is too small

If you're only reading the file from beginning to end, another useful trick is to write a small program that reads the file in whatever block size you need (for example with sysread) and writes it to standard output. You can then pipe that program's output into your actual program, which can keep reading the pipe in 4KB blocks without affecting how the NFS server is accessed. If you need to seek around, this won't work, but sometimes it can be helpful.

Yes, I strongly agree with this trick. My office neighbor also suggested this workaround: our nodes have at least 2 and up to 8 CPUs each, but most often the actual computation uses only 1 CPU, so there is usually a spare CPU for the extra process. CPU cycles are cheap!
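The helper program from the quoted suggestion can be quite small. Below is a minimal sketch in Python rather than Perl, with os.read standing in for sysread; the script, the copy_in_blocks function, and the 1 MiB BLOCK_SIZE are hypothetical choices, not something taken from this thread.

```python
#!/usr/bin/env python3
"""Hypothetical helper: copy one file to stdout using large read() calls."""
import os
import sys

BLOCK_SIZE = 1 << 20  # 1 MiB per read(); an assumed value, tune as needed


def copy_in_blocks(src_fd: int, dst_fd: int, block_size: int = BLOCK_SIZE) -> int:
    """Read src_fd in block_size chunks and write everything to dst_fd.

    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = os.read(src_fd, block_size)  # one large read() system call
        if not chunk:  # empty bytes means end of file
            break
        view = memoryview(chunk)
        while view:  # os.write may write less than requested on a pipe
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(chunk)
    return total


def main(argv):
    if len(argv) != 2:
        sys.stderr.write(f"usage: {argv[0]} FILE\n")
        return 2
    fd = os.open(argv[1], os.O_RDONLY)
    try:
        copy_in_blocks(fd, sys.stdout.fileno())
    finally:
        os.close(fd)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Usage would be something like `python3 bigread.py /auto/XXX-01/somefile | ./yourprogram`, where bigread.py, somefile, and yourprogram are placeholder names. The consuming program can keep its 4KB reads, since those now hit the local pipe rather than the NFS mount.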

As for the NFS client tuning, I will pass the message on, but I suspect the admins have already done quite a bit of tuning; after all, our directory requests are served from a different physical machine than the data blocks. I don't have god privileges on any of the machines myself.

XXX:/export/samfs-XXX01 /auto/XXX-01 nfs rw,nosuid,noatime,rsize=32768,wsize=32768,timeo=15,retrans=7,tcp,intr,noquota,rsize=32768,wsize=32768,addr=10.125.0.8 0 0
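To see which transfer sizes a mount actually ended up with, the options field of such a line can be split apart. A minimal Python sketch, assuming the six whitespace-separated fields of a /proc/mounts entry (device, mount point, type, options, dump, pass); parse_mount_line is a hypothetical helper, and keeping the last value for a repeated option is this sketch's own choice.

```python
def parse_mount_line(line: str) -> dict:
    """Split one /proc/mounts-style line into its fields and mount options."""
    device, mountpoint, fstype, opts, dump, passno = line.split()
    options = {}
    for opt in opts.split(","):
        key, sep, value = opt.partition("=")
        # Bare flags such as "tcp" or "intr" carry no value; store True for them.
        # Repeated keys (this entry lists rsize and wsize twice) keep the last value.
        options[key] = value if sep else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options,
        "dump": int(dump),
        "passno": int(passno),
    }


MOUNT_LINE = (
    "XXX:/export/samfs-XXX01 /auto/XXX-01 nfs rw,nosuid,noatime,rsize=32768,"
    "wsize=32768,timeo=15,retrans=7,tcp,intr,noquota,rsize=32768,wsize=32768,"
    "addr=10.125.0.8 0 0"
)

if __name__ == "__main__":
    entry = parse_mount_line(MOUNT_LINE)
    print(entry["options"]["rsize"], entry["options"]["wsize"])  # 32768 32768
```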

The readahead sounds intriguing. How would it work if 200 clients tried to read the same file, though with slightly offset start times? Wouldn't readahead aggravate the server load in this case?

Re^3: 4k read buffer is too small
by sgifford (Prior) on Jun 17, 2008 at 04:55 UTC
    XXX:/export/samfs-XXX01 /auto/XXX-01 nfs rw,nosuid,noatime,rsize=32768,wsize=32768,timeo=15,retrans=7,tcp,intr,noquota,rsize=32768,wsize=32768,addr=10.125.0.8 0 0
    Interesting, that should be reading in 32KB blocks. You would still see 4KB reads in strace, though, because strace shows the read() system calls your program makes, not the requests the NFS client sends to the server; that might be throwing off your analysis. Check whether the output of nfsstat or tcpdump matches what you'd expect from strace. If you find that the client actually is reading in larger blocks, your sysadmins can try increasing rsize further.
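Before comparing strace with nfsstat, it helps to know what each should show. A back-of-the-envelope Python sketch, assuming the program reads sequentially in 4KB pieces, every NFS READ transfers a full rsize, and nothing is cached yet; expected_counts is a hypothetical name.

```python
def expected_counts(file_size: int, app_read_size: int = 4096, rsize: int = 32768):
    """Idealized (read() syscalls, NFS READ requests) for one sequential pass."""
    # Ceiling division; the extra read() is the final call that returns 0 at EOF.
    syscalls = -(-file_size // app_read_size) + 1
    # Assumes every READ request carries a full rsize bytes of data.
    nfs_reads = -(-file_size // rsize)
    return syscalls, nfs_reads


if __name__ == "__main__":
    # A 1 GiB file: 262,145 read() calls in strace, but only 32,768 READs of 32KB.
    print(expected_counts(1 << 30))  # (262145, 32768)
```

If the READ count from nfsstat lands near the first number rather than the second, the client is probably not using the full rsize.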

    Also, I seem to recall that you need NFSv3 to read blocks larger than 8K (NFSv2 caps each transfer at 8KB), so if you're not getting the full 32K you are asking for, you might want to look at that.

    The readahead sounds intriguing. How would it work if 200 clients tried to read the same file, though with slightly offset start times? Wouldn't readahead aggravate the server load in this case?
    I'm not familiar with the internals of the Linux NFS code, but generally readahead reads data into the buffer cache before it is requested, and later reads from your program are then satisfied from there. As long as the client doesn't run out of memory, it should do the right thing in the scenario you describe.
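The buffer-cache argument can be illustrated with a toy model of a single client: the program reads 4KB at a time, the client fetches rsize chunks into a cache that is never evicted (the "doesn't run out of memory" case), and readahead simply fetches the next few chunks early. This is a Python sketch under those assumptions, not how the Linux NFS client is implemented; simulate_client and readahead_chunks are hypothetical names.

```python
def simulate_client(file_size: int, app_read: int = 4096, rsize: int = 32768,
                    readahead_chunks: int = 0) -> int:
    """Toy model: number of READ requests one client sends for a sequential pass."""
    n_chunks = -(-file_size // rsize)  # file divided into rsize chunks, rounded up
    cache = set()  # chunk indices already in this client's cache (never evicted)
    requests = 0
    for offset in range(0, file_size, app_read):
        needed = offset // rsize
        # Fetch the chunk this read needs plus up to readahead_chunks after it.
        last = min(needed + 1 + readahead_chunks, n_chunks)
        for chunk in range(needed, last):
            if chunk not in cache:
                cache.add(chunk)
                requests += 1
    return requests


if __name__ == "__main__":
    for ra in (0, 4, 16):
        # The same 32 requests for a 1 MiB file, whatever the readahead window.
        print(ra, simulate_client(1 << 20, readahead_chunks=ra))
```

In this model readahead changes when chunks are requested, not how many, so 200 clients each reading the file once would still send 200 times the per-client count, whatever the readahead window.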