4k read buffer is too small

voeckler has asked for the wisdom of the Perl Monks concerning the following question:

Re: 4k read buffer is too small by almut (Canon) on Jun 16, 2008 at 21:52 UTC
AFAIK, stdio buffering, as configurable via `setvbuf`, is incompatible with PerlIO's buffering, which is why it's disabled when you configure Perl to use PerlIO. OTOH, you most probably do want PerlIO... so configuring/rebuilding Perl to not use it isn't really an option. Anyhow, a little digging around suggests that you can "configure" PerlIO's buffer size in the file `perlio.c`: `STDCHAR * PerlIOBuf_get_base(pTHX_ PerlIO *f) { PerlIOBuf * const b = PerlIOSelf(f, PerlIOBuf); PERL_UNUSED_CONTEXT; if (!b->buf) { if (!b->bufsiz) b->bufsiz = 4096; /* <--- here */ b->buf = Newxz(b->buf,b->bufsiz, STDCHAR); if (!b->buf) { b->buf = (STDCHAR *) & b->oneword; b->bufsiz = sizeof(b->oneword); } b->end = b->ptr = b->buf; } return b->buf; }` At least, I changed that 4096 to 8192, recompiled perl (v5.10.0), and now `strace` reveals that `read(2)` is being called for blocks of size 8192 when you execute something like `open my $fh, "<", $^X or die; while (<$fh>) { }` whereas before the change, read blocks were of size 4096. Other than that, I haven't done any testing yet. So, no guarantees whatsoever (!) that it'll work in every respect... just something to play with at your own risk. Good luck!
Re^2: 4k read buffer is too small by voeckler (Sexton) on Jun 17, 2008 at 04:15 UTC
Thank you, this sounds like what I was looking for. I was poking at the Perl code today. I will try this tomorrow. PS: Do you think the Perl gods will make a buffer setting function available again in PerlIO? After all, C has `setvbuf` and C++ has `myistream.rdbuf()->pubsetbuf(buf,bufsize)` to let the user override defaults, if he so chooses.
Re^3: 4k read buffer is too small by almut (Canon) on Jun 17, 2008 at 15:06 UTC
"Do you think the Perl gods will make a buffer setting function available again in PerlIO?" I can't really speak for the Perl gods, but considering that the configurability of the buffer size currently is near the lowest conceivable level¹, I'd think that making it user-settable (à la `setvbuf` with stdio) isn't prioritized very high at the moment. You might want to bring the issue up on p5p, however... if you feel determined and are well prepared with good arguments :) I do remember having come across a related discussion (last time I felt like needing `setvbuf` myself), but unfortunately, I can't find it at the moment². I recall I did sense some reluctance to change in the overall tone of the thread...
___
¹ "configurability levels" that I could think of: (1) hardcoded magic constant in the code (2) macro/constant (system-dependent) automatically determined during `configure` (3) compile-time `configure` option (4) user-configurable global runtime option affecting all buffers (switch, env-var, magic Perl var, whatever) (5) user-configurable runtime option per IO handle (like `setvbuf`) (6) user-configurable runtime option per PerlIO layer (7) like (6), but dynamically reconfigurable on open/unflushed handles
² googling the p5p archives, i.e. `'setvbuf site:www.xray.mpe.mpg.de'`, doesn't produce any hits, although there are definitely some mentions of `setvbuf` (presumably some restrictive robots.txt file)
Re^4: 4k read buffer is too small by mr_mischief (Monsignor) on Jun 18, 2008 at 16:08 UTC
Re^2: 4k read buffer is too small by DrHyde (Prior) on Jun 17, 2008 at 10:08 UTC
That code surprises me. I would have at least expected it to be equal to the page size. And that varies with architecture. On Alpha, for instance, it's 8K.
Re^3: 4k read buffer is too small by Steve_p (Priest) on Jun 17, 2008 at 11:55 UTC
It surprises me more that there's a magic number like that buried down in the core. It appears that you should be able to configure that in your own custom IO layer and set the size as big as you wish.
Re^3: 4k read buffer is too small by voeckler (Sexton) on Jun 17, 2008 at 18:56 UTC
ACK: I would have expected a `getpagesize()` call, since pages are often natural boundaries. Or at least a reference to the `BUFSIZ` that many stdio implementations define, which, after several indirections, comes to 8k on my x86_64 Linux.
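For anyone who wants to compare those numbers on their own box without digging through headers, here is a minimal sketch using the core POSIX module. The helper name `io_size_hints` is made up for illustration, and the `blksize` field from `stat` may be empty on systems that don't report it.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: collect the sizes mentioned above for one file.
sub io_size_hints {
    my ($path) = @_;
    my @st = stat $path or die "stat $path: $!";
    return (
        pagesize => POSIX::sysconf(POSIX::_SC_PAGESIZE()),  # what getpagesize() reports
        bufsiz   => POSIX::BUFSIZ(),                        # stdio's BUFSIZ
        blksize  => $st[11],                                # preferred I/O size per stat
    );
}

# Print the hints for the perl binary itself.
my %hint = io_size_hints($^X);
printf "%-8s %s\n", $_, $hint{$_} // 'n/a' for sort keys %hint;
```

None of these values is what PerlIO actually uses (that is the hardcoded 4096 discussed above); they are only the candidates one might have expected to see there.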
Re: 4k read buffer is too small by graff (Chancellor) on Jun 17, 2008 at 02:22 UTC
I'm curious what sort of trials led you to say that using sysread "feels slow"... How slow does it "feel" compared to the default i/o methods? (How much do your colleagues notice your presence when you use sysread, as opposed to the default methods?) Since almut has already pointed out how to build perl 5.10 with your own custom input buffer size, that's likely to be the way to go -- a specific build of perl for this specific app... But I'd still be tempted to try a little more with the sysread approach (esp. since you seem to have made some progress with it already), and as for missing PerlIO's utf8 layer, well, you do still have Encode, which basically does the same thing. And I think it's worthwhile to consider starbolin's comment about improving the use of local disk in your optimization strategy -- in addition to anything else you do. Whatever solution you pick should probably include making a one-time copy of big data chunks to local disk, if only to keep your process from stalling everyone else on the network. (If the process happens to be modifying or rewriting file contents, all the more reason, perhaps, to work on local storage until the process is done, then "upload" the finished product to your network drives. NFS writes are more expensive than NFS reads, so the fewer NFS writes you do, the better.) Update: To follow up on my remark about Encode, this snippet produces no warnings about "wide character data", and outputs the appropriate 3-byte sequence (two-byte utf8 sequence for á followed by LF): `perl -MEncode -e '$_=encode("utf8","\x{e1}\n"); syswrite(STDOUT,$_)'` Doing the same thing on input involves passing your input string as the 2nd arg to `decode("utf8",...)` -- the return value is a perl utf8 string.
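One wrinkle with decoding `sysread` input: a multi-byte character can straddle two blocks. The following is only a sketch of how the input side might look, using the `Encode::FB_QUIET` check mode that the Encode documentation describes for fixed-width buffers. The sub name `read_utf8_with_sysread` is invented here, and the sketch assumes the file is valid UTF-8 (otherwise the undecoded bytes simply pile up until it dies at end of file).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read raw bytes in $blocksize chunks and return
# the decoded character string.
sub read_utf8_with_sysread {
    my ($path, $blocksize) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    my ($pending, $text) = ('', '');
    while (1) {
        # Append new bytes after any partial character left from the last round.
        my $n = sysread($fh, $pending, $blocksize, length $pending);
        die "sysread $path: $!" unless defined $n;
        last if $n == 0;
        # FB_QUIET returns what decoded cleanly and leaves the rest in $pending.
        $text .= decode('UTF-8', $pending, Encode::FB_QUIET);
    }
    close $fh;
    die "undecodable bytes at end of $path" if length $pending;
    return $text;
}
```

Slurping the whole file is just to keep the example short; in a real line-oriented loop you would hand `$text` to the line splitter block by block instead of accumulating it.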
Re^2: 4k read buffer is too small by voeckler (Sexton) on Jun 17, 2008 at 03:47 UTC
I agree that the data should go to the node's scratch, the processing happens, and results are uploaded to NFS again. Never mind NFS, RAID5 is also not helping writes. However, sometimes files become so ridiculously large that the local scratch does not suffice, and I am forced to work off NFS, though I still try to put the products on scratch, and upload them to NFS afterwards. I did write a simple FullyBuffered module basically doing sysopen, sysread into a large buffer, maintaining a cursor (to avoid unnecessary string copies), etc. I was timing this against the original script using Perl's IO, and it performed, to my surprise, a little worse. The Perl IO version takes about 3 minutes for 2^20 lines, my fully buffered approach 4 minutes for 2^20 lines (dang, I tossed its log file). Of course, I do suspect that I am doing something stupidly inefficient. I thank you very much for showing me the proper utf8 conversions. Should I continue with my module approach, it will come in handy.
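For readers trying to picture the approach described above, here is a rough sketch of a sysread-backed line reader with a cursor. It is not voeckler's FullyBuffered module (that code was never posted); the name `make_line_reader` and the closure interface are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields one line per call,
# refilling its buffer with sysread in $bufsize-byte requests.
sub make_line_reader {
    my ($fh, $bufsize) = @_;
    my ($buf, $pos, $eof) = ('', 0, 0);
    return sub {
        while (1) {
            # Look for the next newline at or after the cursor.
            my $nl = index($buf, "\n", $pos);
            if ($nl >= 0) {
                my $line = substr($buf, $pos, $nl - $pos + 1);
                $pos = $nl + 1;
                return $line;
            }
            if ($eof) {
                return if $pos >= length $buf;    # nothing left
                my $line = substr($buf, $pos);    # last line, no newline
                $pos = length $buf;
                return $line;
            }
            # Drop consumed bytes once per refill, not once per line.
            substr($buf, 0, $pos, '') if $pos;
            $pos = 0;
            my $n = sysread($fh, $buf, $bufsize, length $buf);
            die "sysread failed: $!" unless defined $n;
            $eof = 1 if $n == 0;
        }
    };
}
```

Usage would be along the lines of `my $next = make_line_reader($fh, 8 * 1024 * 1024); while (defined(my $line = $next->())) { ... }`. Every line still costs a sub call plus an `index` and a `substr` in Perl space, which may be part of why a pure-Perl reader struggles to beat the C loop behind `<$fh>`.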
Re: 4k read buffer is too small by quester (Vicar) on Jun 17, 2008 at 07:07 UTC
When sgifford mentioned tcpdump it reminded me: on a normal Ethernet segment, the MTU (maximum size of a packet) is customarily set to 1500 bytes. You could probably make the internal workings of NFS more efficient by raising the MTU, which will reduce the number of packets. Be wary of the following, though: (1) routing packets from a circuit with a large MTU to a circuit with a smaller one can cause occasional odd problems; for example distant web sites behind firewalls that block ICMP "path MTU exceeded" messages may not be able to send you pages more than 1500 bytes long any more. You may need to keep the NFS between your client and the servers on an isolated network to avoid this kind of problem. The client and servers could have interfaces to other networks in order to talk to your other equipment, as long as traffic isn't routed between the networks. (2) You need jumbo or giant frame support, which is only common on Gigabit and faster Ethernet. Note that there is no vendor-independent standard for exactly how big a jumbo frame can be, but Cisco suggests 9216 bytes. It's very difficult to generalize about how much jumbo packets really help, because there is so much variation in how much of the overhead of breaking up data into multiple packets and then reassembling it can be offloaded onto dedicated hardware. But if you have the appropriate switches and NICs, it might be worth a quick benchmark. As a starting point, HP ran a benchmark of 9000 versus 1500 byte MTU on GigE, and showed around 43% better throughput and around 27% less CPU on the receive side and 43% less CPU on the transmit side.
Re: 4k read buffer is too small by starbolin (Hermit) on Jun 16, 2008 at 20:30 UTC
voeckler writes: "... I kindly disagree that 4k is enough for everybody." I don't mean to be difficult but that's not what graff said. What he said was that 4k is a compromise that didn't adversely impact the implementation of perl for the majority of users. I'm sure the 4k number is tied to the 'small' sbrk request size, so I think increasing beyond 4k is going to mean tweaking malloc. Perhaps also increasing the number of buffers used in parsing an input stream into line chunks. Now for one of those dumb, it's-not-my-budget, questions: Why not buy (or ask for) a bigger local disk? 8GB is small nowadays.
s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}
Re^2: 4k read buffer is too small by voeckler (Sexton) on Jun 17, 2008 at 03:29 UTC
Concerning buying new disks: I was trying to make an example with numbers with the 8GB case, to show that it is already doing bad things to the NFS server. In actuality, the local disks on our machines permit 60 GB scratch each. However, some rule files are 123 GB (and 500 million lines).
Re: 4k read buffer is too small by sgifford (Prior) on Jun 17, 2008 at 03:40 UTC
You might want to look at your NFS client to see if it can be of any help. Readahead could help here a great deal without changing Perl; look at the `rsize` NFS option, and any other options you have in your NFS client. You will need to test by running `tcpdump` or looking at your NFS stats, since Perl will still be doing 4K reads, but the OS will be doing larger reads behind the scenes. If you're only reading the file from beginning to end, another useful trick is to write a small program to read files in whatever blocksize you need (for example with `sysread`) and write them to standard output; then you can run that program and pipe its output to your actual program, which can read from the pipe in 4KB blocks without affecting how the NFS server is accessed. If you need to seek around this won't work, but sometimes it can be helpful. -- sgifford's Web page
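The "small program" from the last paragraph could look something like the sketch below. The sub name `copy_blocks` and the block size are placeholders chosen for illustration, not anything from a module; the only real calls are `sysread` and `syswrite`.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out using $blocksize-byte sysread
# requests, returning the number of bytes copied.
sub copy_blocks {
    my ($in, $out, $blocksize) = @_;
    my $total = 0;
    while (1) {
        my $n = sysread($in, my $block, $blocksize);
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;    # end of file
        # syswrite may write less than asked (e.g. to a pipe), so loop.
        my $off = 0;
        while ($off < $n) {
            my $w = syswrite($out, $block, $n - $off, $off);
            die "syswrite failed: $!" unless defined $w;
            $off += $w;
        }
        $total += $n;
    }
    return $total;
}
```

Saved in a script that ends with something like `open my $in, '<', $ARGV[0] or die $!; copy_blocks($in, \*STDOUT, 8 * 1024 * 1024);`, it can be run as the first stage of a pipeline, with the real program reading from the pipe in its usual 4K blocks.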
Re^2: 4k read buffer is too small by voeckler (Sexton) on Jun 17, 2008 at 04:10 UTC
"If you're only reading the file from beginning to end, another useful trick is to write a small program to read files in whatever blocksize you need (for example with sysread) and write them to standard output; then you can run that program and pipe its output to your actual program, which can read from the pipe in 4KB blocks without affecting how the NFS server is accessed. If you need to seek around this won't work, but sometimes it can be helpful." Yes, strong agreement to this trick. My office neighbor also suggested this work-around, since we have at least 2 CPUs per node, and up to 8 CPUs per node, but most often, the actual computation only takes 1 CPU. CPU cycles are cheap! As for the NFS client tuning, I will convey the message, but I suspect that the admins already did quite a bit of tuning. After all, our directory requests are served from a different physical machine than the data blocks. Myself, I don't have god privileges on any of the machines. `XXX:/export/samfs-XXX01 /auto/XXX-01 nfs rw,nosuid,noatime,rsize=32768,wsize=32768,timeo=15,retrans=7,tcp,intr,noquota,rsize=32768,wsize=32768,addr=10.125.0.8 0 0` The readahead sounds intriguing. How would it work if 200 clients tried to read the same file, though slightly offset in start time? Wouldn't read-ahead aggravate the server load in this case?
Re^3: 4k read buffer is too small by sgifford (Prior) on Jun 17, 2008 at 04:55 UTC
`XXX:/export/samfs-XXX01 /auto/XXX-01 nfs rw,nosuid,noatime,rsize=32768,wsize=32768,timeo=15,retrans=7,tcp,intr,noquota,rsize=32768,wsize=32768,addr=10.125.0.8 0 0` Interesting, that should be reading in 32KB blocks. You would still see 4K blocks with `strace`, though, which might be throwing off your analysis. Try seeing if the output of `nfsstat` or `tcpdump` matches what you'd expect from `strace`. If you find that it actually is reading in larger blocks, your sysadmins can try increasing `rsize` further. Also, I seem to recall that you need NFSv3 to read blocks larger than 16K, so if you're not getting the full 32K you are asking for, you might want to look at that. "The readahead sounds intriguing. How would it work, if 200 clients tried to read the same file, though slightly offset in start time? Wouldn't read-ahead aggravate the server load in this case?" I'm not familiar with the internals of the Linux NFS code, but generally readahead will write into the buffer cache, and then client requests will be read from there. As long as it doesn't run out of memory it should do the right thing in the scenario you describe.
Re^2: 4k read buffer is too small by voeckler (Sexton) on Jun 17, 2008 at 20:32 UTC
"... to write a small program to read files in whatever blocksize you need ..." It just occurred to me: The small program is called dd: `dd if=largefile ibs=8M | perl ... | dd of=newfile obs=8M`
Re^3: 4k read buffer is too small by aufflick (Deacon) on Jun 20, 2008 at 10:04 UTC
genius!
Re: 4k read buffer is too small by starbolin (Hermit) on Jun 17, 2008 at 02:45 UTC
Dumb question: What command are you using to measure read sizes? I'm asking because I've been playing with iostat, perl, and large files and I'm seeing reads from the disk at 16KB, which is FreeBSD's buffer size. So I'm thinking the bottleneck may be in the NFS drivers and not in perl?? Someone correct my thinking here.
Re^2: 4k read buffer is too small by voeckler (Sexton) on Jun 17, 2008 at 03:59 UTC
I wanted to know the number of read(2) calls, so I used `strace -e read perl ...` Each of these reads hits the kernel's VFS, as they go from userland to kernelland. According to the admins, each read will incur an NFS request to the server. Too many simultaneous requests will topple the server. Fewer NFS requests, as generated by larger buffered reads, are friendlier to the server, even if they do not necessarily speed up my program.
Re^3: 4k read buffer is too small by starbolin (Hermit) on Jun 17, 2008 at 16:44 UTC
I think your admins are lying to you. The NFS block size is determined when you issue mount to tie the NFS driver into your file system. Just by coincidence the default block size is also 4k. It is the NFS block size, not the application's IO block size, that determines when and how much data is requested from the server. See your system's mount manual page. After doing just a tiny bit of reading and a little bit of testing on my system I'm convinced that modifying perl's block size would be a wasted effort. It would not change the size of the NFS requests to the server.
Re^4: 4k read buffer is too small by voeckler (Sexton) on Jun 18, 2008 at 17:11 UTC
Re^5: 4k read buffer is too small by starbolin (Hermit) on Jun 18, 2008 at 18:06 UTC

