http://qs321.pair.com?node_id=1184978


in reply to Performance In Perl

Ignore the performance part of your code for now, you got plenty of good answers. But nobody has yet mentioned a much deeper problem:

    my $errorlog = '';
    if (...) { $errorlog .= "Oops\n"; }
    if (...) { $errorlog .= "Oh noes\n"; }
    foo(...);
    if (...) { $errorlog .= "Outch\n"; }
    if ($errorlog ne '') {
        print STDERR $errorlog;
    } else {
        say 'All done, bye.';
    }

This is your basic idea, right?

Now imagine that this (pseudo-)script crashes seemingly at random. Look into your error log. You see NOTHING. The script MUST NOT crash before the very end, or the error log never gets written. Unfortunately, crashes and bugs usually ignore such rules and occur anywhere in your code.

Now, let's do it right. Don't print to STDERR yourself, use warn and die as intended. Yes, both ultimately write to STDERR, but you can catch both if you want (eval, $SIG{__DIE__}, $SIG{__WARN__}). But that's not the point. The point is that STDERR is unbuffered. Everything you write there ends up in the log immediately. So:
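To make the "you can catch both" part concrete, here is a minimal sketch. The hook and variable names (@collected and friends) are made up for illustration; the mechanisms ($SIG{__WARN__} and eval) are standard Perl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in addition to passing them on to STDERR.
my @collected;
local $SIG{__WARN__} = sub {
    my ($msg) = @_;
    push @collected, $msg;   # keep a copy for later inspection
    print STDERR $msg;       # still reaches the log immediately
};

warn "Oops\n";               # handled by the hook above

# Catch a die with eval; $@ holds the message afterwards.
my $ok = eval {
    die "Input data is broken\n";
    1;
};
print STDERR "caught: $@" unless $ok;
```

Note that warn and die still behave normally for any code that doesn't install hooks; the hooks only add a place to intercept them.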

    # We don't need that. It's just wrong:
    # my $errorlog = '';
    if (...) {
        warn 'Oops';  # Oh, by the way: unless you add "\n", you will see file and line in the log.
    }
    if (...) {
        warn 'Oh noes';
    }
    foo(...);
    if (...) {
        warn 'Outch';
    }
    say "All done, bye.";

Now, the last thing you see in the log is "Oh noes at example.pl line 20". And as it turns out in my example, the condition leading to this is a check that works around a known bug in an XS module, triggered by a certain combination of input data to foo().

And that's why those three lines should read:

    if (...) {
        die 'Input data is f*cked up beyond repair. Died to prevent a crash';
    }

And now, buffered vs. unbuffered.

Perl file handles are usually buffered, if only because the libc below perl buffers. Even if you spoon-feed a file character by character, perl and/or libc will usually buffer the output until either the buffer is full or perl/libc decides it's time to flush. As long as only the buffer is written to, everything is quite fast: it all happens in memory, in userspace. When the buffer is flushed, libc issues a syscall to actually write the file. The syscall switches to kernel mode, which is expensive, and the kernel does a lot of work to really write the file. That takes significantly more time.
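You can watch the buffering happen. A sketch, using a temp file so it is self-contained: a small print sits in the buffer and the file on disk stays empty until the handle is flushed or closed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile();

print $fh "hello\n";             # lands in the buffer, not (yet) on disk

my $size_before = -s $filename;  # still 0: nothing has been flushed

close $fh;                       # flushes the buffer, the data hits the file

my $size_after = -s $filename;   # now 6 bytes
```

Six bytes are far below the default buffer size, so nothing forces an early flush here; a crash between the print and the close would lose the data.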

An unbuffered file still uses a buffer, but it is automatically flushed after each write command. The syscall happens for every write command. This will obviously be slower than a buffered file, especially if you write character by character. But because the buffer is flushed after every write, a following crash in user space does not affect the log file. It has already been written.
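The flush-after-every-write behaviour is what autoflush gives you. A sketch of the same experiment with autoflush enabled, again on a temp file:

```perl
use strict;
use warnings;
use IO::Handle;                  # provides autoflush() on older perls
use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile();
$fh->autoflush(1);               # flush after every write to this handle

print $fh "Oh noes\n";           # written through to the file right away

my $size = -s $filename;         # 8 bytes, before the handle is even closed
close $fh;
```

STDERR behaves like this out of the box, which is exactly why warn and die are crash-proof loggers.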

What you have written here is yet another buffering layer, one that is flushed only once, at the very end of the program. Does that improve performance? Maybe a tiny bit. It also ties up RAM that could be used for better purposes, which may become significant if you append lots of data to the buffer.

What does actually happen? Is writing the error log really the bottleneck? You can find out: Devel::NYTProf is an excellent tool that shows you where your code really spends its time. That's where you want to start optimizing.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)