Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Is print faster than syswrite?

by saurabh.hirani (Beadle)
on Jan 27, 2009 at 13:22 UTC ( [id://739181] : perlquestion . print w/replies, xml ) Need Help??

saurabh.hirani has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys,

I am working on a logging module. I was just studying the existing perl log modules and read that - syswrite is useful in a condition like logging as during its write operation it prevents someone else from writing at the same time. To verify that I wrote a small perl program

#!/usr/bin/perl -w use strict; use Fcntl; $| = 1; sub testme { my ($flag, $data) = @_; if ($flag == 1) { sysopen(FILE, 'bangbang', O_WRONLY | O_APPEND | O_CREAT); while (1) { syswrite FILE, "$data\n"; } } else { open(FILE, ">>bangbang"); while (1) { print FILE "$data\n"; } } close(FILE); } testme($ARGV[0], $ARGV[1]);

Testing both for syswrite and print - I found that $data printing in file is corrupted when I run 2 instances of the same program together. But using syswrite saves me from that. It has every line intact and no line is a mix of different data sets of the 2 programs.

But what struck me more was the size of the files created when I ran the following programs together:


1. ./program 1 first
2. ./program 1 second

this uses syswrite to write and check whether 'first' and 'second' clobber each other. During this run, I got the following stats:


1st run - 30 seconds - file size 40 MB
2nd run - 50 seconds - file size 61 MB

Now for the print run,


1. ./program 2 first
2. ./program 2 second

this uses print to write and check whether 'first' and 'second' clobber each other. During this run, I got the following stats:


1st run - 30 seconds - file size 315 MB
2nd run - 50 seconds - file size 586 MB

I was running a VM so I know that my profiling wouldn't be so accurate. But still, I had done autoflush for both syswrite and print. And I read somewhere that for output greater than 3 KB syswrite is faster?

Is print faster than syswrite? Is my testing flawed? Am I missing something? What would you use if you had to write a logging module in perl - print or syswrite? and why?


Thanks for going through my post.

Replies are listed 'Best First'.
Re: Is print faster than syswrite?
by ikegami (Patriarch) on Jan 27, 2009 at 13:54 UTC
    $| = 1; only affects STDOUT, not FILE. That accounts for the speed difference and the clobbering.

    Add FILE->autoflush(1); after the open statement. (Don't forget to use use IO::Handle;) to bring print inline with syswrite.

      Add FILE->autoflush(1); after the open statement. (Don't forget to use use IO::Handle;) to bring print inline with syswrite.

      Thanks. That is really useful information. I thought $| = 1 affects buffering of print function wherever it is used. I will try it out.
        $|=1; only affects the currently selected file handle, which is STDOUT by default.

      Used FILE->autoflush(1). That stopped the clobbering and bought print to the level of syswrite. In 30 seconds both of them created a file of around 30 MB.

      Actually, it acts on the currently select()d filehandle, which is generally STDOUT, but not always. That's why you still sometimes see the convoluted select((select(STDOUT),$|=1)[0]);

      --
      In Bob We Trust, All Others Bring Data.

        I know, I already said that. I wasn't making a general statement, I was talking about the variables in his program (STDOUT and FILE).

        And that convoluted bit? That's how autoflush does it too. (Well, a slightly safer variation.) I prefer to use autoflush cause it's way more readable.

Re: Is print faster than syswrite?
by JavaFan (Canon) on Jan 27, 2009 at 13:43 UTC
    print is the system library, which uses a buffer. So if you write using print() you will less often access the disk - and hence, wait far less for it to finish.

    syswrite access the disk each time you write. This prevents the clobbering you see (and makes it less likely to lose data if your program unexpectly dies), but you pay the price you see.

      print is the system library, which uses a buffer.

      Which is defeated by closing the file after every print. ( Missed the infinite loop )

Re: Is print faster than syswrite?
by JavaFan (Canon) on Jan 27, 2009 at 14:25 UTC
    What would you use if you had to write a logging module in perl - print or syswrite?
    Neither. For logging, I use the syslog service. That allows configuration of what to log where without having to modify the program, or figuring out what command line options to use. It also makes it easy to send the logs elsewhere.

    Why reinvent the wheel?

      Thanks a lot for your responses guys. I appreciate your viewpoints.

      I have used syslog till now for all of my work and it works like charm. But we wanted to have a logging module which we can customize from ground up and through which we can do the following:


      1. log to file or database directly. I know that logging to a database would be a major I/O hit so read on.
      2. log to a file using standard templates. As in I should be able to make a call like
      $logobj(245, {'name' => 'saurabh', 'game' => 'tennis'})

      That would check whether 245 is a valid message template id in a db file and if it is log a message to the log file of the format - program_name|program_action|tmpl_id|tmpl_params

      This log file output would be uploaded to the database on a regular basis (better than doing the database directly) and then reporting would be a much easier job.


      3. We wanted logging on a per program basis and not a facility, priority basis.
      4. We wanted to enforce the method of logging to be extremely uniform. It is better to use templates than to assume that the next guy will surely follow the "Failed to open file $file" rather than "Opening of file $file failed"

      As far as reinventing the wheel goes, I had to - as the wheel of a jeep cannot be used in a Jaguar. :)
      I haven't used this work till now in production environment. So will see how it goes.
Re: Is print faster than syswrite?
by jh- (Scribe) on Jan 27, 2009 at 13:50 UTC

    I would imagine print is faster on this instance because it doesn't have to wait for the file to be free for writing unlike syswrite. I would've imagined the size difference to be about half though, not that much.

    My logging style usually depends on the program I'm working on. If I'm not sure the file isn't being written to from somewhere else (i.e. in multi-threaded applications), I usually use syswrite. For simple single-threaded applications I usually use print because IMO it looks cleaner.

Re: Is print faster than syswrite?
by zwon (Abbot) on Jan 27, 2009 at 20:09 UTC
      I would use Log::Log4perl. Take a look at this bunch of logging modules on cpan. Do you think your wheel would be better?


      Thanks for the links. I did see Log::Log4perl. Its pretty intense. I will try it out. I don't know whether my wheel would be better. Thanks for showing me around the current section of available wheels.