http://qs321.pair.com?node_id=801401

bloonix has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I need your wisdom.

Many child processes each write their status to a file. I'm using
LOCK_EX and LOCK_SH to avoid concurrent access, and I'm using
sysopen, syswrite and sysseek.

Ok, now my problem (see below the strace):
lseek(5, 147, SEEK_SET) = 147
gettimeofday({1255627303, 624276}, NULL) = 0
gettimeofday({1255627303, 624380}, NULL) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
gettimeofday({1255627303, 624572}, NULL) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/var/log/bloonix/bloonix-server.log", {st_mode=S_IFREG|0640, st_size=154119, ...}) = 0
stat("/var/log/bloonix/bloonix-server.log", {st_mode=S_IFREG|0640, st_size=154119, ...}) = 0
flock(3, LOCK_EX) = 0
write(3, "Oct 15 19:21:43 [WARNING] (0.0013"..., 61) = 61
flock(3, LOCK_UN) = 0
flock(5, LOCK_EX) = 0
lseek(5, 0, SEEK_CUR) = 808
gettimeofday({1255627303, 625664}, NULL) = 0
gettimeofday({1255627303, 625771}, NULL) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
gettimeofday({1255627303, 625961}, NULL) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/var/log/bloonix/bloonix-server.log", {st_mode=S_IFREG|0640, st_size=154180, ...}) = 0
stat("/var/log/bloonix/bloonix-server.log", {st_mode=S_IFREG|0640, st_size=154180, ...}) = 0
flock(3, LOCK_EX) = 0
write(3, "Oct 15 19:21:43 [WARNING] (0.0013"..., 139) = 139
flock(3, LOCK_UN) = 0
write(5, "cl "..., 4) = 4
lseek(5, 0, SEEK_CUR) = 1020

I want to jump to position 147 to write 2 bytes to the
file - that seems to work - but then the 2 bytes are
written at position 808.

I am really confused.

Short:

lseek(5, 147, SEEK_SET) = 147
flock(5, LOCK_EX) = 0
lseek(5, 0, SEEK_CUR) = 808
write(5, "cl "..., 4) = 4
lseek(5, 0, SEEK_CUR) = 1020

My code:
# Example:
$self->_sysseek(147);
$self->_syswrite("cl", 4);

sub _sysseek {
    my ($self, $pos) = @_;
    my $fh = $self->{fh};
    my $cur_pos = sysseek($fh, 0, SEEK_CUR);

    if (!defined $cur_pos) {
        $cur_pos = "";
    }

    warn "+ $$ $pos $cur_pos" if $$ != PROCESS_ID && $WRITE;

    while ( 1 ) {
        $cur_pos = sysseek($fh, $pos, SEEK_SET);
        if (!defined $cur_pos) {
            die "system seek error: $!";
        }
        if ($cur_pos == $pos) {
            last;
        }
        $DEBUG = 1;
        warn "unable to seek to pos $pos (curpos $cur_pos), try again";
    }

    warn "- $$ $pos $cur_pos\n" if $$ != PROCESS_ID && $WRITE;
}

sub _syswrite {
    my ($self, $data, $size) = @_;
    my $fh = $self->{fh};
    my $offset = 0;
    my $length = length($data);

    if ($length < $size) {
        $data .= $SEPARATOR x ($size - $length);
    }

    flock($fh, LOCK_EX) or die "unable to lock file";

    while ($size) {
        my $pos = sysseek($fh, 0, SEEK_CUR);
        warn "start to write at pos $pos" if $WRITE;
        my $written = syswrite $fh, $data, $size, $offset;
        if (!defined $written) {
            die "system write error: $!";
        } elsif ($written) {
            $size -= $written;
            $offset += $written;
            $pos = sysseek($fh, 0, SEEK_CUR);
            warn "written $written newpos $pos" if $WRITE;
        }
    }

    flock($fh, LOCK_UN) or die "unable to unlock file";
}

Please ignore the lines for debugging.

Any ideas about my problem?

Cheers

Update:
title updated
Code updated

Re: sysseek and syswrite fails
by jakobi (Pilgrim) on Oct 15, 2009 at 19:22 UTC

    Things you should test:

    • Double check that you're indeed using unbuffered IO everywhere.
    • If I understand you correctly, you're forking children to read:
      • use strace -f instead for the _real_, _complete_ trace.
      • consider explicitly opening the files separately in each child (my bet - fixes issues both for the pointer AND for the locking).

      • keywords: perldoc -f fork, man 2 fork, file-descriptor/file-handle, race-condition

    Concerning inherited filehandles and the file pointer

    Note that I use open/print with autoflush instead due to laziness.

    1> perl -e 'use IO::Handle; open(FH,">out"); FH->autoflush(1); if(fork){ print FH "aaaa"; sleep 2; print FH "bbb"; close FH }else{ sleep 1; print FH "1111"; seek FH,0,0; print FH "4"; close FH }'

    The inherited file handle is shared between parent and child, so we should find 4bbb1111 in ./out on a reasonably idle host.

    2> perl -e 'use IO::Handle; open(FH,">out"); FH->autoflush(1); if(fork){ print FH "aaaa"; sleep 2; print FH "bbb"; close FH }else{ close FH;open(FH,"+<out") or warn "err"; FH->autoflush(1); sleep 1; print FH "1111"; seek FH,0,0; print FH "4"; close FH }'

    Same file but opened separately: 4111bbb - the pointer is no longer shared.
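
The same effect can be reproduced without sleeps by using the sys* calls and waitpid. The following is only a sketch under assumptions: it assumes a Linux-like fork() where parent and child share one open file description, and the sub name offset_after_child_write and the scratch file are made up for illustration. The offsets 147 and 808 are borrowed from the strace in the question.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_TRUNC SEEK_SET SEEK_CUR);
use File::Temp qw(tempdir);
use POSIX ();

# Returns the parent's file offset after a child has seeked and written
# 4 bytes. $reopen = 0: the child uses the inherited handle;
# $reopen = 1: the child opens the file itself.
sub offset_after_child_write {
    my ($reopen) = @_;
    my $path = tempdir(CLEANUP => 1) . "/status";    # hypothetical scratch file
    sysopen(my $fh, $path, O_RDWR | O_CREAT | O_TRUNC) or die "open: $!";
    sysseek($fh, 147, SEEK_SET) // die "seek: $!";   # parent positions itself

    my $pid = fork() // die "fork: $!";
    if ($pid == 0) {
        if ($reopen) {
            sysopen($fh, $path, O_RDWR) or POSIX::_exit(1);
        }
        sysseek($fh, 808, SEEK_SET) // POSIX::_exit(1);
        syswrite($fh, "cl  ")       // POSIX::_exit(1);
        POSIX::_exit(0);             # skip END blocks in the child
    }
    waitpid($pid, 0);
    die "child failed" if $? != 0;

    # Where would the parent write now?
    return sysseek($fh, 0, SEEK_CUR) + 0;
}

printf "inherited handle:  parent offset is now %d\n", offset_after_child_write(0);
printf "reopened in child: parent offset is now %d\n", offset_after_child_write(1);
```

With the inherited handle the parent's offset ends up at 812 (the child's 808 plus the 4 bytes written), even though the parent itself only ever seeked to 147. With a separate open in the child the parent stays at 147.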

    Concerning locking

    • Perl's flock flushes the handle ( not an issue if you use sys*/... consistently :))
    • Perl's flock uses one of flock/lockf/fcntl as its base mechanism. These have somewhat differing semantics, ranging from the required read or write mode to possible NFS issues. While Perl's choice is usually sane, I'd still prefer to simplify what I use flock on: here I'd suggest trying explicit, independent opens of the file we want to lock. Consider writing a simplified test case for the locking, with a warn() on either side of the flock plus another pair around the close() (or unlock). Note that unlocking by any means other than close() may allow a race when using buffered IO.
    • a quick test of flock wrt shared / separate FD:
      • (got me again :( ) Use strictures and check flock's return value: a missing qw/LOCK_EX/ on the use POSIX line is a nice way to waste some time if you omit use strict; use warnings; for a quick test. Check that LOCK_EX == 2.
      • Results are just as my finely honed paranoia suspected: flock() works in case 2 (separate opens), but is "idempotent" as long as the same file descriptor is used (case 1): it always succeeds and excludes nobody. man 2 flock offers this explanation for the situation on Linux (double-check for POSIX/your OS...):
               A single file may not simultaneously have both shared and exclusive locks.
        
               Locks created by flock() are associated with an open  file  table  entry.   This
               means  that  duplicate  file  descriptors  (created  by, for example, fork(2) or
               dup(2)) refer to the same lock, and this lock may be modified or released  using
               any  of  these  descriptors.   Furthermore,  the  lock  is released either by an
               explicit LOCK_UN operation on any of these duplicate descriptors,  or  when  all
               such descriptors have been closed.
        
               If  a  process  uses open(2) (or similar) to obtain more than one descriptor for
               the same file, these descriptors  are  treated  independently  by  flock().   An
               attempt  to lock the file using one of these file descriptors may be denied by a
               lock that the calling process has already placed via another descriptor.
        
               A process may only hold one type of lock (shared or exclusive) on a file.
               Subsequent flock() calls on an already locked file will convert an existing
               lock to the new lock mode.
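
A quick way to check this behaviour yourself, sketched under the assumption that Perl's flock maps to flock(2) (as it usually does on Linux). The sub name child_gets_lock and the scratch file are hypothetical. The parent holds LOCK_EX, and a child then attempts a non-blocking LOCK_EX, either on the inherited handle or on a handle it opened itself.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use POSIX ();

# Parent holds LOCK_EX; a child then tries LOCK_EX|LOCK_NB.
# Returns 1 if the child obtained the lock, 0 if it was refused.
sub child_gets_lock {
    my ($reopen) = @_;
    my $path = tempdir(CLEANUP => 1) . "/status";    # hypothetical scratch file
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "open: $!";
    flock($fh, LOCK_EX) or die "lock: $!";

    my $pid = fork() // die "fork: $!";
    if ($pid == 0) {
        if ($reopen) {
            # An independent open gives a new open file table entry.
            sysopen($fh, $path, O_RDWR) or POSIX::_exit(2);
        }
        # Exit status 0 = lock granted, 1 = lock refused.
        POSIX::_exit(flock($fh, LOCK_EX | LOCK_NB) ? 0 : 1);
    }
    waitpid($pid, 0);
    my $status = $? >> 8;
    die "child could not open file" if $status == 2;
    return $status == 0 ? 1 : 0;
}

printf "inherited handle:  child got lock? %d\n", child_gets_lock(0);
printf "reopened in child: child got lock? %d\n", child_gets_lock(1);
```

On the inherited handle the child "gets" the lock although the parent still holds it, because both refer to the same lock. With its own open the child is refused, which is the mutual exclusion you actually want.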
        

    HTH
    Peter

      Hi Peter,

      >>Double check that you're indeed using unbuffered IO everywhere.

      I am using sysseek, syswrite, sysread.

      >>If I understand you correctly, you're forking
      >>children to read:

      Yes. Each child writes its current status to a file that looks like a table.

      >>use strace -f instead for the _real_, _complete_
      >>trace.

      The strace output above was generated with -f, for each child.

      >>consider to explicitly open the files separately in
      >>each child (my bet - fixes issues both for the pointer
      >>AND for the locking).

      Hmmm, okay, that could be the solution. The parent forks, and the children use all the handles that the parent opened.

      I'll test it! Thanks a lot for the suggestions!

      Cheers,
      Jonny

      Update: removed small tags

      Hi Peter, it works!

      I just reopened the file, and I now lock it first and then seek. The daemon has been running for an hour and the problem seems to be solved!
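
For later readers, the order described here (own open in each child, then lock, then seek, then write) might look roughly like the sketch below. It is not the actual daemon code: write_slot, run_children, the 4-byte slot width and the scratch file are assumptions made for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT SEEK_SET);
use File::Temp qw(tempdir);
use POSIX ();

my $SLOT = 4;    # bytes per status field (assumed fixed-width table)

# Write $data into slot $n: lock first, then seek, then write, then unlock.
sub write_slot {
    my ($fh, $n, $data) = @_;
    my $buf = pack("A$SLOT", $data);                 # pad with spaces to $SLOT bytes
    flock($fh, LOCK_EX) or die "lock: $!";
    sysseek($fh, $n * $SLOT, SEEK_SET) // die "seek: $!";
    my $off = 0;
    while ($off < $SLOT) {                           # cope with short writes
        my $w = syswrite($fh, $buf, $SLOT - $off, $off) // die "write: $!";
        $off += $w;
    }
    flock($fh, LOCK_UN) or die "unlock: $!";
}

# Fork $count children; each opens the file itself and writes its own slot.
sub run_children {
    my ($count) = @_;
    my $path = tempdir(CLEANUP => 1) . "/status";    # hypothetical status file
    sysopen(my $init, $path, O_RDWR | O_CREAT) or die "create: $!";
    close $init;

    my @pids;
    for my $n (0 .. $count - 1) {
        my $pid = fork() // die "fork: $!";
        if ($pid == 0) {
            # Own open: private file offset and an independent lock.
            sysopen(my $fh, $path, O_RDWR) or POSIX::_exit(1);
            write_slot($fh, $n, "c$n");
            POSIX::_exit(0);
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;

    open(my $in, '<', $path) or die "read: $!";
    local $/;                                        # slurp the whole file
    return scalar <$in>;
}

print run_children(3), "\n";
```

Because every child has its own file offset, no sibling can move the position between the seek and the write, and the lock only has to protect the write itself.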

      Cheers and good night.
      Jonny