PerlMonks  

Re^2: reading from a file after a seek isn't working for me

by jakobi (Pilgrim)
on Oct 21, 2009 at 20:40 UTC


in reply to Re: reading from a file after a seek isn't working for me
in thread reading from a file after a seek isn't working for me

@ikegami: Fellow Monks, can you please explain in detail the need for the explicit close here?

Normally opening with an existing FH closes the original file or at least I never noticed a problem in cutting this corner in one-shots, one-liners or inline shell scripts (but usually avoiding read, sysread, tty's and STDIN/OUT/ERR).

Thanx,
Peter

Update: - ok, any takers for this riddle with more time? Will summarize if pointed correctly with keywords and RTFM's to check :)

From perldoc -f close:

You don’t have to close FILEHANDLE if you are immediately going to do another "open" on it, because "open" will close it for you. (See "open".) However, an explicit "close" on an input file resets the line counter ($.), while the implicit close done by "open" does not.
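A minimal sketch of what that passage describes, assuming a scratch file created with File::Temp (the sub name counter_after_reopen and the test data are made up for illustration). Going by the documentation quoted above, the first call should report 3 and the second 1:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch input with three lines (hypothetical data, just for the demo).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "one\ntwo\nthree\n";
close($tmp) or die "close: $!";

# Read two lines from FH, re-open FH on the same file (optionally closing it
# explicitly first), read one more line, and return the value of $. .
sub counter_after_reopen {
    my ($explicit_close) = @_;
    open(FH, '<', $path) or die "open: $!";
    my $line = <FH>;
    $line = <FH>;
    if ($explicit_close) {
        close(FH) or die "close: $!";            # explicit close resets $.
    }
    open(FH, '<', $path) or die "reopen: $!";    # implicit close keeps $.
    $line = <FH>;
    my $count = $.;
    close(FH) or die "close: $!";                # clean slate for the next call
    return $count;
}

print "implicit close: \$. = ", counter_after_reopen(0), "\n";
print "explicit close: \$. = ", counter_after_reopen(1), "\n";
```

This only covers the documented line-counter difference; it says nothing about the read-after-seek problem itself.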

There are a few more notes on pipes, but those don't seem to match the opener's situation either. Skimming perlopentut I didn't see pointers of interest - au contraire, it even _seems_ to imply that reopening w/o close is fine for STDIN/STDOUT (my reading of the lack of close() in the Playing with STDIN/STDOUT section). Or is there indeed some hardcoded magic about it being STDOUT that we insist on reading from??

What do I miss?

Re^3: reading from a file after a seek isn't working for me
by almut (Canon) on Oct 21, 2009 at 21:44 UTC
    can you please explain in detail the need for the explicit close here?

    I think it has to do with PerlIO in combination with an implementation peculiarity.

    When you compare the straces of both variants, you'll see something like:

        # with explicit close
        close(1) = 0
        open("/tmp/stdout.log", O_RDWR|O_CREAT|O_TRUNC, 0666) = 1
        ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff2ba69a30) = -1 ENOTTY (Inappropriate ioctl for device)
        lseek(1, 0, SEEK_CUR) = 0
        fstat(1, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
        fcntl(1, F_SETFD, 0) = 0

        # without explicit close
        open("/tmp/stdout.log", O_RDWR|O_CREAT|O_TRUNC, 0666) = 4
        ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff023c1390) = -1 ENOTTY (Inappropriate ioctl for device)
        lseek(4, 0, SEEK_CUR) = 0
        fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
        dup2(4, 1) = 1
        close(4) = 0
        fcntl(1, F_SETFD, 0) = 0

    Now, the issue is (I think) that although the dup2 does create a copy of fd 4 as fd 1 at the system level (and in fact also closes the old fd 1), it does not copy the PerlIO part, which is only handled properly when the filehandle is created directly by Perl's open.  For this reason, the filedescriptor is considered invalid from the PerlIO point of view (hence the "Bad file descriptor" message). This is checked at the beginning of Perl's read using PerlIOValid(f) [1] (even before doing any read system call).

    Don't ask (me), however, why the indirect dup2-technique is being used in the first place instead of simply closing the filedescriptor before the open...  (Presumably, it did work before the introduction of PerlIO, and might just not have been adapted appropriately since.)
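For reference, a hedged sketch of the variant that the strace shows working: close STDOUT explicitly, reopen it read-write, let a child process write to it, then seek back and read. The sub name capture_via_stdout and the temp file are made up for illustration, and the sketch assumes fd 0 is still open, so that the reopened STDOUT lands on fd 1.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Run @cmd with STDOUT pointing at $path, then read the output back through
# the STDOUT handle itself. Assumes fd 0 is open so the reopen gets fd 1.
sub capture_via_stdout {
    my ($path, @cmd) = @_;
    open(my $saved, '>&', \*STDOUT) or die "dup STDOUT: $!";
    close(STDOUT) or die "close STDOUT: $!";             # the explicit close
    open(STDOUT, '+>', $path) or die "reopen STDOUT: $!";
    system(@cmd);                                        # child writes to fd 1
    seek(STDOUT, 0, 0) or die "seek: $!";
    my $out = do { local $/; <STDOUT> };                 # slurp what was written
    close(STDOUT) or die "close STDOUT: $!";
    open(STDOUT, '>&', $saved) or die "restore STDOUT: $!";
    close($saved);
    return $out;
}

my (undef, $path) = tempfile(UNLINK => 1);
my $captured = capture_via_stdout($path, '/bin/echo', 'hello', 'world');
print "captured: $captured";
```

Reading the log back through a separate lexical handle opened on the same path would sidestep the question of STDOUT's PerlIO state altogether.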

    ___

    [1] see perlio.c:

        #define Perl_PerlIO_or_Base(f, callback, base, failure, args) \
            if (PerlIOValid(f)) { \
                const PerlIO_funcs * const tab = PerlIOBase(f)->tab;\
                if (tab && tab->callback) \
                    return (*tab->callback) args; \
                else \
                    return PerlIOBase_ ## base args; \
            } \
            else \
                SETERRNO(EBADF, SS_IVCHAN); \
            return failure

        ...

        SSize_t
        Perl_PerlIO_read(pTHX_ PerlIO *f, void *vbuf, Size_t count)
        {
            Perl_PerlIO_or_Base(f, Read, read, -1, (aTHX_ f, vbuf, count));
        }

      Thanx for the pointer, almut. That dup & perlio scrap is interesting.

      But there must be more to it than that, as I don't see any special treatment for STDOUT in the perlio.c scrap (neither for the numeric FD's 0 to 2):

      I was playing with the scrap below in the meantime.

      I dupped SAVOUT on STDERR instead / simplifying system to printing / using autoflush / opening STDOUT myself to /dev/tty first: no change.

      This however is interesting:

      Changing the name of the handle STDOUT <=> ANYTHINGeLSE manages to act as a toggle for the problem. Furthermore, w/o close, the tell on the STDOUT file pointer at begin prints 19 in the example below (might be due to the handle earlier being a tty, and something didn't quite catch the change to a plain file w/o explicit close?). Any other handle name prints 5 regardless of close or no close.

      So it looks like we have some hard-coded STDOUT-related magic somewhere in the guts of PERLIO or even lower, with probably STDIN/ERR offering similar peculiarities.

      Given that too much in Perl, esp wrt <> and stdio is magic, it's probably a good idea to stay strictly outside any possibly dusty corner whose smell is faintly related to something magic. Which in this case might just be the idea of reusing a special handle, and worse, reading from it.

      How to classify this behaviour: What doc/code do we still miss? Or is this indeed, say, an easy-to-fix oversight in the documentation? Or is it a somewhat larger actual bug?

      Still wondering (& vowing to step even more cautiously anywhere near STDIO magic),
      My thanx to almut & ikegami for the work below!
      less confused now (& busy scribbling away two new-to-me debugging tips along with a link to their demonstration here)
      Peter

        ...as I don't see any special treatment for STDOUT in the perlio.c scrap

        Just to be clear: the perlio.c snippet was only meant to show where the PerlIOValid() check happens for the read.  The decision between using a direct close vs. the indirect dup2, OTOH, is more likely to happen in Perl's open implementation (which I didn't yet have time to wade through — it's rather lengthy...  and for a low-depth explanation I figured the manifestation of the difference in the strace should be sufficient evidence).

      For this reason, the filedescriptor is considered invalid from the PerlIO point of view

      But it's not, or at least not completely invalid. You can still seek using the handle and print to the handle without problem. For example, adding

      seek(STDOUT, -0, 2) or die $!; print STDOUT "abc\n";

      does indeed append "abc\n" to the file.

      It's more like Perl remembers the handle's original mode and doesn't realize it can read from it now.
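One way to separate the two layers is to ask the operating system which access mode the descriptor carries, independently of what PerlIO has recorded. A minimal sketch, assuming a POSIX-like system where fcntl with F_GETFL is available; the helper name os_access_mode is made up for illustration:

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL O_ACCMODE O_WRONLY O_RDWR);
use File::Temp qw(tempfile);

# Report the access mode the kernel has recorded for a handle's descriptor.
# This bypasses any flags PerlIO keeps for the handle.
sub os_access_mode {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0);
    defined $flags or die "fcntl: $!";
    my $mode = $flags & O_ACCMODE;
    return $mode == O_RDWR   ? 'read-write'
         : $mode == O_WRONLY ? 'write-only'
         :                     'read-only';
}

my (undef, $path) = tempfile(UNLINK => 1);
open(my $wo, '>',  $path) or die "open: $!";
open(my $rw, '+>', $path) or die "open: $!";
print "'>'  gives ", os_access_mode($wo), "\n";
print "'+>' gives ", os_access_mode($rw), "\n";
```

Calling os_access_mode(\*STDOUT) right after the reopen in the failing script would presumably report read-write, given the O_RDWR open and the dup2 in the strace, even though Perl's read refuses the handle.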

      Update: I did a bit of Dumping and stracing of my own.

      There is no difference in the IO objects. I'm now with you leaning towards a PerlIO problem.

      Seems that the "Bad file descriptor" message originates from Perl, not the system. Perl doesn't even attempt to read from STDOUT.

          $ cat a.pl
          use Devel::Peek;

          open(SAVOUT, '>&STDOUT') or die $!;
          close(STDOUT) if $ARGV[0];
          open(STDOUT, '+>', "/tmp/stdout.log") or die $!;

          Dump(*STDOUT{IO});
          @argv = qw(/bin/echo hello world);
          system(@argv);

          print SAVOUT "before=", tell(STDOUT), "\n";
          seek(STDOUT, 0, 0) or die $!;
          print SAVOUT "after=", tell(STDOUT), "\n";

          while (1) {
              my $rv = read STDOUT, $_, 8192;
              die $! if !defined($rv);
              last unless $_;
              print SAVOUT "stdout=", $_;
          }

          print SAVOUT "at end=", tell(STDOUT), "\n";
          close STDOUT;

          $ diff -u <(strace perl a.pl 0 2>&1) <(strace perl a.pl 1 2>&1) | less
          ...
           lseek(1, 0, SEEK_SET) = 0
           lseek(1, 0, SEEK_CUR) = 0
          -[ code to read locale-dependent version of error message]
          -write(2, "Bad file descriptor at a.pl line"..., 37Bad file descriptor at a.pl line 17.
          -) = 37
          +read(1, "hello world\n", 4096) = 12
          +read(1, "", 4096) = 0
          +close(1) = 0
          -write(3, "before=0\nafter=0\n", 17before=0
          +write(3, "before=0\nafter=0\nstdout=hello wo"..., 46before=0
           after=0
          -) = 17
          +stdout=hello world
          +at end=12
          +) = 46
           close(3) = 0
          -exit_group(9) = ?
          -Process 4028 detached
          +exit_group(0) = ?
          +Process 4032 detached

        But it's not, or at least not completely invalid. ...

        Good point.  Actually, when taking a closer look, I think Perl sets EBADF one routine further down in PerlIOBase_read() (which is being called from the macro Perl_PerlIO_or_Base),  in case the PERLIO_F_CANREAD flag isn't set:

            PerlIOBase_read(pTHX_ PerlIO *f, void *vbuf, Size_t count)
            {
                STDCHAR *buf = (STDCHAR *) vbuf;
                if (f) {
                    if (!(PerlIOBase(f)->flags & PERLIO_F_CANREAD)) {
                        PerlIOBase(f)->flags |= PERLIO_F_ERROR;
                        SETERRNO(EBADF, SS_IVCHAN);
                        return 0;
                    }
                    ...

        It's more like Perl remembers the handle's original mode and doesn't realize it can read from it now.

        Yes, and that's most likely because the dup2 doesn't copy the perl-internal PERLIO* flags (well, how should it, it knows nothing about them).

        The following snippet shows that STDOUT's mode differs depending on whether STDOUT is explicitly closed first:

        (I made use of Inline::C because I couldn't find a way to call PerlIO_modestr() directly via plain Perl)

            #!/usr/bin/perl

            use Inline C;

            close STDOUT if $ARGV[0];
            open(STDOUT, '+>', "/tmp/stdout.log") or die $!;

            dumpmode(STDOUT);

            __END__
            __C__

            void dumpmode(SV* fh) {
                char buf[10];
                PerlIO *f = IoIFP(sv_2io(fh));
                PerlIO_modestr(f, buf);
                fprintf(stderr, "mode = %s\n", buf);
            }

        Output:

            $ ./802590.pl 0
            mode = w

            $ ./802590.pl 1   # with explicit close
            mode = r+

        Not really sure why it says "r+" instead of "w+", but I suspect it's because the "+>" internally maps to the same mode as "+<", after having clobbered the file...

        Also, if you set PERLIO_DEBUG, you can see that the "w+" mode is being applied to the PerlIO layers of fd 1 only in case it is properly closed/opened:

            $ PERLIO_DEBUG=/dev/tty ./802517.pl 1   # with explicit close
            ...
            ./802517.pl:0 openn(perlio,'(Null)','Iw',1,0,0,(nil),0,(nil))
            ./802517.pl:0 Layer 0 is unix
            ./802517.pl:0 Layer 0 is unix
            ./802517.pl:0 PerlIO_push f=0x6253c0 unix w 0x603b08
            ./802517.pl:0 fd 1 refcnt=1
            ./802517.pl:0 PerlIO_push f=0x6253c0 perlio Iw 0x603b08
            ./802517.pl:0 Layer 1 is perlio
            ...
            ./802517.pl:15 openn(perlio,'','w+',-1,0,0,(nil),1,0x60a178)
            ./802517.pl:15 Layer 0 is unix
            ./802517.pl:15 Layer 0 is unix
            ./802517.pl:15 PerlIO_push f=0x6253c0 unix w+ 0x603b08
            ./802517.pl:15 fd 1 refcnt=1
            ./802517.pl:15 PerlIO_push f=0x6253c0 perlio w+ 0x603b08

        Otherwise, the "w+" is being applied to a different fd (here fd 8), and thus disappears together with the fd when it is closed (after the dup2):

            $ PERLIO_DEBUG=/dev/tty ./802517.pl 0
            ...
            ./802517.pl:0 openn(perlio,'(Null)','Iw',1,0,0,(nil),0,(nil))
            ./802517.pl:0 Layer 0 is unix
            ./802517.pl:0 Layer 0 is unix
            ./802517.pl:0 PerlIO_push f=0x6253c0 unix w 0x603b08
            ./802517.pl:0 fd 1 refcnt=1
            ./802517.pl:0 PerlIO_push f=0x6253c0 perlio Iw 0x603b08
            ./802517.pl:0 Layer 1 is perlio
            ...
            ./802517.pl:15 openn(perlio,'','w+',-1,0,0,(nil),1,0x60a178)
            ./802517.pl:15 Layer 0 is unix
            ./802517.pl:15 Layer 0 is unix
            ./802517.pl:15 PerlIO_push f=0x6253e8 unix w+ 0x603b08
            ./802517.pl:15 fd 8 refcnt=1
            ./802517.pl:15 PerlIO_push f=0x6253e8 perlio w+ 0x603b08
            ./802517.pl:15 fd 8 refcnt=0
            ./802517.pl:15 PerlIO_pop f=0x6253e8 perlio
            ./802517.pl:15 PerlIO_pop f=0x6253e8 unix

        (irrelevant parts snipped)

Re^3: reading from a file after a seek isn't working for me
by ikegami (Patriarch) on Oct 21, 2009 at 21:09 UTC
    No, sorry

    (You asked if I could explain. I can't. I don't know why it behaves as it does.)
