PerlMonks  

Re^3: reading from a file after a seek isn't working for me

by almut (Canon)
on Oct 21, 2009 at 21:44 UTC ( [id://802544]=note )


in reply to Re^2: reading from a file after a seek isn't working for me
in thread reading from a file after a seek isn't working for me

can you please explain in detail the need for the explicit close here?

I think it has to do with PerlIO in combination with an implementation peculiarity.

When you compare the straces of both variants, you'll see something like:

    # with explicit close
    close(1) = 0
    open("/tmp/stdout.log", O_RDWR|O_CREAT|O_TRUNC, 0666) = 1
    ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff2ba69a30) = -1 ENOTTY (Inappropriate ioctl for device)
    lseek(1, 0, SEEK_CUR) = 0
    fstat(1, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    fcntl(1, F_SETFD, 0) = 0

    # without explicit close
    open("/tmp/stdout.log", O_RDWR|O_CREAT|O_TRUNC, 0666) = 4
    ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff023c1390) = -1 ENOTTY (Inappropriate ioctl for device)
    lseek(4, 0, SEEK_CUR) = 0
    fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    dup2(4, 1) = 1
    close(4) = 0
    fcntl(1, F_SETFD, 0) = 0

Now, the issue is (I think) that although the dup2 does create a copy of fd 4 as fd 1 at the system level (and in fact also closes the old fd 1), it does not copy the PerlIO part, which is only handled properly when the filehandle is created directly by Perl's open. For this reason, the file descriptor is considered invalid from the PerlIO point of view (hence the "Bad file descriptor" message). This is checked at the beginning of Perl's read using PerlIOValid(f) [1], even before doing any read system call.

Don't ask (me), however, why the indirect dup2 technique is being used in the first place instead of simply closing the file descriptor before the open...  (Presumably, it did work before the introduction of PerlIO, and might just not have been adapted appropriately since.)
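To make the workaround concrete, here is a minimal sketch of the "close first, then reopen" variant. The helper name capture_stdout and the use of File::Temp are assumptions for illustration and do not come from the original question; the sketch relies only on the behaviour described above, namely that an open on an already closed STDOUT sets up the PerlIO handle directly in read/write mode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the thread): run $code with STDOUT pointed
# at a read/write temp file and return whatever was written to it.
sub capture_stdout {
    my ($code) = @_;
    my (undef, $path) = tempfile(UNLINK => 1);

    # Keep a duplicate of the real STDOUT so it can be restored afterwards.
    open(my $saved, '>&', \*STDOUT) or die "dup STDOUT: $!";

    # The explicit close is the point of the exercise: the following open
    # then creates the handle directly instead of going through dup2.
    close(STDOUT) or die "close STDOUT: $!";
    open(STDOUT, '+>', $path) or die "reopen STDOUT: $!";

    $code->();

    # Rewind (this also flushes pending output) and read back
    # through the same handle.
    seek(STDOUT, 0, 0) or die "seek: $!";
    my $captured = '';
    while (1) {
        my $n = read(STDOUT, my $chunk, 8192);
        die "read: $!" unless defined $n;
        last if $n == 0;
        $captured .= $chunk;
    }

    # Restore the original STDOUT, again closing before reopening.
    close(STDOUT) or die "close temp STDOUT: $!";
    open(STDOUT, '>&', $saved) or die "restore STDOUT: $!";
    close($saved);

    return $captured;
}

my $text = capture_stdout(sub { print "hello world\n" });
print "captured: $text";
```

The callback here only uses print. Output from a child started with system should land in the same file only as long as the reopened STDOUT actually ends up on fd 1, which is the usual case when fd 0 is open but is not guaranteed by this sketch.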

___

[1] see perlio.c:

    #define Perl_PerlIO_or_Base(f, callback, base, failure, args) \
        if (PerlIOValid(f)) { \
            const PerlIO_funcs * const tab = PerlIOBase(f)->tab;\
            if (tab && tab->callback) \
                return (*tab->callback) args; \
            else \
                return PerlIOBase_ ## base args; \
        } \
        else \
            SETERRNO(EBADF, SS_IVCHAN); \
        return failure

    ...

    SSize_t
    Perl_PerlIO_read(pTHX_ PerlIO *f, void *vbuf, Size_t count)
    {
        Perl_PerlIO_or_Base(f, Read, read, -1, (aTHX_ f, vbuf, count));
    }

Replies are listed 'Best First'.
Re^4: reading from a file after a seek isn't working for me
by jakobi (Pilgrim) on Oct 21, 2009 at 22:46 UTC

    Thanx for the pointer, almut. That dup & perlio scrap is interesting.

    But there must be more to it than that, as I don't see any special treatment for STDOUT in the perlio.c scrap (neither for the numeric FD's 0 to 2):

    I was playing with the scrap below in the meantime.

    I dupped SAVOUT on STDERR instead / simplifying system to printing / using autoflush / opening STDOUT myself to /dev/tty first: no change.

    This however is interesting:

    Changing the name of the handle STDOUT <=> ANYTHINGeLSE manages to act as a toggle for the problem. Furthermore, w/o close, the tell on the STDOUT file pointer at begin prints 19 in the example below (might be due to the handle earlier being a tty, and something didn't quite catch the change to a plain file w/o explicit close?). Any other handle name prints 5 regardless of close or no close.

    So it looks like we have some hard-coded STDOUT-related magic somewhere in the guts of PERLIO or even lower, with probably STDIN/ERR offering similar peculiarities.

    Given that too much in Perl, esp. wrt <> and stdio, is magic, it's probably a good idea to stay strictly outside any possibly dusty corner whose smell is faintly related to something magic. Which in this case might just be the idea of reusing a special handle, and worse, reading from it.

    How to classify this behaviour: What doc/code do we still miss? Or is this indeed, say, an easy-to-fix oversight in the documentation? Or is it a somewhat larger actual bug?

    Still wondering (& vowing to step even more cautiously anywhere near STDIO magic),
    My thanx to almut & ikegami for the work below!
    less confused now (& busy scribbling away two new-to-me debugging tips along with a link to their demonstration here)
    Peter

      ...as I don't see any special treatment for STDOUT in the perlio.c scrap

      Just to be clear: the perlio.c snippet was only meant to show where the PerlIOValid() check happens for the read.  The decision between using a direct close vs. the indirect dup2, OTOH, is more likely to happen in Perl's open implementation (which I didn't yet have time to wade through — it's rather lengthy...  and for a low-depth explanation I figured the manifestation of the difference in the strace should be sufficient evidence).

Re^4: reading from a file after a seek isn't working for me
by ikegami (Patriarch) on Oct 22, 2009 at 01:57 UTC

    For this reason, the filedescriptor is considered invalid from the PerlIO point of view

    But it's not, or at least not completely invalid. You can still seek using the handle and print to the handle without problem. For example, adding

        seek(STDOUT, -0, 2) or die $!;
        print STDOUT "abc\n";

    does indeed append "abc\n" to the file.

    It's more like Perl remembers the handle's original mode and doesn't realize it can read from it now.

    Update: I did a bit of Dumping and stracing of my own.

    There is no difference in the IO objects. I'm now leaning, with you, towards a PerlIO problem.

    Seems that the "Bad file descriptor" message originates from Perl, not the system. Perl doesn't even attempt to read from STDOUT.

        $ cat a.pl
        use Devel::Peek;

        open(SAVOUT, '>&STDOUT') or die $!;
        close(STDOUT) if $ARGV[0];
        open(STDOUT, '+>', "/tmp/stdout.log") or die $!;
        Dump(*STDOUT{IO});

        @argv = qw(/bin/echo hello world);
        system(@argv);

        print SAVOUT "before=", tell(STDOUT), "\n";
        seek(STDOUT, 0, 0) or die $!;
        print SAVOUT "after=", tell(STDOUT), "\n";

        while (1) {
            my $rv = read STDOUT, $_, 8192;
            die $! if !defined($rv);
            last unless $_;
            print SAVOUT "stdout=", $_;
        }

        print SAVOUT "at end=", tell(STDOUT), "\n";
        close STDOUT;

        $ diff -u <(strace perl a.pl 0 2>&1) <(strace perl a.pl 1 2>&1) | less
        ...
         lseek(1, 0, SEEK_SET) = 0
         lseek(1, 0, SEEK_CUR) = 0
        -[ code to read locale-dependent version of error message]
        -write(2, "Bad file descriptor at a.pl line"..., 37Bad file descriptor at a.pl line 17.
        -) = 37
        +read(1, "hello world\n", 4096) = 12
        +read(1, "", 4096) = 0
        +close(1) = 0
        -write(3, "before=0\nafter=0\n", 17before=0
        +write(3, "before=0\nafter=0\nstdout=hello wo"..., 46before=0
         after=0
        -) = 17
        +stdout=hello world
        +at end=12
        +) = 46
         close(3) = 0
        -exit_group(9) = ?
        -Process 4028 detached
        +exit_group(0) = ?
        +Process 4032 detached

      But it's not, or at least not completely invalid. ...

      Good point.  Actually, when taking a closer look, I think Perl sets EBADF one routine further down in PerlIOBase_read() (which is being called from the macro Perl_PerlIO_or_Base),  in case the PERLIO_F_CANREAD flag isn't set:

          PerlIOBase_read(pTHX_ PerlIO *f, void *vbuf, Size_t count)
          {
              STDCHAR *buf = (STDCHAR *) vbuf;
              if (f) {
                  if (!(PerlIOBase(f)->flags & PERLIO_F_CANREAD)) {
                      PerlIOBase(f)->flags |= PERLIO_F_ERROR;
                      SETERRNO(EBADF, SS_IVCHAN);
                      return 0;
                  }
                  ...

      It's more like Perl remembers the handle's original mode and doesn't realize it can read from it now.

      Yes, and that's most likely because the dup2 doesn't copy the perl-internal PERLIO* flags (well, how should it, it knows nothing about them).

      The following snippet shows that the two STDOUTs modes differ depending on whether STDOUT is explicitly being closed first:

      (I made use of Inline::C because I couldn't find a way to call PerlIO_modestr() directly via plain Perl)

          #!/usr/bin/perl

          use Inline C;

          close STDOUT if $ARGV[0];
          open(STDOUT, '+>', "/tmp/stdout.log") or die $!;

          dumpmode(STDOUT);

          __END__
          __C__

          void dumpmode(SV* fh) {
              char buf[10];
              PerlIO *f = IoIFP(sv_2io(fh));
              PerlIO_modestr(f, buf);
              fprintf(stderr, "mode = %s\n", buf);
          }

      Output:

          $ ./802590.pl 0
          mode = w

          $ ./802590.pl 1    # with explicit close
          mode = r+

      Not really sure why it says "r+" instead of "w+", but I suspect it's because the "+>" internally maps to the same mode as "+<", after having clobbered the file...
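For contrast, the operating system's view of the descriptor can be checked from plain Perl with fcntl and F_GETFL. The sketch below is an illustration added here, not something from the original posts; the helper names os_access_mode and stdout_mode_after_reopen are made up. Because dup2 makes fd 1 refer to the same open file description as the freshly opened descriptor, the kernel-level access mode should come out as O_RDWR whether or not STDOUT was closed first, which is consistent with the difference living only in the PerlIO flags.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(F_GETFL O_ACCMODE O_RDWR O_WRONLY);
use File::Temp qw(tempfile);

# Hypothetical helper: the access mode the OS has recorded for a handle's fd.
sub os_access_mode {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0);
    die "fcntl(F_GETFL): $!" unless defined $flags;
    my $acc = $flags & O_ACCMODE;
    return $acc == O_RDWR   ? 'O_RDWR'
         : $acc == O_WRONLY ? 'O_WRONLY'
         :                    'O_RDONLY';
}

# Hypothetical helper: reopen STDOUT read/write on a temp file, optionally
# closing it first, report the OS-level mode, then restore STDOUT.
sub stdout_mode_after_reopen {
    my ($close_first) = @_;
    my (undef, $path) = tempfile(UNLINK => 1);

    open(my $saved, '>&', \*STDOUT) or die "dup STDOUT: $!";
    close(STDOUT) if $close_first;
    open(STDOUT, '+>', $path) or die "reopen STDOUT: $!";

    my $mode = os_access_mode(\*STDOUT);

    close(STDOUT);
    open(STDOUT, '>&', $saved) or die "restore STDOUT: $!";
    close($saved);
    return $mode;
}

print STDERR "without close: ", stdout_mode_after_reopen(0), "\n";
print STDERR "with close:    ", stdout_mode_after_reopen(1), "\n";
```

This only inspects the kernel's flags; it says nothing about PERLIO_F_CANREAD, which still needs something like the PerlIO_modestr() call above to observe.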

      Also, if you set PERLIO_DEBUG, you can see that the "w+" mode is being applied to the PerlIO layers of fd 1 only in case it is properly closed/opened:

          $ PERLIO_DEBUG=/dev/tty ./802517.pl 1    # with explicit close
          ...
          ./802517.pl:0 openn(perlio,'(Null)','Iw',1,0,0,(nil),0,(nil))
          ./802517.pl:0 Layer 0 is unix
          ./802517.pl:0 Layer 0 is unix
          ./802517.pl:0 PerlIO_push f=0x6253c0 unix w 0x603b08
          ./802517.pl:0 fd 1 refcnt=1
          ./802517.pl:0 PerlIO_push f=0x6253c0 perlio Iw 0x603b08
          ./802517.pl:0 Layer 1 is perlio
          ...
          ./802517.pl:15 openn(perlio,'','w+',-1,0,0,(nil),1,0x60a178)
          ./802517.pl:15 Layer 0 is unix
          ./802517.pl:15 Layer 0 is unix
          ./802517.pl:15 PerlIO_push f=0x6253c0 unix w+ 0x603b08
          ./802517.pl:15 fd 1 refcnt=1
          ./802517.pl:15 PerlIO_push f=0x6253c0 perlio w+ 0x603b08

      Otherwise, the "w+" is being applied to a different fd (here fd 8), and thus disappears together with the fd when it is closed (after the dup2):

          $ PERLIO_DEBUG=/dev/tty ./802517.pl 0
          ...
          ./802517.pl:0 openn(perlio,'(Null)','Iw',1,0,0,(nil),0,(nil))
          ./802517.pl:0 Layer 0 is unix
          ./802517.pl:0 Layer 0 is unix
          ./802517.pl:0 PerlIO_push f=0x6253c0 unix w 0x603b08
          ./802517.pl:0 fd 1 refcnt=1
          ./802517.pl:0 PerlIO_push f=0x6253c0 perlio Iw 0x603b08
          ./802517.pl:0 Layer 1 is perlio
          ...
          ./802517.pl:15 openn(perlio,'','w+',-1,0,0,(nil),1,0x60a178)
          ./802517.pl:15 Layer 0 is unix
          ./802517.pl:15 Layer 0 is unix
          ./802517.pl:15 PerlIO_push f=0x6253e8 unix w+ 0x603b08
          ./802517.pl:15 fd 8 refcnt=1
          ./802517.pl:15 PerlIO_push f=0x6253e8 perlio w+ 0x603b08
          ./802517.pl:15 fd 8 refcnt=0
          ./802517.pl:15 PerlIO_pop f=0x6253e8 perlio
          ./802517.pl:15 PerlIO_pop f=0x6253e8 unix

      (irrelevant parts snipped)
