PerlMonks  

typeglobs and filehandles

by zzspectrez (Hermit)
on Dec 20, 2000 at 22:45 UTC ( [id://47629]=perlquestion )

zzspectrez has asked for the wisdom of the Perl Monks concerning the following question:

I am looking for a little enlightenment about typeglobs. I am writing a module that stores a filehandle in an object. The filehandle is actually a filehandle to a socket, but let's pretend it is just a file. This can be done by doing something like the following.

    sub foo {
        my $self     = shift;
        my $filename = shift;
        open FH, $filename or die "Can not open file: $!\n";
        $self->{'fh'} = \*FH;
    }

This has worked fine. However, now I need to store multiple filehandles in the object and I want one subroutine to handle opening the files and returning the filehandle. Since a typeglob holds an entire symbol table entry, I know that trying to open all the files with FH and returning a typeglob will not work.
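The problem described above can be reproduced with a minimal sketch. It assumes two in-memory strings standing in for the real files or sockets, and a hypothetical helper named open_shared that is not part of the original module. Both calls open the same package-global FH, so the second open silently replaces the first.

```perl
use strict;
use warnings;

# In-memory "files" stand in for real files or sockets (an assumption
# made only to keep the sketch self-contained).
my $text_a = "first file\n";
my $text_b = "second file\n";

# Hypothetical helper: opens the package-global FH and returns a reference.
sub open_shared {
    my ($text_ref) = @_;
    open FH, '<', $text_ref or die "Cannot open: $!\n";
    return \*FH;
}

my $fh_a = open_shared(\$text_a);
my $fh_b = open_shared(\$text_b);    # reopens FH, implicitly closing the first

# Both references point at the same symbol table entry, *main::FH.
my $same = ( $fh_a == $fh_b ) ? 1 : 0;

# Reading through the "first" handle now yields data from the second file.
my $line_a = <$fh_a>;
```

Here $same ends up as 1 and $line_a holds "second file\n", which is why a single shared FH cannot serve several open files at once.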

So I did what I always do when I'm not sure what will happen: I made a test application. This is what I got.

    package FOO;
    use diagnostics;

    sub new {
        my $class = shift;
        my $self  = { blahh => 'blah', };
        my (%opts) = @_;

        return unless (exists $opts{'input'} && exists $opts{'output'});
        %$self = ( %$self, %opts );
        bless $self, $class;

        $self->{'inp_fh'} = $self->_open_read;
        $self->{'out_fh'} = $self->_open_write;

        return $self;
    }

    sub read_line {
        my $self = shift;
        my $fh   = $self->{'inp_fh'};
        return <$fh>;
    }

    sub write_line {
        my $self = shift;
        my $data = shift;
        my $fh   = $self->{'out_fh'};
        print $fh $data;
    }

    sub _open_read {
        my $self = shift;
        {
            local *FH;
            open FH, $self->{'input'} or die "Couldn't open: $!\n";
            return \*FH;
        }
    }

    sub _open_write {
        my $self = shift;
        {
            local *FH;
            open FH, ">$self->{'output'}" or die "Couldn't open file: $!\n";
            return \*FH;
        }
    }

    sub DESTROY {
        my $self = shift;
        close ($self->{'inp_fh'});
        close ($self->{'out_fh'});
    }

    package main;
    use strict;

    my $data;
    my $foo = FOO->new('input'=>"inp.dat", 'output'=>"out.dat")
        || die ("Error creating new FOO!!\n");

    for my $key (keys %$foo) {
        print "KEY: $key \tVAL: $foo->{$key}\n";
    }

    while ($data = $foo->read_line) {
        $foo->write_line($data);
    }

This works fine, making a duplicate of the file, and returning data like the following:

    KEY: out_fh VAL: GLOB(0x1a75098)
    KEY: inp_fh VAL: GLOB(0x1a75038)
    KEY: input VAL: inp.dat
    KEY: blahh VAL: blah
    KEY: output VAL: out.dat

Next I made a slight modification, changing _open_read() and _open_write() so that they return *FH instead of \*FH. I ran this and the program had the same desired results. The output for the filehandle keys changes to:

    KEY: out_fh VAL: *FOO::FH
    KEY: inp_fh VAL: *FOO::FH

I did not think this would work. Since FH goes out of scope at the end of the block, you would think you would need a reference to the typeglob to maintain access to it. Secondly, in the second example the object appears to have two references to the same filehandle, *FOO::FH. However, although they are opened as the same filehandle, the code operates on two separate filehandles. How is this working?

Lastly, I remember reading an article by Mark-Jason Dominus in The Perl Journal on the uses of local. At the time I did not understand much about typeglobs or the differences between local and my, so I only glanced at the article. Looking back at the article, I find an interesting tidbit that relates.

The article suggests using the following construct to get a glob that is disconnected from the symbol table: $fh = do { local *FH };. Then you use $fh as if it were a filehandle.

So I modify my test program like the following:

    sub _open_write {
        my $self = shift;
        my $fh   = do { local *FH };
        open $fh, ">$self->{'output'}" or die "Couldn't open file: $!\n";
        return $fh;
    }

This works great! However, once again I'm a little confused about how this trick works. As a first step I opened my Camel book and refreshed my reading about do. OK, I feel stupid: I didn't know that a do block returns the value of the last expression evaluated. This brings a little enlightenment. However, once again I'm confused about how the block can return the value of *FH, which loses scope once the block exits.

I guess part of the confusion is, what exactly is being returned?!? According to the print of my hash, both variables point to the same glob *FOO::FH which was local in scope.

Thanks!
zzSPECTREz

Replies are listed 'Best First'.
Re: typeglobs and filehandles
by Dominus (Parson) on Dec 20, 2000 at 23:15 UTC
    Says zzspectrez:
    > Im confused on how the block can return the value
    > of *FH which loses scope once the block exits.
    I'm going to explain this in two ways. First, I'll show an analogy to something that you probably do all the time:
    sub foo { my $z = 4; return $z; }
    How can the block return the value of $z which loses scope once the block exits? But there's no problem here, and you probably never thought that there might be a problem. So why are you worried about returning the value of *FH in the same way?

    Second explanation: You have misunderstood the notion of scope. Values don't have scope. Variables don't have scope. Only names have scope.

    In the code { my $var; ... }, the name var is in scope inside the block. Outside the block, it is out of scope, and the name has no meaning.

    Does the variable lose its value just because the name has gone out of scope? Maybe, but maybe not. Consider this:

        my $x;
        {
            my $var = 4;
            $x = \$var;
        }
        # 'var' is now out of scope.
        print $$x;    # but this still prints "4".

    Here we create a variable, named $var, and set it to 4. Then we take a reference to the variable and store the reference in $x. Then the name var goes out of scope.

    Is the variable destroyed? No. Why not? Because in Perl, a variable is never destroyed just because its name has gone out of scope. In Perl, a variable is destroyed when there are no references to it. The name is one reference. Usually it's the only one, and when the name goes away, there are no more references, and the variable is destroyed. In this case, there is a second reference, in $x, so the variable that was formerly known as $var will not be destroyed until the reference in $x is also gone. It's not known as $var any more (because that name is out of scope) but it's not destroyed either.

    Scope is the part of the program text in which the name is visible. It's independent of what happens when the program is run. You can tell just from looking at the source code what the scope of each name is. In Perl, the scope of a name begins on the statement following the declaration, and continues up to the end of the smallest enclosing block.

    Variables do not have scope. Instead, they have duration. The duration of a variable is the period of time in which memory is allocated for the variable. In Perl, a variable's duration continues until there are no more references to it, at which point it is destroyed.

    People often confuse variables with their names, but they are not the same thing. A variable is a part of the computer's memory, and it might or might not have a name. In a program, the same name might refer to different variables at different times.
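The point that one name can refer to different variables at different times can be seen in a small sketch. The loop below is an illustration that is not taken from the post; it assumes nothing beyond core Perl. Each pass through the loop body declares $var again, and because a reference to each one is kept, every pass gets its own variable whose duration outlasts the scope of its name.

```perl
use strict;
use warnings;

my @refs;
for my $n ( 1 .. 3 ) {
    my $var = $n * 10;    # the name 'var' is in scope only inside this block
    push @refs, \$var;    # the reference keeps this particular variable alive
}

# The name 'var' is out of scope here, but all three variables still exist.
my @values = map { $$_ } @refs;

# Each reference points at a different variable, not at one shared slot.
my $distinct = ( $refs[0] != $refs[1] && $refs[1] != $refs[2] ) ? 1 : 0;

print "@values\n";    # prints "10 20 30"
```

If the variables had been destroyed when the name went out of scope, the dereferences in the map would have nothing to read.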


    Now what is my $x = do { local *FH }; doing?

    I'm going to remove one of the confusions by rewriting it in an equivalent form:

        sub new_filehandle {
            local *FH;
            return *FH;
        }
        $x = new_filehandle();

    What does new_filehandle do? First, it removes the old value that *FH had, if any, and replaces the value of *FH with a new, fresh glob. It arranges for the old value of *FH to be restored at the end of the block.

    Then it returns the value of *FH, which as we just saw, is a new, fresh glob. That new, fresh glob is what is assigned to $x.

    Then it's the end of the block, so the old value of *FH is restored.

    Is the new, fresh glob destroyed? No. Why not? Its duration (not scope) is still going on, because it is referenced by $x.
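A short sketch of what this means in practice, using the new_filehandle sub from above. The in-memory strings $source and $buffer are assumptions made so the example runs without touching disk. Two calls return two independent globs, even though both still report the name they were created under.

```perl
use strict;
use warnings;

# Same shape as new_filehandle in the post.
sub new_filehandle {
    local *FH;
    return *FH;
}

# In-memory strings stand in for real files (an assumption for this sketch).
my $source = "hello\n";
my $buffer = '';

my $in  = new_filehandle();
my $out = new_filehandle();

open $in,  '<', \$source or die "Cannot open for reading: $!\n";
open $out, '>', \$buffer or die "Cannot open for writing: $!\n";

# Both globs remember the name they were created as.
my $name_in  = "$in";
my $name_out = "$out";

# Yet they are separate handles: reading one and writing the other works.
while ( my $line = <$in> ) {
    print $out $line;
}
close $out;
close $in;
```

After this runs, $name_in and $name_out are both "*main::FH", while $buffer contains "hello\n". That matches the output in the question, where two keys showed *FOO::FH but behaved as separate filehandles.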

    Now you probably know that return is optional on the last line of a subroutine; Perl will return the final value whether you say return or not. That's why you can write sub pi { 3.14159 }. So we can get rid of return:

        sub new_filehandle {
            local *FH;
            *FH;    # 'return' is implied
        }
        $x = new_filehandle();

    It turns out that the value of the local is the new value of *FH, so it is not necessary to repeat *FH:

        sub new_filehandle {
            local *FH;    # return of *FH is implied
        }
        $x = new_filehandle();

    Now we can replace the subroutine with a do block, which just executes the code in a block and returns the resulting value:

    $x = do { local *FH; }; # return of *FH is implied
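As a sketch of how the do block form could serve the original goal of one helper opening several handles for an object: the class name Copier, the helper _open, and the method copy_all are hypothetical and not from the question's module, and in-memory strings are assumed in place of real files. The block is written as do { local *FH; *FH } so that FH is mentioned twice, which keeps a possible "used only once" warning quiet.

```perl
use strict;
use warnings;

package Copier;    # hypothetical class, loosely modeled on FOO in the question

sub new {
    my ( $class, %opts ) = @_;
    my $self = bless {}, $class;
    $self->{inp_fh} = _open( '<', $opts{input} );
    $self->{out_fh} = _open( '>', $opts{output} );
    return $self;
}

# One helper serves both directions; each call gets its own fresh glob.
sub _open {
    my ( $mode, $target ) = @_;
    my $fh = do { local *FH; *FH };
    open $fh, $mode, $target or die "Couldn't open: $!\n";
    return $fh;
}

# Copy every line from the input handle to the output handle.
sub copy_all {
    my $self = shift;
    my ( $in, $out ) = @{$self}{qw(inp_fh out_fh)};
    while ( my $line = <$in> ) {
        print $out $line;
    }
    close $out;
    return;
}

package main;

# In-memory strings stand in for inp.dat and out.dat (an assumption).
my $source = "one\ntwo\n";
my $copy   = '';
Copier->new( input => \$source, output => \$copy )->copy_all;
```

Once copy_all returns, $copy holds the same two lines as $source.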

    Hope this helps.

      Let me start off by saying thank you for such a well-written response! You have clarified and verified my understanding of a lot of what's going on.

      Dominus says:
      > Variables do not have scope. Instead, they have duration.
      >The duration of a variable is the period of time in which
      >memory is allocated for the variable. In Perl, a variable's
      >duration continues until there are no more references to it, at
      >which point it is destroyed.

      This makes sense. Perl doesn't dispose of a variable's value in memory until there are no more references to it. This seems convenient. However, I was wondering if this could cause problems in a situation such as a linked list. Since the list would have references to other nodes in the list, you would think you would have to traverse the structure to release the memory. So of course I made another test:

          #!/usr/bin/perl -w

          sub add_node {
              my $tmp  = { };
              my $prev = shift;

              $tmp->{'prev'} = $prev;
              $tmp->{'data'} = shift;
              $prev->{'next'} = $tmp;

              return $tmp;
          }

          my $pos = { data => 0 };

          {
              my $cur = $pos;
              for my $x (1..1000000) {
                  $cur = add_node($cur, $x);
              }
          }

          while ($inp ne "quit") {
              print "DATA: $pos->{'data'}\n";
              $inp = <STDIN>;
              chomp $inp;
              if ($inp eq '>') {
                  $pos = $pos->{'next'} if exists ($pos->{'next'});
              }elsif ($inp eq '<') {
                  $pos = $pos->{'prev'} if exists ($pos->{'prev'});
              }
          }

          sleep (10);
          print "\$pos = undef\n";
          $pos = undef;
          sleep (10);
          $pos = "zzSPECTREz";
          print "\$pos = $pos\n";
          sleep (10);
          print "make new linked list with \$pos\n";

          {
              my $cur = $pos;
              for my $x (1..100000) {
                  $cur = add_node($cur, $x);
              }
          }
          sleep(10);

      Now I open Windows Task Manager to watch the memory usage of perl with this script. As the script builds the list, perl's memory usage grows significantly and settles at 68,820k. Using the '<' and '>' keys I navigate the list to verify it works. Then I type quit. First the $pos variable is set to undef: no change in memory usage. Then it is set to the text 'zzSPECTREz': no memory change. Then we build a new, smaller linked list. At this point the memory usage shrinks to about 12,400k. Hmmm.

      This leaves me confused about what's happening. I thought that a memory leak would be caused, since I was destroying my reference to my linked list, but since the list had references to itself, perl would not release the memory. This would make sense. The test program seems to prove this when memory is not released when I change the value of my pointer to my list. However, the memory does release when I build a new list using that variable. Why?

      thank you for your explanations!
      zzSPECTREz

        Perl has a "leak" with circular references. I sort of boggled at your code for a while since it is rather, *ahem* pathological, but in the end I don't think you made a circular construct. You just made a really deep set of nested hashes. I think the whole thing can unwind from the top. The reference you destroy is the lynchpin and once it is pulled each remaining level can fall one by one.
            # These are bad, um-kay?
            my $a; $a = \$a;
            my ($b,$c); ($b, $c) = \($c, $b);
            my @d; $d[0]=0; $d[$_]=\$d[$_-1] for (1..100_000); $d[0]=\$d[$#d];
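The leak with circular references can be observed with a small sketch that counts DESTROY calls. The Node class, the link_nodes helper, and the use of weaken from the core Scalar::Util module are all assumptions added for illustration; none of them appear in the thread. Two nodes that point at each other are never destroyed while the program runs, whereas weakening one of the two links lets both go.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

package Node;    # hypothetical class, exists only to count destructions

my $destroyed = 0;

sub new {
    my ( $class, $data ) = @_;
    return bless { data => $data }, $class;
}

sub DESTROY { $destroyed++ }

package main;

# Build two nodes that refer to each other, then let the names go out of scope.
sub link_nodes {
    my ($use_weak) = @_;
    my $first  = Node->new(1);
    my $second = Node->new(2);
    $first->{next}  = $second;
    $second->{prev} = $first;
    weaken( $second->{prev} ) if $use_weak;    # back-link no longer counts
    return;
}

link_nodes(0);
my $after_strong = $destroyed;    # cycle keeps both nodes alive

link_nodes(1);
my $after_weak = $destroyed;      # both nodes of the second pair are gone
```

In this sketch $after_strong is 0 and $after_weak is 2: the first pair is only cleaned up when the interpreter exits.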

        --
        $you = new YOU;
        honk() if $you->love(perl)

      >Hope this helps.

      It helps a lot!

      As an occasional Perl programmer, I'd like to thank you for this reply. I was not aware of these subtleties.

      This post was so good I am moved to comment on it directly... ++ simply isn't enough.

      Wow. You rock.

      Gary Blackburn
      Trained Killer

      Sez Dominus:
      > Values don't have scope. Variables don't have scope.
      > Only names have scope.

      See, things like this are why I like to hang out here.

      Simple. Straightforward. Enlightening. It's a hat trick!

      Well done, and thanks in general. This (the whole post) should be in a tutorial.

      This is a really excellent explanation. Thank you very much. It can be tricky to keep track of names, scopes and durations while programming but it is certainly harder when you don't even fully understand what the various concepts represent.

      Maybe even worth putting somewhere like the tutorials section?

      I am one step closer to enlightenment for having read this.
Re (tilly) 1: typeglobs and filehandles
by tilly (Archbishop) on Dec 20, 2000 at 23:08 UTC
    What local really does is create a new thing and make it temporarily accessible under a global name. (OK, it doesn't have to be global; you can hide hash values with local as well.) As always with Perl, that thing goes away when the last reference to it goes away. Frequently this happens when it is no longer accessible under the name of the global.

    So in the snippet a new typeglob is produced, made temporarily accessible under a new name, is unaliased and returned. It is no longer accessible under that name (though it remembers the name it was created as) but continues to exist because you can still get at it.

    Cute, huh? :-)
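The aside about hiding hash values with local can be sketched as follows. The %config hash and the subs current_mode and run_verbose are hypothetical names chosen for illustration. The localized element gets a temporary value for the duration of the call, and the old value comes back when the enclosing block exits.

```perl
use strict;
use warnings;

our %config = ( mode => 'normal' );    # hypothetical settings hash

sub current_mode { return $config{mode} }

sub run_verbose {
    # Temporarily replace one hash value; it is restored on scope exit.
    local $config{mode} = 'verbose';
    return current_mode();    # callee sees the temporary value
}

my $inside = run_verbose();
my $after  = current_mode();

print "$inside $after\n";    # prints "verbose normal"
```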

    One gotcha though. Suppose you have a hash of filehandles. Well you cannot do:

        while (<$fh_of{$whatever}>) {
            # ...
        }
    and have it work as expected, because Perl treats anything inside the angle brackets other than a bareword filehandle or a simple scalar variable as a filename pattern to glob. You really need to (sorry) copy the filehandle into a variable and use that variable as the filehandle.
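A minimal sketch of the workaround, assuming a hash %fh_of with a single hypothetical key 'log' and an in-memory string in place of a real file. Copying the handle into a plain scalar makes the angle brackets read a line; calling the readline built-in directly on the hash element is another way to get the same effect.

```perl
use strict;
use warnings;

# In-memory "file" so the sketch is self-contained (an assumption).
my $text = "alpha\nbeta\n";
open my $handle, '<', \$text or die "Cannot open: $!\n";

my %fh_of = ( log => $handle );    # hypothetical hash of filehandles

# Workaround from the post: copy the handle into a simple scalar first.
my $fh    = $fh_of{log};
my $first = <$fh>;

# Alternative: readline accepts an arbitrary expression.
my $second = readline( $fh_of{log} );

print $first, $second;    # prints "alpha" and "beta" on separate lines
```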
Re: typeglobs and filehandles
by Dominus (Parson) on Dec 20, 2000 at 22:48 UTC
    If you add
    local *FH;
    at the top of function foo, it will work.
