perltutorial
arturo
<h1>Scoping</h1>
<p>One thing you need to know to master Perl is how to deal with the scoping
mechanisms it provides you with. You want globals? You got 'em! You want to avoid "collisions" (two variables with the same name clobbering each other)? You got it, and there's more than one way to manage the trick. But Perl's scoping rules aren't always so well understood, and it's not just the difference between <code>my</code> and <code>local</code> that trips people up, although clearing that up is going to be one of my purposes.
<p>I've learned a lot from [http://perl.plover.com/FAQs/Namespaces.html|Coping
with Scoping] and sections in various Perl books (e.g. [Effective Perl Programming]). So credit has to go to those authors
([Dominus] for the first, and Joseph N. Hall and [merlyn] for the second). [Dominus] also provided some helpful corrections to errors (some egregious) in an earlier version of this tutorial, so he should get at least second-author credit. However, the documentation that comes with your local perl installation is always the most up-to-date you can get, so don't be afraid to use <code>perldoc perlop</code> and <code>perldoc -f foo</code> on your own system!</p>
<READMORE>
<h3>Summary</h3>
<p>Yes, at the beginning ...
<ul>
<li><code>my</code> provides [lexical scoping]; a variable declared with <code>my</code> is visible only within the block in which it is declared.
<li>Blocks of code are hunks within curly braces <code>{}</code>; files are blocks.
<li>Use <code>use vars qw([list of var names])</code> or <code>our ([var_names])</code> to create package globals.
<li><code>local</code> saves away the value of a package global and substitutes a new value for all code within <i>and</i> called from the block in which the <code>local</code> declaration is made.
</ul>
<h2>Namespaces</h2>
<p>A basic idea, although one you need not master to write many scripts, is the
notion of a <i>namespace</i>. Global variables (variables not declared with <code>my</code>) live in a package. A package
provides a <i>namespace</i>, which I'm going to explain by reference to the
metaphor of a family name. In English speaking countries, "Robert" is a
reasonably common name, so you (assuming you live in one) probably know more
than one "Robert." Usually, for us humans, the current conversational context is enough to determine for our audience which Robert we're talking about (my chums down at the pool hall know Robert the darts genius, but at work, "Robert" is the CEO of our failing dot-com).
<p>Of course these people have <i>family</i> names too (yes, those can be shared by different people as well -- but you can't expect this metaphor to be <i>perfect</i> =), and if we wanted to be fully explicit we'd add that to allow our audience to determine <i>which</i> Robert we are talking about. <code>$Smith::Robert</code> is a creature distinct from
<code>$Jones::Robert</code>. When you have two different variables with the
same (as it were) 'first name', you can explicitly declare which one you want to
refer to, no matter where you are in your code, by using the full name of the variable.
<p>Use the <code>package</code> declaration to set the current package.
When you put <code>package Smith</code> in your code, you are, in
effect, saying that every unqualified variable or function name should be
understood to belong to the <code>Smith</code> package. To go with our
metaphor, you're saying "in this bit of code, I want to talk about the
<code>Smith</code> family."
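<p>Here's a minimal sketch of that idea. The <code>Smith</code> and <code>Jones</code> packages come from the metaphor above; the strings assigned to the variables are made up for illustration.</p>
<code>
#!/usr/bin/perl -w
# Two different scalars that share a 'first name'.
$Smith::Robert = "Robert of the Smiths";
$Jones::Robert = "Robert of the Joneses";

package Smith;
# unqualified $Robert now means $Smith::Robert
print "In package Smith, \$Robert is $Robert\n";

package Jones;
# unqualified $Robert now means $Jones::Robert
print "In package Jones, \$Robert is $Robert\n";
# the full name works no matter which package is current
print "The Smiths' Robert is still $Smith::Robert\n";
</code>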
<p>Implicitly, there's a <code>package main;</code> at the top of your scripts;
that is, unless you explicitly declare a different package, all the variables
you declare (keeping the caveat about <code>my</code> in mind) will be in
<code>main</code>. Variables that live in a package are reasonably called
"package globals", because they are accessible by default to every operator and subroutine that lives in the same package (and, if you're explicit about their names, outside the package, too).
<p>Using packages makes accessing Perl variables sort of like travelling in different circles. For example, at work, it's
understood that "Robert" is "Robert Szywiecki", the boss. At the pool hall,
it's understood that "Robert" is "Robert Yamauchi", the darts expert. Here's a
little code to illustrate the use of packages:</p>
<code>
#!/usr/bin/perl -w
package Szywiecki;
$Robert = "the boss";
sub terminate {
my $name = shift;
# the following line was updated on 2004-12-29 following on aristotle73's comment
print "$Robert has canned ${name}'s sorry butt\n";
}
terminate("arturo"); # prints "the boss has canned arturo's sorry butt"
package main;
# terminate("arturo"); # produces an error if you uncomment it
</code>
<p>The variable <code>$Robert</code>'s full name, as it
were, is <code>$Szywiecki::Robert</code> (note how the <code>$</code>
moves out to the front of the package name, indicating that
this is the scalar <code>Robert</code> that lives in package <code>Szywiecki</code>). To code and, most importantly, subroutines in the <code>Szywiecki</code> package, an unqualified <code>$Robert</code> refers to <code>$Szywiecki::Robert</code> -- <i>unless</i> <code>$Robert</code> has been 'masked' by <code>my</code> or <code>local</code> (more on that later).
<p>Now, if you <code>use strict</code> (and you should, you should, you should -- see [strict.pm], for example), you'll need to declare those global variables before you can use
them, UNLESS you want to fully qualify them. (Subroutine names are resolved against the current package in the same way, strict or not: that's why the second (apparent)
call to <code>terminate</code> in the above example will fail. Perl expects
to find a subroutine <code>terminate</code> in the <code>main</code> package,
but no such critter has been defined.) Under <code>strict</code>, then,</p>
<code>
#!/usr/bin/perl -w
use strict;
$Robert = "the boss"; # error!
print "\$Robert = $Robert\n";
</code>
<p>will produce an error, whereas if we fully qualified the name (remember that
implicit <code>package main</code> in there), there's no problem:</p>
<code>
#!/usr/bin/perl -w
use strict;
$main::Robert = "the boss";
print "\$main::Robert = $main::Robert\n";
</code>
<p>To satisfy <code>strict 'vars'</code> (the part of <code>strict</code> that
enforces variable declaration), you have two options; they
produce different results, and one is only available in perl 5.6.0 and
later:</p>
<ol>
<li>The <code>our ($foo, $bar)</code> declaration (in perl 5.6.0 and above)
declares <code>$foo</code> and <code>$bar</code> to be variables in the current package.
<li><code>use vars qw($foo $bar)</code> (previous versions, but still works in
5.6) tells 'strict vars' that these variables are OK to use
without qualification in the current package.
</ol>
<p>One difference between <code>our</code> and the 'older' <code>use vars</code> is that <code>our</code> provides <i>lexical scoping</i> (more on which in the section on <code>my</code> below).
<p>Another difference is that with <code>use vars</code>, you are expected to
give a list of variable <i>names</i>, not the variables themselves (as with
<code>our</code>). Both mechanisms allow you to use globals while still maintaining one of the chief benefits of <code>strict 'vars'</code>: you are protected from accidentally generating a new variable via a typo. <code>strict 'vars'</code> demands that your variables be explicitly declared (as in "here's a list of my package globals"), and either mechanism lets you make that declaration.</p>
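<p>A small sketch of both declarations at work under <code>strict</code>, assuming perl 5.6.0 or later so that <code>our</code> is available. The misspelled <code>$fooo</code> is a deliberate typo, and the string <code>eval</code> is only there so the compile-time complaint can be shown instead of stopping the script.</p>
<code>
#!/usr/bin/perl -w
use strict;
our ($foo);            # takes the variable itself
use vars qw($bar);     # takes a list of names, as strings

$foo = "declared with our";
$bar = "declared with use vars";
print "\$foo: $foo\n";
print "\$bar: $bar\n";

# A typo is still caught: $fooo was never declared.
eval q{ $fooo = "oops"; 1 } or print "strict says: $@";
</code>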
<p>A thing to remember about packages (and potentially a bad thing, depending on how big a fan you are of "privacy") is that package globals aren't
just global to that package, but they can be accessed from <i>anywhere in your
code</i>, as long as the names are fully qualified. You can talk about
Robert the darts expert at work, if you say "Robert Yamauchi" (warning: I didn't use strict here, but it's only for purposes of brevity!):</p>
<code>
#!/usr/bin/perl -w
package Szywiecki;
$Robert = "the boss";
package PoolHall;
$Robert = "the darts expert";
package Szywiecki; # back to work!
print "Here at work, 'Robert' is $Robert, but over at the pool hall, 'Robert'
is $PoolHall::Robert\n";
</code>
<p>See? Understanding packages isn't really all that hard. Generally, a
package is like a family of variables (and subroutines! the full name of that
<code>terminate</code> in the example above is
<code>&Szywiecki::terminate</code> -- similar remarks apply to hashes and
arrays, of course).
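<p>To make the remark about arrays, hashes and subroutines concrete, here's a short sketch. <code>PoolHall</code> is the package from the example above; the variable names, the sub <code>champion</code> and the scores are invented for illustration (and, as before, no <code>strict</code>, for brevity).</p>
<code>
#!/usr/bin/perl -w
package PoolHall;
@regulars = ("Robert", "arturo");           # full name: @PoolHall::regulars
%scores   = (Robert => 180, arturo => 26);  # full name: %PoolHall::scores
sub champion { return "Robert Yamauchi" }   # full name: &PoolHall::champion

package main;
# from outside the package, use the full names
print "Regulars: @PoolHall::regulars\n";
print "Robert scored $PoolHall::scores{Robert}\n";
print "The champion is ", PoolHall::champion(), "\n";
</code>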
<h2><code>my</code> (and a little more on <code>our</code>) <i>a.k.a.</i> lexical scoping</h2>
<p>Variables declared with <code>my</code> are not globals, although they can
act sort of like them. A main use of <code>my</code> is to operate on a variable that's only of use within a loop or subroutine, but that's by no means where it ends. Here are some basic points about <code>my</code>:
<ul>
<li>A <code>my</code> variable has a <i>block</i> of code as its scope (i.e. the places in which it is accessible).
<li>A block is often declared with braces <code>{}</code>, but as far as Perl is concerned, a file is a block.
<li>A variable declared with <code>my</code> <i><b>does not
belong to any package</b></i>; it 'belongs' only to its block.
<li>Although you can name blocks (e.g. <code>BEGIN</code>, with which you may already be familiar), you can't fully qualify the name of the block to get
to the <code>my</code> variable.
<li>File-level <code>my</code> variables are those which are declared in a file outside of any block within that file.
<li>You can't access a file-level <code>my</code> variable from outside of the file in which it is declared (<b>unless</b> you explicitly return it from a subroutine, for example).
</ul>
<p>As long as you're writing one-file scripts (e.g. ones that don't import modules), some of these points don't matter a great deal. But if you're heavily
into "privacy" and "encapsulation", and if you write modules and OO modules you
will be, you'll need to understand all of the above.</p>
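<p>As a taste of the "privacy" angle, here's a hedged sketch of a <code>my</code> variable that can only be reached through subroutines defined in the same block. The package <code>Counter</code> and its subs are hypothetical names, made up for this illustration.</p>
<code>
#!/usr/bin/perl -w
use strict;

package Counter;    # hypothetical package, for illustration only
{
    my $count = 0;                      # belongs to this block, not to Counter
    sub increment { return ++$count }   # these subs can see $count ...
    sub current   { return $count }
}
# ... but nothing out here can, not even by asking for $Counter::count

package main;
Counter::increment() for 1 .. 3;
print "Count is ", Counter::current(), "\n";    # prints "Count is 3"
</code>
<p>The only way to get at <code>$count</code> from outside the block is to have a subroutine hand it back, which is the "explicitly return it" escape hatch from the list above.</p>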
<p>Here's some commented code to explain some of these points:
<code>
#!/usr/bin/perl -w
use strict;
#remember we're in package main
use vars qw($foo);
$foo = "Yo!"; # sets $main::foo
print "\$foo: $foo\n"; # prints "Yo!"
my $foo = "Hey!"; # this is a file-level my variable!
print "\$foo: $foo\n"; # prints "Hey!" -- new declaration 'masks' the old one
{ # start a block
my $foo = "Yacht-Z";
print "\$foo: $foo\n";
# prints "Yacht-Z" -- we have a new $foo in scope.
print "\$main::foo: $main::foo\n";
# we can still 'see' $main::foo
subroutine();
} # end that block
print "\$foo: $foo\n"; # there it is, our file-level $foo is visible again!
print "\$main::foo: $main::foo\n"; # whew! $main::foo is still there!
sub subroutine {
print "\$foo: $foo\n"; # prints "Hey!" -- as the script is written
# why? Because the variable declared in the naked block
# is no longer in scope -- we have a new set of braces.
# but the file-level variable is still in scope, and
# still 'masks' the declaration of $main::foo
}
package Bar;
print "\$foo: $foo\n"; # prints "Hey!" -- the my variable's still in scope
# if we hadn't made that declaration above, this would be an error: the
# interpreter would tell us that Bar::foo has not been defined.
</code>
<p>As the bottom bit in the above example shows, because they don't live in any
package, <code>my</code> variables <i>can be</i> visible even though a new package has been declared <i>because the block is the file</i> (at least for these purposes).
<p>Now the example above used a 'naked' block -- there's no control structure
(e.g. <code>if</code> or <code>while</code>) involved. But of course that
makes no difference to the scoping.
<p>File-level <code>my</code> variables <i>ARE</i> accessible from within
blocks defined within that file (as the example above shows); this is one way in
which they're sort of like globals. If, however, <code>subroutine</code> had
been defined in a different file, it could not see our file-level <code>$foo</code>: under <code>strict</code> we would get an error, and without it <code>$foo</code> would quietly mean the package global. Once you
know how <code>my</code> works, you can see, just by looking at the syntax of
the file, where a <code>my</code> variable is going to be accessible. This is
one reason the scoping it provides is called "lexical scoping." Here's a place
where <code>use vars</code> and the 'new' <code>our</code> operator differ: if
you specify <code>our $foo</code> in package <code>Bar</code> but <i>outside of an explicit block</i>, you're in effect
saying that (until some other scoping operator comes into play) occurrences of
<code>$foo</code> are to be understood as referring to
<code>$Bar::foo</code>. This should illustrate the difference between <code>use vars</code> and the newer <code>our</code>:
<code>
#!/usr/bin/perl -w
use strict;
our ($bob);
use vars qw($carol);
$carol = "ted";
$bob = "alice";
print "Bob => $bob, Carol => $carol\n";
package Movie;
print "Bob => $bob, Carol => $carol\n";
</code>
<p>Note that the second <code>print</code> will produce an error (at compile time, so the script never gets to run), because <code>$carol</code> is interpreted as <code>$Movie::carol</code>, while <code>$bob</code> is interpreted as <code>$main::bob</code>.
<p>While this "package spanning" (which is only apparent in the case of <code>our</code>!) is a partial functional similarity between the two different kinds of lexical scoping operators, don't
forget the difference, which is that <code>our</code> declares a package
global, while <code>my</code> does not.</p>
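<p>The lexical side of <code>our</code> shows up most clearly when the declaration sits inside a block. This is only a sketch, assuming perl 5.6.0 or later; the string <code>eval</code> is there just so the compile-time error can be displayed rather than ending the script.</p>
<code>
#!/usr/bin/perl -w
use strict;

package Bar;
{
    our $foo = "Bar's foo";    # the short name is good until the closing brace
    print "inside the block: $foo\n";
}
# The package global itself lives on; only the short name is gone.
print "fully qualified:  $Bar::foo\n";

# Outside the block, strict no longer accepts the unqualified name.
eval q{ print $foo; 1 } or print "strict says: $@";
</code>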
<h2><code>local</code> -- <i>a.k.a.</i> dynamic scoping</h2>
<p>Now we arrive at <code>local</code>, which is only sort of like
<code>my</code>, but due to its name, its function is sometimes confused with
that of <code>my</code>. Here's the skinny: <code>local $foo</code> <i>saves
away</i> the current value of the (package) <b>global</b> <code>$foo</code>, and
determines that in the current block <i>and</i> any code called by the current block,
<code>$foo</code> refers to whatever value you give it in that block (a bare
<code>local $foo</code> will set <code>$foo</code> to <code>undef</code>;
the same goes for <code>my</code>). As things now stand, <code>local</code>
only works on <b>globals</b>; you can't use it on a <code>my</code> variable.
<p>Since <code>local</code> can affect what happens outside of the block in which it's used, <code>local</code> provides what's called <i>dynamic</i>
scoping, as its effect is determined by what happens when the script is run. That is, the compiler can't tell when <code>local</code> is going to have its effect or not at the time it's compiling the script (which happens before the script is run). This distinguishes dynamic scoping from the lexical scoping provided by <code>my</code> and <code>our</code>, the effects of which can be checked at compile time.
<p>The basic upshot of this difference is that if you <code>local</code>ize a variable within a block and call a subroutine from that block, that subroutine will see the value of the <code>local</code>ized variable. This is a major difference between <code>my</code> and <code>local</code>. Compare the above example to this one:
<code>
#!/usr/bin/perl -w
use strict;
use vars qw ($foo); # or "our $foo" if you're using 5.6
$foo = "global value";
print "\$foo: $foo\n"; # prints "global value"
print "mysub result '", &mysub(), "'\n"; #"global value"
print "localsub result '", &localsub(), "'\n"; # "local value"
print "no sub result '",&showfoo(), "'\n"; #"global value"
sub mysub {
my $foo = "my value";
showfoo(); # returns "global value" -- showfoo can't see this my variable
}
sub localsub {
local $foo = "local value";
showfoo(); # ALWAYS returns "local value"
}
sub showfoo {
return $foo;
}
</code>
<font size=-2>original example modified as per [Masem]'s note. Thanks!</font>
<p>Notice that the <code>my</code> declaration in <code>mysub</code> gets
(apparently) ignored by <code>showfoo</code> (since we've left the block in which the <code>my</code> declaration is valid), but the <code>local</code> declaration in <code>localsub</code> doesn't get ignored. But after we've left that block, the original value of <code>$foo</code> is visible again.
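<p>The saving-and-restoring act can be watched at several levels with a sketch like this one. The global <code>$indent</code> and the subs <code>show</code> and <code>nested</code> are hypothetical names chosen for the illustration.</p>
<code>
#!/usr/bin/perl -w
use strict;
use vars qw($indent);    # or "our $indent" if you're using 5.6
$indent = "";

sub show { print $indent, @_, "\n" }

sub nested {
    my $depth = shift;
    show("depth $depth");
    return if $depth >= 3;
    # new value for this block and for everything called from it
    local $indent = $indent . "  ";
    nested($depth + 1);
}   # each saved value of $indent is restored on the way out

nested(1);
show("back at the top");    # printed with no indentation
</code>
<p>Each call to <code>nested</code> sees whatever value its caller left in place, which is something you can only work out by following the calls as they happen at run time.</p>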
<p>I hope you learn as much from reading this as I did from writing it!