perltutorial
ikegami
<h3>Some Background</h3>
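<p>As you probably know, Perl will successfully compile nested named subroutines, but they rarely behave as intended: a named sub is created once, at compile time, so it captures only the first instance of the outer sub's lexicals. Perl even tries to warn you: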
<c>
use strict;
use warnings;
sub outer {
    my $foo;
    sub inner {
        $foo;
    }
    inner();
}
</c>
<c>
>perl -c test.pl
Variable "$foo" will not stay shared at test.pl line 8.
test.pl syntax OK
</c>
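<p>To see what the warning means in practice, here is a small sketch. The <c>@seen</c> array and the values <c>'first'</c> and <c>'second'</c> are made up for the demo, and the <c>closure</c> warning category is switched off only to keep the output quiet. Because <c>inner</c> is created once, it keeps seeing the <c>$foo</c> that belonged to the first call of <c>outer</c>.

```perl
use strict;
use warnings;
no warnings 'closure';   # silence "will not stay shared" for this demo only

my @seen;                # hypothetical log of what inner() observed

sub outer {
    my $foo = shift;
    sub inner {
        # Bound to the $foo of the FIRST call to outer(), not the current one.
        push @seen, $foo;
    }
    inner();
}

outer('first');
outer('second');

print join(', ', @seen), "\n";   # first, first
```

<p>The second call to <c>outer</c> gets a fresh <c>$foo</c>, but <c>inner</c> never sees it.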
<p>The fix is obviously to use an anonymous sub instead of a named one.
<c>
use strict;
use warnings;
sub outer {
    my $foo;
    my $inner = sub {
        $foo;
    };
    $inner->();
}
</c>
<c>
>perl -c test.pl
test.pl syntax OK
</c>
<p>However, that fails spectacularly when recursion is introduced.
<h3>Problem 1 — Referencing the Wrong Variable</h3>
<p>This problem is almost always easy to spot if <c>use strict;</c> is in effect.
<c>
use strict;
use warnings;
sub outer {
    my $inner = sub {
        $inner->()
    }
}
</c>
<c>
Global symbol "$inner" requires explicit package name at test.pl line 6.
test.pl had compilation errors.
</c>
<p>Remember that a <c>my</c> only makes the declared symbol available in statements following the one containing the <c>my</c>. This can be fixed trivially by splitting the assignment into two statements.
<c>
use strict;
use warnings;
sub outer {
    my $inner;
    $inner = sub {
        $inner->()
    }
}
</c>
<c>
>perl -c test.pl
test.pl syntax OK
</c>
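<p>As a rough sketch of the two-statement form doing real work, here is a recursive helper that actually terminates. The names <c>sum_to</c> and <c>$total</c> are invented for the example. It fixes the scoping issue only; the closure and <c>$inner</c> still refer to each other.

```perl
use strict;
use warnings;

# Hypothetical example: sum the integers 1..$n with a recursive closure.
sub sum_to {
    my ($n) = @_;

    my $inner;                 # declared first, so the sub body can see it
    $inner = sub {
        my ($i) = @_;
        return $i <= 0 ? 0 : $i + $inner->($i - 1);
    };

    return $inner->($n);
}

my $total = sum_to(4);
print "$total\n";   # 10
```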
<h3>Problem 2 — Memory Leak</h3>
<p>There's a subtle lesson we should have learned from the first problem: if the sub references the lexical (by capturing it), and that same lexical references the sub, the two form a cycle. Perl frees memory by reference counting, so a cyclic structure like this is never freed.
<c>
# ReleaseTracker.pm
# This "module" simply provides "o()", a
# func that creates a simple object whose
# sole purpose is to print "Destroyed" when
# it is freed. The inner workings are not
# relevant to this example.
use strict;
use warnings;
package ReleaseTracker;
BEGIN {
    our @EXPORT_OK = qw( o );
    require Exporter;
    *import = \&Exporter::import;
}
sub new { return bless(\(my $o), $_[0]); }
sub DESTROY { print("Destroyed\n"); }
sub o { ReleaseTracker->new(); }
1;
</c>
<c>
use strict;
use warnings;
use ReleaseTracker qw( o );
sub outer {
    # Variables that are read by each recursive instance.
    my $var = shift @_;
    my $helper;
    $helper = sub {
        no warnings 'void';
        "do something with $var and @_";
        $helper->(@_);
    };
    #$helper->(@_);
}
outer( o() );
END { print("On program exit:\n"); }
</c>
<c>
>perl test.pl
On program exit:
Destroyed <--- BAD!
</c>
<p>$var is not being freed because the anonymous sub is not being freed.
<br>The anonymous sub is not being freed because it both references and is referenced by $helper.
<p>Let's illustrate:
<pre>################################################
## Before outer exits
##
  &outer
     |
     v
+================+             +=============+
[ outer's pad    ]    +------->[ Reference   ]
+================+    |        +=============+
[ $var: --------------+        [ refcount: 2 ]    +=============+
[ $helper: -----------|---+    [ pointer: ------->[ Object      ]
+================+    |   |    +=============+    +=============+
                      |   |                       [ refcount: 1 ]
                      |   |                       +=============+
                      |   |
                      |   |    +=============+
                      |   +--->[ Reference   ]
                      |   |    +=============+
                      |   |    [ refcount: 2 ]    +=============+
+================+    |   |    [ pointer: ------->[ Helper Sub  ]
[ helper's pad   ]    |   |    +=============+    +=============+
+================+    |   |                       [ refcount: 1 ]
[ $var: --------------+   |                       [ pad: ----------+
[ $helper: ---------------+                       +=============+  |
+================+                                                 |
        ^                                                          |
        |                                                          |
        +----------------------------------------------------------+
################################################
## After outer exits
##
                               +=============+
                      +------->[ Reference   ]
 (outer still         |        +=============+
  exists, but         |        [ refcount: 1 ]    +=============+
  it's not            |        [ pointer: ------->[ Object      ]
  referencing         |        +=============+    +=============+
  anything in         |                           [ refcount: 1 ]
  this graph)         |                           +=============+
                      |
                      |        +=============+
                      |   +--->[ Reference   ]
                      |   |    +=============+
                      |   |    [ refcount: 1 ]    +=============+
+================+    |   |    [ pointer: ------->[ Helper Sub  ]
[ helper's pad   ]    |   |    +=============+    +=============+
+================+    |   |                       [ refcount: 1 ]
[ $var: --------------+   |                       [ pad: ----------+
[ $helper: ---------------+                       +=============+  |
+================+                                                 |
        ^                                                          |
        |                                                          |
        +----------------------------------------------------------+
</pre>
<p>Nothing has a refcount of zero, so nothing can be freed.
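<p>If you'd rather observe this than take the diagram's word for it, one possible check is to keep a weak copy of the code reference and see whether it survives the call. This is only a sketch: <c>with_cycle</c>, <c>without_cycle</c> and the two probe variables are names made up for the demo, and it relies on <c>Scalar::Util::weaken</c> setting a weak reference to <c>undef</c> once its target is freed.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken );

# Hypothetical probes: weak copies that become undef when the sub is freed.
my ($cyclic_probe, $acyclic_probe);

sub with_cycle {
    my $helper;
    $helper = sub {
        my ($n) = @_;
        return $n > 0 ? $helper->($n - 1) : 'done';   # captures $helper
    };
    $helper->(3);
    $cyclic_probe = $helper;
    weaken($cyclic_probe);
}

sub without_cycle {
    my $limit  = 3;
    my $helper = sub { return "limit is $limit" };    # captures only $limit
    $helper->();
    $acyclic_probe = $helper;
    weaken($acyclic_probe);
}

with_cycle();
without_cycle();

my $cyclic_alive  = defined($cyclic_probe)  ? 1 : 0;
my $acyclic_alive = defined($acyclic_probe) ? 1 : 0;
print "closure with cycle still alive:    $cyclic_alive\n";    # 1
print "closure without cycle still alive: $acyclic_alive\n";   # 0
```

<p>The self-referencing closure outlives the call that created it; the ordinary closure does not.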
<h3>Solution — Dynamic Scoping</h3>
<p>The solution to both problems is the same: Don't use a lexical variable.
<c>
use strict;
use warnings;
use ReleaseTracker qw( o );
sub outer {
    # Variables that are read by each recursive instance.
    my $var = shift @_;
    local *helper = sub {
        no warnings 'void';
        "do something with $var and @_";
        helper(@_);
    };
    #helper(@_);
}
outer( o() );
END { print("On program exit:\n"); }
</c>
<c>
>perl test.pl
Destroyed
On program exit:
<-- good
</c>
<p>Package variables aren't captured, so &helper's reference count isn't affected by the call in the inner function.
<pre>
################################################
## Before outer exits
##
  &outer
     |
     v
+=================+
[ outer's pad     ]
+=================+            +=============+
[ $var: ---------------+------>[ Reference   ]
+=================+    |       +=============+
                       |       [ refcount: 2 ]    +=============+
                       |       [ pointer: ------->[ Object      ]
                       |       +=============+    +=============+
+=================+    |                          [ refcount: 1 ]
[ *helper{CODE}   ]    |                          +=============+
+=================+    |       +=============+
[ pointer: ------------|------>[ Reference   ]
+=================+    |       +=============+
                       |       [ refcount: 1 ]    +=============+
                       |       [ pointer: ------->[ Helper Sub  ]
+=================+    |       +=============+    +=============+
[ helper's pad    ]    |                          [ refcount: 1 ]
+=================+    |                          [ pad: ----------+
[ $var: ---------------+                          +=============+  |
+=================+                                                |
         ^                                                         |
         |                                                         |
         +---------------------------------------------------------+
################################################
## After outer exits
## and local restores
## *helper{CODE}
##
                               +=============+
                       +------>[ Reference   ]
 (outer still          |       +=============+
  exists, but          |       [ refcount: 1 ]    +=============+
  it's not             |       [ pointer: ------->[ Object      ]
  referencing          |       +=============+    +=============+
  anything in          |                          [ refcount: 1 ]
  this graph)          |                          +=============+
                       |
                       |       +=============+
                       |       [ Reference   ]
                       |       +=============+
                       |       [ refcount: 0 ]    +=============+
+=================+    |       [ pointer: ------->[ Helper Sub  ]
[ helper's pad    ]    |       +=============+    +=============+
+=================+    |                          [ refcount: 1 ]
[ $var: ---------------+                          [ pad: ----------+
+=================+                               +=============+  |
         ^                                                         |
         |                                                         |
         +---------------------------------------------------------+
</pre>
<p>There is no cycle, so everything will be freed in turn, starting with the reference with a refcount of zero.
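<p>For something closer to real use, here is a hedged sketch of the same pattern walking a small tree. The <c>count_nodes</c> function, the <c>children</c> key and the <c>$probe</c> variable are all invented for the example; the weak <c>$probe</c> is only there to check that the helper sub is gone once <c>local</c> has restored the glob.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken );

my $probe;   # hypothetical weak copy, used only to observe the cleanup

# Hypothetical example: count the nodes of a tree of hashes.
sub count_nodes {
    my ($tree) = @_;
    my $count = 0;

    local *helper = sub {
        my ($node) = @_;
        $count++;
        helper($_) for @{ $node->{children} || [] };   # looked up via the glob
    };

    $probe = \&helper;
    weaken($probe);

    helper($tree);
    return $count;
}

my $tree = { children => [ {}, { children => [ {} ] } ] };
my $node_count = count_nodes($tree);

print "nodes: $node_count\n";                                   # nodes: 4
print "helper ", (defined($probe) ? "leaked" : "freed"), "\n";  # helper freed
```

<p>The recursive call goes through the package glob rather than a captured lexical, so nothing keeps the sub alive after <c>count_nodes</c> returns.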
<h3>Alternative Solutions</h3>
<p>I won't delve into these, so feel free to provide links that discuss them in more detail. For each alternative solution, I'll post the equivalent of the solution I've already presented.
<c>
sub outer {
    ...
    local *helper = sub {
        ...
        helper(...);
        ...
    };
    helper(@_);
}
</c>
<h4>Y Combinator</h4>
<p>[ambrus] pointed out that a [http://use.perl.org/~Aristotle/journal/30896|Y combinator] can also achieve this goal. This is the most worthwhile alternative.
<c>
sub outer {
    ...
    Y( sub {
        my $helper = shift @_;
        sub {
            ...
            $helper->(...);
            ...
        }
    } )->(@_);
}
</c>
<p>While it's a great tool to completely anonymize a recursive function, and while it might even be required in functional languages, it adds overhead in a situation where reducing overhead is important, and I think it's unnecessarily complex to solve the problem of nesting functions in Perl.
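<p>The snippet above assumes a <c>Y</c> function exists. One possible definition is sketched below; it is my own minimal version, not necessarily the one from the linked journal entry, and the names <c>$gen</c>, <c>$self_apply</c> and <c>$fact</c> are made up. It rebuilds the inner closure on every recursive call, which is where the overhead comes from.

```perl
use strict;
use warnings;

# A minimal applicative-order Y combinator (a sketch, not a canonical one).
# $gen takes "the function to recurse with" and returns the real worker.
sub Y {
    my ($gen) = @_;
    my $self_apply = sub { my ($f) = @_; return $f->($f) };
    return $self_apply->( sub {
        my ($f) = @_;
        # The wrapper delays $f->($f) until the worker actually recurses.
        return $gen->( sub { $f->($f)->(@_) } );
    } );
}

# Hypothetical usage: factorial without naming the recursive sub.
my $fact = Y( sub {
    my $helper = shift @_;
    sub {
        my ($n) = @_;
        return $n <= 1 ? 1 : $n * $helper->($n - 1);
    }
} );

print $fact->(5), "\n";   # 120
```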
<h4>Weak Reference</h4>
<p>One could weaken the reference to the inner function using [mod://Scalar::Util|<c>Scalar::Util</c>]<c>::weaken</c>.
<c>
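sub outer {
    ...
    my $weak_helper;
    my $helper = $weak_helper = sub {
        ...
        $weak_helper->(...);
        ...
    };
    # Weaken only the captured copy. $helper stays strong; if the sole
    # reference were weakened, the sub would be freed on the spot.
    # Needs: use Scalar::Util qw( weaken );
    weaken($weak_helper);
    $helper->(@_);
}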
</c>
<p>However, the Weak Reference solution is much more complex than the Dynamic Scoping solution and has no advantage that I can see.
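<p>For completeness, here is a runnable sketch of the weak-reference approach. The names <c>depth_sum</c>, <c>$weak_helper</c> and <c>$probe</c> are made up for the example. The closure captures only the weakened variable, while a second, strong variable keeps the sub alive for the duration of the call.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken );

my $probe;   # hypothetical weak copy, used only to observe the cleanup

# Hypothetical example: sum 1..$n using a weakened self-reference.
sub depth_sum {
    my ($n) = @_;

    my $weak_helper;
    my $helper = $weak_helper = sub {
        my ($i) = @_;
        return $i <= 0 ? 0 : $i + $weak_helper->($i - 1);
    };
    weaken($weak_helper);   # the captured copy no longer holds the sub alive

    $probe = $helper;
    weaken($probe);

    return $helper->($n);
}

my $sum = depth_sum(4);
print "sum: $sum\n";                                            # sum: 10
print "helper ", (defined($probe) ? "leaked" : "freed"), "\n";  # helper freed
```

<p>It works, but it takes two variables and an extra module to do what <c>local</c> does in one line.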
<h3>Summary</h3>
<p>Using <c>local *helper = sub {};</c> over <c>my $helper = sub {};</c> not only provides a cleaner calling syntax, it can be used for recursive functions without accidentally referencing the wrong variable or causing a memory leak.