http://qs321.pair.com?node_id=319780

djantzen has asked for the wisdom of the Perl Monks concerning the following question:

I've run into a problem with perl's reference count-based garbage collection that I cannot figure out. What I've got is a directed graph that fails to be destroyed prior to the global destruction phase that occurs when the interpreter exits. The problem arises (AFAICT) only when doing a depth first search through the graph. Since this is the backend to a mod_perl-based CGI, clearly I need timely destruction of the objects and cannot rely on the interpreter's exit. At this point I think it's something due to the way perl handles lexical pads in recursive method calls, but I don't get why. I've distilled the problem down by borrowing TheDamian's graph code from Object Oriented Perl pgs 108-14 and adding my DFS routine and some debugging stuff. The code should run as is, although the full version requires Data::Structure::Util. My bleeding brain thanks you.

use strict; use warnings; # Uncomment if you have this. #use Data::Structure::Util(qw/has_circular_ref circular_off/); # BEGIN NETWORK CLASS package Network; sub new { my ($class) = @_; bless { _nodes => [] }, $class; } sub node { my ($self, $index) = @_; return $self->{_nodes}[$index]; } sub add_node { my ($self) = @_; push @{$self->{_nodes}}, Node->new(); } # CAUSE OF THE TROUBLE sub DFS { my ($self, $node, $sub) = @_; my ($explored, $do_search); $do_search = sub { my ($node) = @_; $sub->($node); $explored->{$node->{_id}}++; foreach my $link (@{$node->{_outlinks}}) { $do_search->($link->{_to}) unless ($explored->{$link->{_id}}); } }; $do_search->($node); } sub transitive_closure_DFS { my ($self, $node) = @_; my $nodes = []; my $search = sub { push @$nodes, $_[0] }; $self->DFS($node, $search); return $nodes; } sub DESTROY { my ($self) = @_; print "DESTROYING $self\n"; foreach my $node (@{$self->{_nodes}}) { $node->delete_links(); } } # BEGIN NODE CLASS package Node; { my $_nodecount = 0; sub _nextID { return ++$_nodecount } } sub new { my ($class) = @_; bless { _id => _nextID(), _outlinks => [] }, $class; } sub add_link_to { my ($self, $target) = @_; push @{$self->{_outlinks}}, Link->new($target); } sub delete_links { my ($self) = @_; delete $self->{_outlinks}; } sub DESTROY { my ($self) = @_; print "DESTROYING $self $self->{_id}\n"; } # BEGIN LINK CLASS package Link; { my $_linkcount = 0; sub _nextID { return ++$_linkcount } } sub new { my ($class, $target) = @_; bless { _id => _nextID(), _to => $target }, $class; } sub delete_node { my ($self) = @_; delete $self->{_to}; } sub DESTROY { my ($self) = @_; print "DESTROYING $self $self->{_id}\n"; $self->delete_node(); # EVEN THIS DOESN'T KILL THE REMAINING NODES } package main; sub build_graph { my $network = Network->new(); for (0..2) { $network->add_node(); } $network->node(0)->add_link_to($network->node(1)); $network->node(0)->add_link_to($network->node(2)); $network->node(1)->add_link_to($network->node(2)); $network->node(2)->add_link_to($network->node(1)); my $neighbors = $network->transitive_closure_DFS($network->node(0) +); print "Neighbors\n"; print " $_ ID $_->{_id}\n" for (@$neighbors); # Uncomment if you have the module # circular_off($network); # THIS DOES NOT AFFECT BEHAVIOR, WHY? } print "BUILDING GRAPH\n"; build_graph(); print "SHOULD BE THE LAST THING PRINTED, HOWEVER ...\n"; __END__

This yields:

BUILDING GRAPH Neighbors Node=HASH(0x104fbc) ID 1 Node=HASH(0xfc450) ID 3 Node=HASH(0xfc414) ID 2 DESTROYING Network=HASH(0xfc04c) DESTROYING Link=HASH(0xfc498) 2 DESTROYING Link=HASH(0xfc438) 1 DESTROYING Link=HASH(0xfc4c8) 3 DESTROYING Link=HASH(0xfc4f8) 4 SHOULD BE THE LAST THING PRINTED, HOWEVER ... DESTROYING Node=HASH(0xfc414) 2 DESTROYING Node=HASH(0x104fbc) 1 DESTROYING Node=HASH(0xfc450) 3

Further reading here


"The dead do not recognize context" -- Kai, Lexx

Replies are listed 'Best First'.
Re: Depth First Search through Digraph Results in Memory Leak
by ysth (Canon) on Jan 08, 2004 at 11:44 UTC
    I think your guess may be right. I changed it to have:
    my ($do_search1, $do_search2); $do_search2 = sub { my ($node) = @_; $sub->($node); $explored->{$node->{_id}}++; foreach my $link (@{$node->{_outlinks}}) { die "too deep" unless ($explored->{$link->{_id}}); } }; $do_search1 = sub { my ($node) = @_; $sub->($node); $explored->{$node->{_id}}++; foreach my $link (@{$node->{_outlinks}}) { $do_search2->($link->{_to}) unless ($explored->{$link->{_id}}) +; } }; $do_search = sub { my ($node) = @_; $sub->($node); $explored->{$node->{_id}}++; foreach my $link (@{$node->{_outlinks}}) { $do_search1->($link->{_to}) unless ($explored->{$link->{_id}}) +; } };
    so no recursive calls were done, and got:
    BUILDING GRAPH Neighbors Node=HASH(0xa0920e8) ID 1 Node=HASH(0xa092160) ID 3 Node=HASH(0xa092124) ID 2 DESTROYING Network=HASH(0xa0420ac) DESTROYING Link=HASH(0xa0921b4) 2 DESTROYING Link=HASH(0xa092184) 1 DESTROYING Link=HASH(0xa0921e4) 3 DESTROYING Link=HASH(0xa092214) 4 DESTROYING Node=HASH(0xa092160) 3 DESTROYING Node=HASH(0xa0920e8) 1 DESTROYING Node=HASH(0xa092124) 2 SHOULD BE THE LAST THING PRINTED, HOWEVER ...
    Can you simplify it so you just manually bless a few references into "main" and perlbug it?

      So there's something funky going on with perl's ability to clear lexical pads containing self-referential structures in recursive methods. I think we'll need an internals guru to get any further, but thanks for your feedback ysth.


      "The dead do not recognize context" -- Kai, Lexx
Re: Depth First Search through Digraph Results in Memory Leak
by Hofmator (Curate) on Jan 08, 2004 at 15:07 UTC

    Yes, like ysth says, this looks like a bug in perl. A workaround here is to change the sub transitive_closure_DFS like this:

    sub transitive_closure_DFS { my ($self, $node) = @_; my $nodes = []; my $search = sub { push @$nodes, $_[0] }; $self->DFS($node, $search); my @nodes = @$nodes; $nodes = ''; return [@nodes]; }

    So it seems that somehow the anonymous array in $nodes is not properly destroyed automatically. But how this is related to the recursiveness dependency ysth mentions escapes me ...

    Update
    I've written a follow-up with a simplified version of the problem (and an additional segfault :) here.

    -- Hofmator

      Well I'll be damned, that fixes it. I've changed $nodes = '' to undef $nodes with the same results. So for some reason perl needs to be told to destroy the array when it should take a cue from the lexical going out of scope. I'll prepare a bug report after I stop sighing in relief. Thank you Hofmator!


      "The dead do not recognize context" -- Kai, Lexx

        Change those print calls to warn(). I'm guessing that those arrays are kept around because the closures were kept around and that both were eventually cleaned up during global destruction. You are always free to clear a variable with prejudice - use undef as a function. undef $nodes might give you some different results. (you already did that. Whoops).

        That may still be a bug though but whether it is in the reference counting of $nodes' array or $sub's clusure I don't know. But then it isn't clear to me that it is a bug.

Re: Depth First Search through Digraph Results in Memory Leak
by Hofmator (Curate) on Jan 09, 2004 at 09:09 UTC

    I just noted a different bug in your code, in sub DFS

    $explored->{$node->{_id}}++; foreach my $link (@{$node->{_outlinks}}) { $do_search->($link->{_to}) unless ($explored->{$link->{_id}}); }
    should be
    $explored->{$node->{_id}}++; foreach my $link (@{$node->{_outlinks}}) { $do_search->($link->{_to}) unless ($explored->{$link->{_to}{_id}}); # we stored the node id, not the link id in $explored ^^^^^ }

    -- Hofmator

      I think that's okay. I'm using the algorithm from Goodrich & Tamassia's Data Structures and Algorithms in Java pg. 377. In pseudocode:

      DirectedDFS(v):
      Mark vertex v as marked.
      for each outgoing edge (v, w) of v do
      if vertex w has not been visited
      then recursively call DFS(w).

      Which makes sense to me since the objective is to locate all reachable vertices. Imagine the case where you had more than one edge connecting two vertices. You'd end up counting all the paths to a node but that isn't the goal, well at least not a transitive closure anyway.


      "The dead do not recognize context" -- Kai, Lexx

        Ok, first to get the terminology straight, 'node' eq 'vertex' and 'edge' eq 'link'.

        In your code both nodes and links have an id. Let's look at your original code:

        # this marks the current vertex $explored->{$node->{_id}}++; foreach my $link (@{$node->{_outlinks}}) { # $link isa Link, so $link->{_id} is the id of the link, # not of the vertex!! $do_search->($link->{_to}) unless ($explored->{$link->{_id}}); }
        Yes, your algorithm is fine, it's just the implementation that has a little bug in it. Here's my (correct) code again, this time written explicitly:
        $explored->{$node->{_id}}++; foreach my $link (@{$node->{_outlinks}}) { my $new_node = $link->{_to}; $do_search->($new_node) unless ($explored->{$new_node->{_id}}); }

        -- Hofmator

Re: Depth First Search through Digraph Results in Memory Leak
by Elian (Parson) on Jan 19, 2004 at 15:51 UTC
    While I've not dug into it in any depth, odds are you're getting bit by a bug in perl's pad caching.

    When you go recursive on a sub, it triggers a more aggressive caching scheme for the lexical pad for a sub, and it looks like as part of that a top-level pad's not getting cleaned up when it ought to be. (It's also possible this is triggering a cleanup delay/bug in the top-level pad)

    If you can, try boiling this down as small as possible and fire it off as a bug report with perlbug.