threaded display of replies

epoptai has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a newest nodes client and need some advice with the code that generates a threaded display of root nodes and replies like this:

Hex to decimal by Ras
- Re: Hex to decimal by busunsl
  - Re: Re: Hex to decimal by Ras
    - Re: Re: Re: Hex to decimal by Albannach
      - (tye)Re2: Hex to decimal by tye
      - Re: Re: Re: Re: Hex to decimal by merlyn
    - Re: Re: Re: Hex to decimal by I0
- Re: Hex to decimal by turnstep
- Re: Hex to decimal by TeKk9
- (tye)Re: Hex to decimal by tye

I've tried coding this from scratch twice and ended up with two different ways to make the same mistake!

The Newest Nodes XML Generator makes tags like this (for example a root node and reply):

<NODE nodetype="categorized question" author_user="11732" 
node_id="76108" createtime="20010427142501">

<NODE nodetype="categorized answer" author_user="11732" 
node_id="76130" createtime="20010427150308" parent_node="76108">
[download]

The strategy has been to find replies to root nodes, then find replies to the replies, and so on. But the way i'm doing it only finds replies to the depth that's been coded for. For example the following code's maximum depth is Re:(3) because there are 3 for loops, it doesn't look for Re:(4) and beyond.

sub thread
{
print '<ul>';
for my $node (sort {$b <=> $a} keys %created){ # for all nodes
# if doesn't have a parent and isn't a user
if((!$parent{$node}) && ($nodetype{$node} ne 'user')){
  if($created{$node}=~/^(....)(..)(..)(..)(..)(..)$/){
     $created{$node} = "$4:$5:$6 $2/$3"
     }
  print qq~$types{$nodetype{$node}} ($created{$node})<br>
  $content{$node}
  by $whom{$author{$node}}<br>~;

  my$d = 0;
  for my $p1 (sort {$a <=> $b} keys %parent){ # re
    if( ($parent{$p1} == $node) ){
       unless($d > 0){ print '<ul>'}
       print qq~$content{$p1} 
       by $whom{$author{$p1}}<br>~;
       $d++;
       
      my$e = 0;
      for my $p2 (sort {$a <=> $b} keys %parent){ # re(2)
        if( ($parent{$p2} == $p1) ){
           unless($e > 0){ print '<ul>'}
           print qq~$content{$p2} 
           by $whom{$author{$p2}}<br>~;
           $e++;
             
           my$f = 0;
           for my $p3 (sort {$a <=> $b} keys %parent){ # re(3)
             if( ($parent{$p3} == $p2) ){
                unless($f > 0){ print '<ul>'}
                print qq~$content{$p3} 
                by $whom{$author{$p3}}<br>~;
                $f++;
                        }
                     }
                  if($f > 0){ print '</ul>'}
                  }
               }
            if($e > 0){ print '</ul>'}
            }
         }
      if($d > 0){ print '</ul>'}
      }
   }
print '</ul>';
}
[download]

Ugh! I tried again with a hash like the one in Re: Graphical Hierarchical Tree (that should've been a hash of arrays i guess), but had the same problem.

sub thread
{
my%children;
for my $root (sort {$a <=> $b} keys %created){ # for all nodes
   my(@kids,$kids);
   # if doesn't have a parent and isn't a user
   if( ($nodetype{$root} ne 'user') ){
      for my $child (sort {$a <=> $b} keys %parent){ # for all childre
+n
         if( ($parent{$child} == $root) ){ # if this is a child
            push @kids, $child;
            } # if
         $kids = join '|', @kids;
         $children{$root} = $kids;
         }
      }
   } the rest is predictable noise...
[download]

What's the best way to find the entire tree of replies?

Comment on threaded display of replies Select or Download Code

Replies are listed 'Best First'.
Re: threaded display of replies by mikfire (Deacon) on Apr 28, 2001 at 00:20 UTC
What if you were to turn this around a bit and not order the datastructure but added an element that preserved the order? My idea would be to have the primary data structure look something like this and keyed by the node id. `my %created = ( AUTHOR => $author, TYPE => $node_type, TIME => $node_time, PARENT => $parent_id \|\| '', KIDS => [], );` [download] One pass will load the structure from the data stream. A second pass will populate the KIDS array something like this. If the datastream is in order, or you are willing to assume that any referenced parent node will be included in the stream, you could actually include this in the first pass and save a loop. Personally, I prefer to keep the data acquisition loop seperate from the data manipulation loop - it makes it easier to debug. `for my $root ( sort { $a <=> $b } keys %created ) { my $parent = $created{$root}{PARENT}; push @{$created{$parent}{KIDS}}, $root; }` [download] One final loop, then, can print out all the nodes in, hopefully a hierchical order by using a bit of recursion ( breadth-first? ) `sub print_nodes { my @kids = @_; for my $id ( @nodes ) { print "Interesting data from node\n"; if ( @{$created{$id}{KIDS} ) { print_nodes( @{$created{$id}{KIDS}; } } } # Some work to make sure we start at the parents and move # our way down correctly. grep would work just as well. my @topnodes = (); for my $id ( keys %created ) { push @topnodes, $id unless ( defined( $created{$id}{PARENT} ) ); } print_nodes( @topnodes );` [download] mikfire	[reply] [d/l] [select]
Re: threaded display of replies by Masem (Monsignor) on Apr 28, 2001 at 00:14 UTC
Assuming that all the data contains is the node number and the node number of the parent node, you simply cannot find all children; you can only work backwards up from a node to the parent. However, and where it will get messy, is to take the child node, find it's parent (and all nodes in between), then increament on the node number starting with the ultimate parent node; if the parent of that node is one that is in the thread already, add it to the thread. This will get trees like: `P +--1 +--2 +--3 +--4 +--5 <- the current node +--6 +--7 +--8` [download] Without increamenting on the node number, you cannot get 6, 7 and 8. Nor would you get 2. Dr. Michael K. Neylon - mneylon-pm@masemware.com \|\| "You've left the lens cap of your mind on again, Pinky" - The Brain	[reply] [d/l]
Re: threaded display of replies by ZZamboni (Curate) on Apr 29, 2001 at 07:48 UTC
For my PerlMonks::NewestNodes module, I used a technique very similar to the one described by mikfire. All the nodes are stored in a hash indexed by node ID, and each element is a hash reference containing the different pieces of information about the node. The thread subroutine receives as argument a list of nodes to thread (this allows adding new nodes to the existing structure without rethreading everything). Then it goes through those nodes, and for each one it visits its parent and inserts its ID into the parent's "kids" element, which is an array ordered chronologically. Nodes that do not have a parent_node element are marked as "top nodes". This is complicated a little bit by the fact that the Newest Nodes XML Generator will give you a node, but not its parent, when the parent is already marked as seen. So the thread subroutine also checks to see which nodes it doesn't have yet, and adds those to a list of "nodes to get". The nodes without parents are added to a list of "nodes to rethread". Nodes that are threaded correctly are removed from the list. At the end, any necessary nodes are grabbed (using the Node Query XML Generator, you pass it an argument of the form "nodes=id1,id2,..."), added to the list, and the loop repeats. Once every node has its kids element properly populated, it's just a matter of descending recursively from each "top node" to generate its tree of replies. Shameless plug: check out Shendal's Perl/Tk Newest Nodes Client, which uses my module to show you a very nice threaded display of the Newest Nodes. --ZZamboni	[reply]
Will this break your program? by Sprad (Hermit) on Apr 28, 2001 at 00:21 UTC
What happens when somebody doesn't use the Re: Re: Re: Re: pattern, and perversely changes the title in their reply? (like I did) Cases like this aside, though, it'd be a welcome addition. --- A fair fight is a sign of poor planning.	[reply]
Re: Will this break your program? by mikfire (Deacon) on Apr 28, 2001 at 00:27 UTC
Sprad, if you look at the XML stream, the parent node id is included probably to avoid exactly this issue. mikfire	[reply]
Re: Will this break your program? by little (Curate) on Apr 28, 2001 at 00:49 UTC
well, each node should be represented by a node_id in a DB or whatever file U use and this should know (store) the id of its parent, so you can't mess up things. And btw, there is no need to let anode know if it has children or not. If there are children THEY WILL KNOW their parent. Keep it simple: stupid. :-) btw BmeCat works this way and made it to a standard. Have a nice day All decision is left to your taste	[reply]
(bbfu) (parents' children) Re(2): Will this break your program? by bbfu (Curate) on Apr 28, 2001 at 01:15 UTC
so said little: And btw, there is no need* to let anode* [sic] know if it has children or not. If there are children THEY WILL KNOW their parent. Unless, of course, you want to find all the children of a particular node without doing an exhaustive search of the DB. While not strictly necessary, it can make life simpler in some cases. I think this application might qualify as one of those cases (see Masem's reply). :-) bbfu Seasons don't fear The Reaper. Nor do the wind, the sun, and the rain. We can be like they are.	[reply]
Re: (bbfu) (parents' children) Re(2): Will this break your program? by little (Curate) on Apr 28, 2001 at 15:56 UTC
(bbfu) (expensive children) Re(4): Will this break your program? by bbfu (Curate) on Apr 30, 2001 at 00:39 UTC
Some notes below your chosen depth have not been shown here


Perl: the Markov chain saw
	PerlMonks