Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

Adjacency Tree Confusion

by ChrisR (Hermit)
on Apr 07, 2006 at 17:56 UTC ( #541940=perlquestion: print w/replies, xml ) Need Help??

ChrisR has asked for the wisdom of the Perl Monks concerning the following question:

I have been trying to implement an adjacency tree sort but can't seem to get it right. The desired output would be quite like the way threads are displayed here at PerlMonks. The real data is in a MySQL database. I started out with just a "record id" and a "parent id" to connect the tree but that didn't work out. I added a "lineage" field thinking I could just sort on that but it's still not working. Here is some sample code:
#!c:\perl58\bin\perl.exe -w use strict; my @array = (); while (<DATA>) { chomp($_); my @subarray = split(/,/,$_); push @array, [@subarray]; } my @sortedarray = sort { $a->[3] <=> $b->[3] || $a->[4] cmp $b->[4] || $a->[5] cmp $b->[5] } @array; for my $x(0..$#sortedarray) { print "$sortedarray[$x][0] - $sortedarray[$x][6]\n"; } #id,board,root,parent,lineage,stamp,text __DATA__ 1,2,0,0,00000000000,2006-01-02 08:15:00,test line 1 (first) 2,2,1,1,00000000001,2006-01-02 08:16:00,test line 2 (second) 3,2,1,2,00000000001-00000000002,2006-01-02 08:21:00,test line 3 (third +) 6,2,1,1,00000000001,2006-01-02 08:23:00,test line 6 (seventh) 4,2,1,1,00000000001,2006-01-02 08:22:00,test line 4 (sixth) 7,2,1,6,00000000001-00000000006,2006-01-02 08:25:00,test line 7 (eight +h) 5,2,1,3,00000000001-00000000002-00000000003,2006-01-02 08:23:00,test l +ine 5 (fourth) 8,2,1,1,00000000001,2006-01-02 08:17:00,test line 8 (fifth)
And here is the output:
1 - test line 1 (first) 2 - test line 2 (second) 8 - test line 8 (fifth) 4 - test line 4 (sixth) 6 - test line 6 (seventh) 3 - test line 3 (third) 5 - test line 5 (fourth) 7 - test line 7 (eighth)
And here is what I expected to see:
1 - test line 1 (first) 2 - test line 2 (second) 3 - test line 3 (third) 5 - test line 5 (fourth) 8 - test line 8 (fifth) 4 - test line 4 (sixth) 6 - test line 6 (seventh) 7 - test line 7 (eighth)
I do not have access to any DBM::DEEP or any of the Tree modules and I can't install any either. I would love to be able to do this without a recursive subroutine but have been fighting this for two days (off and on) now and throw myself on the mercy of the Monks.

Replies are listed 'Best First'.
Re: Adjancey Tree Confusion
by TedPride (Priest) on Apr 07, 2006 at 21:47 UTC
    Try something like the following. It should run in linear time, and reasonably efficiently. You will of course want to edit it so it's reading in data from your mySQL table (required fields id and parent), but that should be relatively easy. If you're constructing one specific thread from a mySQL query, you'll also need to store the root id with each post record, so you can select the thread you want without having to map all the post dependencies.
    use strict; use warnings; use Data::Dumper; my (%posts, @threads, $id, $parent, $title); while (<DATA>) { chomp; ### It's easy to add more fields here if you want: ($id, $parent, $title) = split / /, $_, 3; my %post = ('id' => $id, 'title' => $title); $posts{$id} = \%post; if ($id == $parent) { push @threads, \%post; } else { push @{$posts{$parent}{'children'}}, \%post; } } construct($_) for @threads; sub construct { my $p = $_[0]; print "$p->{'id'} $p->{'title'}\n"; construct($_) for @{$p->{'children'}}; } __DATA__ 1 1 Title 2 1 Re: Title 3 1 Re: Title 4 2 Re: Re: Title 5 5 Title 2 6 5 Re: Title 2 7 4 Re: Re: Re: Title
    EDIT: If you want to generate a subtree of a thread, a lineage field isn't necessary, or even that efficient if you have a large number of threads. Just select all posts with a root equal to the root of the subtree, then map those post dependencies (there probably won't be more than a few dozen posts total) and display just the ones dependant on the subtree ID. This saves you the trouble of storing all ancestor IDs for all posts and having to search through them, which can be quite wasteful in terms of disk space and/or processing time.
      Thanks for the example. I really appreciate it. I couldn't get it to work with my own data until I realized that you had the parent of the first row equal to the id of that row. In my data, I had the parent of the first row equal to zero. Just a logic fault on my part.

      I added a pre-sort at the beginning so that the time stamp of the node would be taken into account for nodes with the same parent, though I'm sure there is a better way to do it.

      use strict; use warnings; my (%posts, @threads, $id, $root, $board, $parent, $stamp, $title); my @array; while (<DATA>) { chomp($_); my @subarray = split(/,/,$_); push @array, [@subarray]; } my @sortedarray = sort { $a->[4] cmp $b->[4] } @array; for my $x (0..$#sortedarray) { ($id, $root, $board, $parent, $stamp, $title) = @{$sortedarray[$x] +}; my %post = ('id' => $id, 'root' => $root, 'parent' => $parent, 'st +amp' => $stamp, 'title' => $title); $posts{$id} = \%post; if ($id == $parent) { push @threads, \%post; } else { push @{$posts{$parent}{'children'}}, \%post; } } construct($_) for @threads; sub construct { my $p = $_[0]; print "$p->{'id'} $p->{'title'}\n"; construct($_) for @{$p->{'children'}}; } #id,board,root,parent,stamp,text __DATA__ 1,2,0,1,2006-01-02 08:15:00,test line 1 (first) 2,2,1,1,2006-01-02 08:16:00,test line 2 (second) 3,2,1,2,2006-01-02 08:21:00,test line 3 (third) 6,2,1,1,2006-01-02 08:23:00,test line 6 (seventh) 4,2,1,1,2006-01-02 08:22:00,test line 4 (sixth) 7,2,1,6,2006-01-02 08:25:00,test line 7 (eighth) 5,2,1,3,2006-01-02 08:23:00,test line 5 (fourth) 8,2,1,1,2006-01-02 08:17:00,test line 8 (fifth)
      Is it possible to have the all data produced by construct assigned/returned to a single array of hashes without the array having to be scoped globally? This is the main reason that I didn't want to use a recursive routine.
Re: Adjancey Tree Confusion
by kvale (Monsignor) on Apr 07, 2006 at 18:31 UTC
    I don't know your true sorting crierion, but the program is working as expected. For instace the third element of the output list cannot be 3 - test line 3 (third) because the fourth field in that element is 2, which is larger than the fourth field of five other elements.


Re: Adjancey Tree Confusion
by demerphq (Chancellor) on Apr 07, 2006 at 19:37 UTC

    It took me a while to figure out what you had in mind. It seems that you want to get a preorder tree traversal without actually building a tree. Which doesnt sound that great a plan.

    Why do you want to avoid a recursive routine?


Re: Adjancey Tree Confusion
by injunjoel (Priest) on Apr 08, 2006 at 00:18 UTC
    Try sorting it twice.
    #!/usr/bin/perl -w use strict; my @array = (); while (<DATA>){ chomp($_); my @subarray = split(/,/,$_); push @array, [@subarray]; } my @sorted = sort { $a->[1] <=> $b->[0] } #by parent sort { $a->[0] <=> $b->[0] } #by id @array; for my $x(0..$#sorted){ print "$sorted[$x][0]:$sorted[$x][1] - $sorted[$x][3]\n"; } #id,parent,stamp,text __DATA__ 1,0,2006-01-02 08:15:00,test line 1 (first) 2,1,2006-01-02 08:16:00,test line 2 (second) 3,2,2006-01-02 08:21:00,test line 3 (third) 6,1,2006-01-02 08:23:00,test line 6 (seventh) 4,1,2006-01-02 08:22:00,test line 4 (sixth) 7,6,2006-01-02 08:25:00,test line 7 (eighth) 5,3,2006-01-02 08:23:00,test line 5 (fourth) 8,1,2006-01-02 08:17:00,test line 8 (fifth)
    The output
    #id:parent - text 1:0 - test line 1 (first) 2:1 - test line 2 (second) 3:2 - test line 3 (third) 5:3 - test line 5 (fourth) 4:1 - test line 4 (sixth) 6:1 - test line 6 (seventh) 7:6 - test line 7 (eighth) 8:1 - test line 8 (fifth)
    Perhaps Im missing something (which is always possible) but Im not sure how id:8, parent:1 was going to end up fifth???
    Is that close to what you were looking for?

    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
      Great example!

      The one thing that's missing is the date sort for nodes with the same parent. I tried a few things to add the date sort into your routine but I couldn't get it to work for me.

Re: Adjancey Tree Confusion
by TedPride (Priest) on Apr 08, 2006 at 05:46 UTC
    No matter what kind of table you're setting up, you should always have a field for record ID (PRIMARY KEY AUTO INCREMENT or whatever). Since records from SELECT are returned in order of the PRIMARY KEY (the same order in which they were created), and you're pushing them into arrays, proper order should be retained for children of the same parent. If instead you want them in a different order, like say ordered by last modified time instead of created time, then just use an ORDER BY in your query. Children of the same parent should now be ordered by modified time, without having to change the basic algorithm. Incidently, here's a modified version of the function for assigning everything to an array:
    my @all; construct($_, \@all) for @threads; print "$_->{'id'} $_->{'title'}\n" for @all; sub construct { my ($p, $all) = @_; push @$all, $p; construct($_, $all) for @{$p->{'children'}}; }

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://541940]
Approved by kvale
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (5)
As of 2020-10-20 12:10 GMT
Find Nodes?
    Voting Booth?
    My favourite web site is:

    Results (210 votes). Check out past polls.