http://qs321.pair.com?node_id=127534

ask has asked for the wisdom of the Perl Monks concerning the following question:

Some weeks ago I spent half a Sunday writing the beginnings of a web-based newsreader. One of the things it is missing is threading. I was planning to implement jwz's threading algorithm from http://www.jwz.org/doc/threading.html in Perl, but I haven't had time. Does anyone have suggestions for other algorithms?

I am building it to work with the perl.org NNTP server. In case you didn't know, the "newsserver" I use is colobus (http://trainedmonkey.com/colobus/), which serves directly from the ezmlm archives. It's supposed to replace the MHonArc archives at http://archive.perl.org/.

I have put the current Mason components up at http://develooper.com/~ask/tmp/mason-nntp.tar.gz. If anyone wants to hack new features into it, I'd be most grateful and would be happy to put the result up on http://nntp.perl.org/, with proper credits of course.

To make the threading more efficient I've been considering letting the web interface access colobus' BerkeleyDB databases directly, or export separate BerkeleyDBs with the relevant data for use in the web interface.

 - ask

-- 
ask bjoern hansen, http://ask.netcetera.dk/   !try; do();

Replies are listed 'Best First'.
Re: newsreader threading (boo)
by boo_radley (Parson) on Nov 26, 2001 at 19:28 UTC
    Here's an out of context snippet from a newsreader I wrote in wx (it's still not finished... ) that seems to thread as well as my regular newsreader. You can also see what my commenting looks like in the real world.

    Skimming over JWZ's document, I latched onto the "References" bit and ignored In-Reply-To altogether. If you're interested in the full source, please /msg me. The caveat (and the reason it hasn't been posted before) is that the reader itself is rather incomplete: it retrieves, sorts & displays messages, but doesn't store messages or message pointers.

    # sorting by oldest first seems to give the best chance for accurate message threading.
    # I apologize for the nasty parens nesting scheme as well.
    foreach (sort { Date_Cmp ( $messages{$a}{"date"}, $messages{$b}{"date"}) } keys %messages ) {
        my $ref = $messages{$_}{"references"};
        print STDERR "Checking $_ for references -- $ref --\n";
        # getrefs runs through all the messages in placed,
        # looking for references within the current message
        my $childof=get_refs($ref, \%placed);
        #
        # if the current message is a child of another,
        # nest it to that messages' node, otherwise place as a child of
        # the newsgroup node.
        #
        # In either case, we add it to the list of messages we've 'placed'.
        # 'placed' is a bad analogy, and should be replaced.
        if ($childof){
            $placed{$_}=$this->AppendItem ($placed{$childof}, $_." ".$messages{$_}{"subject"});
        } else {
            $placed {$_}=$this->AppendItem ($item,$_." ".$messages{$_}{"subject"});
        }
        print STDERR "I have ". keys (%placed) . " items\n";
    }

    and the sub get_refs,

    sub get_refs {
        my $msg_refs = shift;
        my $msg_list = shift;
        my @parent_refs =reverse split /\s+/, $msg_refs;
        foreach my $aref(@parent_refs) {
            print STDERR "\tChecking $msg_refs\n";
            if (exists $$msg_list{$aref}){
                print STDERR "$_ is child of $aref\n";
                return $aref;
            }
        }
        return undef;
    }
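
For anyone who wants to try the References-only idea above without the wx tree control or Date_Cmp, here is a minimal standalone sketch. It is not the code from the newsreader itself: the sample %messages data, the epoch-second dates, the 'ROOT' pseudo-node and the sub names nearest_placed_parent, build_thread_tree and render are all assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: message-id => { date, subject, references }.
# Dates are epoch seconds here, so a numeric sort stands in for Date_Cmp.
my %messages = (
    '<a@example>' => { date => 100, subject => 'threading?',      references => '' },
    '<b@example>' => { date => 200, subject => 'Re: threading?',  references => '<a@example>' },
    '<c@example>' => { date => 300, subject => 'Re: threading?',  references => '<a@example> <b@example>' },
    '<d@example>' => { date => 400, subject => 'Re: lost parent', references => '<gone@example>' },
);

# Walk the References header from last (nearest ancestor) to first and
# return the first id that has already been placed, or nothing at all.
sub nearest_placed_parent {
    my ($refs, $placed) = @_;
    for my $id (reverse grep { length } split /\s+/, $refs // '') {
        return $id if exists $placed->{$id};
    }
    return;
}

# Build a parent => [children] hash, handling the oldest message first.
# Messages with no known ancestor hang off the pseudo-node 'ROOT'.
sub build_thread_tree {
    my ($msgs) = @_;
    my (%placed, %children);
    for my $id (sort { $msgs->{$a}{date} <=> $msgs->{$b}{date} } keys %$msgs) {
        my $parent = nearest_placed_parent($msgs->{$id}{references}, \%placed);
        push @{ $children{ defined $parent ? $parent : 'ROOT' } }, $id;
        $placed{$id} = 1;
    }
    return \%children;
}

# Flatten the tree into lines indented two spaces per nesting level.
sub render {
    my ($children, $node, $depth) = @_;
    my @lines;
    for my $id (@{ $children->{$node} || [] }) {
        push @lines, ('  ' x $depth) . $id;
        push @lines, render($children, $id, $depth + 1);
    }
    return @lines;
}

my $tree = build_thread_tree(\%messages);
print "$_\n" for render($tree, 'ROOT', 0);
```

With the sample data, <b@example> nests under <a@example>, <c@example> nests under <b@example>, and <d@example> ends up at the top level because its only reference is to a message that was never seen. That last case is where this approach differs from JWZ's algorithm, which would create an empty placeholder container for the missing parent.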