PodMaster,

I spent a little time with Super Search and couldn't find anything applicable. I then looked at PTAV and didn't see a way to do this. That's when I whipped up the following:
    #!/usr/bin/perl

    use strict;
    use warnings;

    use HTML::TableContentParser;
    use HTML::TokeParser::Simple;
    use WWW::Mechanize;

    # PTAV query-string fragment that restricts the listing to SoPW nodes
    use constant SOPW => '&ct=12';

    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get( 'http://www.tinymicros.com/ptav/index.pl' );

    open OUTPUT, '>', $ARGV[0] || 'noreplies.txt' or die "Can't open output: $!";
    select OUTPUT;
    $| = 1;

    print OUTPUT "<html>\n<ul>\n";

    # Walk the archive: year pages link to month pages, which link to day pages
    for my $year ( $mech->find_all_links( url_regex => qr/year/ ) ) {
        $mech->get( $year->url() );
        for my $month ( $mech->find_all_links( url_regex => qr/month/ ) ) {
            $mech->get( $month->url() );
            for my $day ( $mech->find_all_links( url_regex => qr/day/ ) ) {
                $mech->get( $day->url() . SOPW );
                my $table = HTML::TableContentParser->new()->parse( $mech->content() );

                # The second-to-last table on a day page holds the node listing;
                # a reply count of (0) marks a node with no replies
                for my $row ( @{ $table->[-2]{rows} } ) {
                    for my $cell ( @{ $row->{cells} } ) {
                        if ( $cell->{data} =~ /\(0\)/ ) {
                            print OUTPUT "<li>", clean_link( $cell ), "</li>\n";
                            next;
                        }
                    }
                }
                sleep 3;    # be polite to the server
                $mech->back();
            }
            $mech->back();
        }
        $mech->back();
    }

    print OUTPUT "</ul>\n</html>\n";

    # Extract the node id and title from a cell and rebuild a clean
    # link pointing straight at perlmonks.org
    sub clean_link {
        my $link = shift;
        my $p    = HTML::TokeParser::Simple->new( \$link->{data} );
        my $node;
        while ( my $token = $p->get_token ) {
            last if $token->is_end_tag;
            if ( $token->is_start_tag( 'a' ) ) {
                ($node) = $token->return_attr( 'href' ) =~ /(\d+)$/;
                next;
            }
            if ( $token->is_text ) {
                return "<a href='http://www.perlmonks.org/index.pl?node_id=$node'>"
                     . $token->as_is
                     . "</a>";
            }
        }
    }
It generates a list of all root SoPW nodes without replies. The two alternatives I have seen are even more lacking:
- Use a modified view of Newest Nodes
This doesn't allow you to look at anything past a certain date and has no means of filtering beyond visual cues.
- Use PTAV as built
This requires looking day by day and has no means of filtering beyond visual cues.
Update: Added an explanation of the screen scraping and modified the code to only look at SoPW nodes, since that was all that was being asked for. It still needs a resume capability so that if it breaks you can start where you left off.
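One way the resume capability might look, as a rough sketch: persist the last day URL that was fully processed to a small state file, and skip days up to that URL on the next run. The file name and helper names here are my own inventions, not part of the script above.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical state file recording the last day URL fully processed
    my $state_file = 'noreplies.state';

    # Return the saved URL, or the empty string on a fresh run
    sub load_last_done {
        return '' unless -e $state_file;
        open my $fh, '<', $state_file or die "Can't read $state_file: $!";
        my $url = <$fh>;
        chomp $url if defined $url;
        return defined $url ? $url : '';
    }

    # Record a day URL as done, overwriting any previous entry
    sub mark_done {
        my ($url) = @_;
        open my $fh, '>', $state_file or die "Can't write $state_file: $!";
        print $fh "$url\n";
    }

Inside the day loop you would then call mark_done( $day->url() ) after the sleep, and on startup skip day links until you pass the URL returned by load_last_done(). Keeping the state at day granularity seems like the right trade-off, since a day page is the smallest unit the script fetches.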