First of all I really like the idea of this thread. I
think that the process of reviewing code is very
instructive. This is particuarly true when experienced
programmers review each other's code.
I hope you don't mind, but I picked a simpler piece of
code, xstatswhore.pl to work with. It is at least related...
- You have documentation aimed at users embedded in the
code using #. If you put the same documentation in pod
format it can be extracted with the perldoc utility.
- You have key variables like $pm_site which are scoped
for the whole program basically so that you can issue a
request in one function and then do error handling in
another. I prefer to put the error handling where the
action happens, and if I can move that variable to a
tighter scope, that is fine by me.
- You have a known problem in your data - the home node
is considered an article and you don't want it to be.
Again there is action at a distance with the fix for the
issue appearing nowhere near where the problem is. Were
vroom to fix the bug on his end, you would also now be
correcting for a non-existent bug and you would now get
the wrong answer. I would try to find the home node and
delete it.
- The indenting on your XML parsing goes very deep. If
you are going to leave the logic inline, using a standard
small indent for each level would make it more readable
for me. I keep to 80 columns.
- On a related note, I know mirod will dislike
me for saying this, but every time I look at the XML
approach to a problem I have to wonder why it is so
hard to work with. I am seriously tempted to handroll
a solution to kill that dependency. I have had bad luck
in the past on XML::* upgrades, and many have (as you
note) trouble installing them. Plus I bet it would be
faster.
- In get_article_list, why do you pull the node in and
then call it through @_ rather than using the variable
that you have in scope?
- I agree with you that max and min would be nice to
have as native functions. But you can get them from
List::Util, and that is good enough for me.
Plus the way that it implements them is able to take a
list as an argument.
- I don't like the hard-coded constants in your summarize
function. Also Perl is list-oriented. If you take the
approach of first extracting an array of reps and then
taking summary statistics on that, then you can simplify
that code without running into bugs on users with only
negative reps.
- The histogram code is buggy in at least 2 ways. First
of all the maximum rep has nothing to do with the widest
bin. Secondly the home node is an extra article with a
rep of 0. The second one shows, again, the dangers of
fixing a bug at a distance from where it enters your
code. Then you have to always remember it and fix it at
a distance in multiple places.
- In show_histogram I don't like the naming of the hash.
On page 8 of Camel 2 there is the advice that a hash
lookup is linguistically "of". So you say that
$wife{Adam} = "Eve";. Given that the hash is
has the size of the bin for its value, it deserves a
name like %bin_count.
- I do not like the way that show_histogram loops. In
my experience looping by explicit incrementing values
(particularly if you are incrementing several in parallel)
is easier to get wrong than looping over a range.
- A matter of personal taste. I tend to leave out
tests on numbers of arguments. Then again I tend to like
APIs with variable number of arguments. And if I want to
be sure that the caller knows my API, I am likely to hack
up a named argument approach - in my experience I am less
likely to forget how many arguments I have than I am to
forget what order they go in. So I omitted them, but
that is very much a YMMV item.
And I guess that if I am going to say all of that, I should
show what the revised version looks like:
#!/usr/local/bin/perl -w
use LWP::Simple;
use XML::Twig;
use URI::Escape;
use HTML::Entities;
use List::Util qw(min max sum);
use Carp;
use Getopt::Std;
use POSIX qw(ceil floor);
use strict;
{
my %args;
getopts('u:p:b:', \%args);
# Replace these dies with username/password
my $user = $args{u} || # You can put your name here
die("You need to specify the user with -u");
my $pass = $args{p} || # You can put your password here
die("You need to specify the password with -p");
my $bin_size = $args{b} || 5;
my $articles = get_article_list($user, $pass);
my $summary = summarize ($user, $articles);
show_reps($summary);
if (0 < $bin_size) {
show_histogram($bin_size, $articles, $summary);
}
}
# Takes user name/password, returns a hash of articles
# Sorry mirod, every time I look at this kind of thing,
# I have to wonder *why* programming for XML always
# seems so much harder than it needs to be.
sub get_article_list {
my ($user, $pass) = @_;
my %node_info;
my $twig = new XML::Twig(
TwigRoots => {
NODE => sub {
my ($t, $node) = @_;
my $title = decode_entities($node->text());
$title =~ s/'/'/g; # Why is this missed?
my %rec = (
id => $node->att('id'),
rep => $node->att('reputation'),
title => $title,
date => $node->att('createtime'),
);
if (exists $node_info{$rec{id}}) {
confess("Duplicated id '$rec{id}'");
}
else {
$node_info{$rec{id}} = \%rec;
}
$t->purge();
},
}
);
$twig->parse(
get_pm_node("User nodes info xml generator", $user, $pass)
);
unless (%node_info) {
confess("Couldn't fetch article info for $user with pass $pass");
}
# Vroom includes your home node as an article. :-(
my $home_id = min (keys %node_info);
if ($node_info{$home_id}{title} eq $user) {
delete $node_info{$home_id};
}
else {
warn("Is $home_id your home id? Presuming not.\n");
}
return \%node_info;
}
# Takes a node name, plus optional name/password, and fetches the page
sub get_pm_node {
my $node = uri_escape(shift);
my $url = "http://www.perlmonks.org/index.pl?node=$node";
if (@_) {
my $user = uri_escape(shift);
my $passwd = uri_escape(shift);
$url .= "&op=login&user=$user&passwd=$passwd";
}
# Why doesn't LWP::Simple preserve the error?
return get($url) or confess("Cannot fetch '$url'");
}
sub show_histogram {
my $bin_size = shift;
my $articles = shift;
my %bin_count;
foreach my $rep (map {$_->{rep}} values %$articles) {
my $bin = floor( ($rep + 0.5) / $bin_size);
++$bin_count{$bin};
}
# Try to keep on one page
my $width = 50;
my $max_count = max (values %bin_count);
my $scale = ceil($max_count / $width);
print " Reputation Article Count\n";
print "------------- -------", "-" x 50, "\n";
my @bins = sort {$a <=> $b} keys %bin_count;
foreach my $bin (min(@bins)..max(@bins)) {
my $count = $bin_count{$bin} || 0;
my $extra = ($count % $scale) ? '.' : '';
my $start = $bin * $bin_size;
my $end = $start + $bin_size - 1;
printf "%4d .. %4d \[%4d\] %s$extra\n",
$start, $end, $count,
'#' x floor ($count / $scale);
}
print "\n Scale: #=$scale\n\n" if $scale > 1;
}
# This is essentially untouched.
sub show_reps {
my $hsummary = shift;
print "\n";
printf (" User: %s\n", $hsummary->{username});
printf (" Total articles: %d\n", $hsummary->{articles});
printf (" Total reputation: %d\n", $hsummary->{reputation});
printf (" Min reputation: %d\n", $hsummary->{repmin});
printf (" Max reputation: %d\n", $hsummary->{repmax});
printf ("Average reputation: %3.2f\n", $hsummary->{average});
print "\n";
}
sub summarize {
my $user = shift;
my $articles = shift;
my @reps = map {$_->{rep}} values %$articles;
unless (@reps) {
confess("No articles found.");
}
my %summary = (
articles => scalar(@reps),
repmax => max(@reps),
repmin => min(@reps),
reputation => sum(@reps),
username => $user,
);
$summary{average} = $summary{reputation} / $summary{articles};
return \%summary;
}
__END__
=head1 NAME
t_statswhore.pl - get summary statistics
=head1 SYNOPSIS
t_statswhore.pl C<-u> <user> C<-p> <password> [C<-b> <bin size>]
=head1 DESCRIPTION
This program goes to www.perlmonks.org and generates
summary statistics about your posts. This only works
for your own account since reps are proprietary.
If you wish you can embed your name and password into the
code instead of passing them on the command line.
=head1 DEPENDENCIES
Quite a few. LWP::Simple, XML::Twig, List::Util,
URI::Escape, and HTML::Entities.
=head1 TIPS
An interested developer should be able to fairly easily
remove the dependency on XML::Twig. URI::Escape is
largely gratuitous, and the main purpose of
HTML::Entities is to make sure that people with strange
names can have their home node identified and removed
from the article list.
=head1 AUTHOR AND COPYRIGHT
Written by Ben Tilly based on the original by J.C.Wren.
Copyright 2000, 2001 J.C.Wren jcwren@jcwren.com
Copyright 2001 by Ben Tilly
No rights reserved. But jcwren likes hearing about
people using it.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.