Cleaning up my temp dir, I found the script I used to generate the link list (further edited) for [OT?] SCM recommendation for small to medium size Perl projects. I'd call it an overgrown -lp script, and I wouldn't recommend doing the same. Since it was meant as a one-shot script, its error checking is only as thorough as that particular run required and no more. In the same vein, I used a sub with no proper parameter passing (it reads $_ directly). But it is strict- and warnings-safe, so all in all it may be useful here for instructive purposes, especially since people routinely ask for ways to fetch web pages and parse HTML:
#!/usr/bin/perl -lpi.bak
use strict;
use warnings;
use LWP::Simple;
use HTML::TokeParser;
# Fetch the page named in $_ and return the text of its <title> tag.
# No parameter passing: the sub reads $_ directly (see caveat above).
sub get_title {
    my $doc = get $_ or
        warn "Couldn't get <$_>\n" and return;
    (my $p = HTML::TokeParser->new(\$doc))
        ->get_tag('title') or return;
    $p->get_trimmed_text;
}
$_ or next;    # skip blank lines
my $title;
if (/wikipedia/) {
    # Derive the title from the URL itself; no need to fetch the page
    $_ = (split m|/|)[-1];            # keep the last path component
    s/_/ /g;                          # underscores back to spaces
    s/%(\w{2})/chr hex $1/ge;         # decode %XX escapes
    ($title = $_) =~ s/\s+\(.*?\)//;  # drop the "(disambiguation)" part
    s|^|wp://|;                       # shorthand scheme for the link
} else {
    $title = get_title() || 'NOT FOUND';
}
$_ = "<li>[$_|$title]</li>";
__END__