PerlMonks  

Quick url list to PM links hack

by blazar (Canon)
on Mar 19, 2007 at 12:13 UTC ( [id://605466]=CUFP )

Cleaning up my temp dir, I found the script I used to generate the link list (further edited) for [OT?] SCM recommendation for small to medium size Perl projects. I'd call it an overgrown -lp script, and I wouldn't recommend that people do the same. Since it was meant to be a one-shot script, some of its checks are appropriate for that particular situation and nothing more. In the same vein, I used a sub with no proper parameter passing. But it is strict and warnings safe, so all in all it may be useful here for instructive purposes, especially since people routinely ask for ways to get web pages and parse HTML:

    #!/usr/bin/perl -lpi.bak

    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::TokeParser;

    sub get_title {
        my $doc=get $_ or
          warn "Couldn't get <$_>\n" and return;
        (my $p=HTML::TokeParser->new(\$doc))
          ->get_tag('title') or return;
        $p->get_trimmed_text;
    }

    $_ or next;
    my $title;
    if (/wikipedia/) {
        $_=(split m|/|)[-1];
        s/_/ /g;
        s/%(\w{2})/chr hex $1/ge;
        ($title=$_) =~ s/\s+\(.*?\)//;
        s|^|wp://|;
    } else {
        $title=get_title || 'NOT FOUND';
    }
    $_="<li>[$_|$title]</li>";

    __END__
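For readers who would rather not lean on $_ and the -p loop, here is a minimal sketch of the Wikipedia branch as a sub that takes the URL as an explicit argument. The name wikipedia_link and the sample URL are hypothetical and not part of the original script. The percent-escape pattern is also narrowed to hex digits, which is a small change from the \w{2} used above. It decodes escapes byte by byte only, so multi-byte UTF-8 titles would need further decoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): turn a Wikipedia URL
# into a (link target, display title) pair, taking the URL as an argument
# instead of reading it from $_.
sub wikipedia_link {
    my ($url) = @_;
    my $page = (split m{/}, $url)[-1];           # last path component
    $page =~ s/_/ /g;                            # underscores stand for spaces
    $page =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;  # undo percent-escapes (bytes only)
    (my $title = $page) =~ s/\s+\(.*?\)//;       # drop a trailing "(qualifier)"
    return ("wp://$page", $title);
}

# Sample URL chosen purely for illustration.
my ($link, $title) = wikipedia_link('http://en.wikipedia.org/wiki/Git_(software)');
print "<li>[$link|$title]</li>\n";
```

The link target keeps the parenthesised qualifier while the display title drops it, mirroring what the one-shot script does in place on $_.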
