package HTML::TokeParser::Smart;
require 5.006;
use strict;
use warnings;
use Carp;
use LWP::UserAgent;
use base 'HTML::TokeParser';
our $VERSION = '0.2';
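sub new {
    my $proto  = shift;
    my $class  = ref($proto) || $proto;
    my $source = shift;    # a file name, a URL, or a string of HTML

    croak "Usage: $class->new(\$file_url_or_html)" unless defined $source;

    my $self;
    if ($source !~ /\n/ && -e $source) {
        # It's a file! (Skipping strings with newlines avoids the
        # "Unsuccessful stat on filename containing newline" warning
        # when an HTML string is passed in.)
        $self = $class->SUPER::new($source);
    }
    elsif ($source =~ m{^(?:https?|ftp|file):}i) {
        # It's a URL! Fetch it and parse the response body.
        my $browser  = LWP::UserAgent->new;
        my $response = $browser->get($source);
        croak "Unable to get webpage: $source ", $response->status_line
            unless $response->is_success;
        $self = $class->SUPER::new($response->content_ref);
    }
    elsif ($source =~ m/<[^>]+>/) {
        # It's HTML! Parse the string itself.
        $self = $class->SUPER::new(\$source);
    }
    else {
        croak "'$source' is not a valid URL, file, or HTML string.";
    }

    # SUPER::new already blessed $self into $class, so no re-bless is needed.
    return $self;
}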
1;
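The order of checks in `new` matters: an existing path on disk wins, then a string starting with a URL scheme, then anything tag-like. A minimal sketch of that heuristic, pulled out into a standalone function so it can be exercised without the network or HTML::Parser, might look like this. `classify_source` and its return labels are hypothetical names chosen for illustration; they are not part of HTML::TokeParser::Smart or HTML::TokeParser.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, not part of HTML::TokeParser::Smart: it mirrors the
# order of checks in new() and returns a label instead of building a parser.
sub classify_source {
    my ($source) = @_;
    return 'invalid' unless defined $source && length $source;

    # 1. An existing path on disk. Skipping strings with newlines avoids the
    #    "Unsuccessful stat on filename containing newline" warning.
    return 'file' if $source !~ /\n/ && -e $source;

    # 2. A string that starts with a supported URL scheme (case-insensitive).
    return 'url' if $source =~ m{^(?:https?|ftp|file):}i;

    # 3. Anything containing something tag-like is treated as HTML.
    return 'html' if $source =~ m/<[^>]+>/;

    return 'invalid';
}

# Demo inputs: a real temporary file, a URL, an HTML fragment, plain text.
my ($fh, $tmp_path) = tempfile(UNLINK => 1);
for my $input ($tmp_path, 'http://example.com/', "<p>\nHello</p>", 'just some text') {
    my $shown = $input;
    $shown =~ s/\n/\\n/g;    # keep each result on a single output line
    printf "%-40s => %s\n", $shown, classify_source($input);
}
```

In this sketch, a bare name such as `index.html` is only treated as a file if it actually exists; otherwise it falls through to `invalid`, because it neither starts with a URL scheme nor contains a tag.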