PerlMonks  

Re: how do i split a link

by gav^ (Curate)
on Apr 13, 2002 at 17:09 UTC (id://158796)


in reply to how do i split a link

You could combine HTML::LinkExtor and URI:
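use strict;
use warnings;
use HTML::LinkExtor;
use URI;

my @links = ();
my $html = do { local $/; <DATA> };    # slurp everything after __DATA__

# Called by HTML::LinkExtor for every tag that carries a link attribute
sub extract_links {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a';    # 'return', not 'next': this is a sub, not a loop
    # keep the last two labels of the host name: www.foo.com => foo.com
    my @parts = split /\./, URI->new($attr{href})->host;
    my $host = join '.', @parts[-2, -1];
    push @links, $host;
}

my $p = HTML::LinkExtor->new(\&extract_links);
$p->parse($html);
print join("\n", @links), "\n";

__DATA__
<a href="http://www.foo.com">description</a>
<a href='http://www.foo.com'>image here</a>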
Of course you might want to add some error checking...
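As one possible shape for that error checking, here is a minimal sketch. It assumes relative links should be resolved against a base URL you supply, and that links whose URI has no host (for example mailto:) should simply be skipped. `link_hosts` is a hypothetical helper name and http://example.org/ is a placeholder base; the calls used are HTML::LinkExtor's callback interface, `URI->new_abs`, and a `can('host')` check.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Hypothetical helper: collect the host of every <a href> in $html.
# Relative links are resolved against $base; URIs that have no host
# method (e.g. mailto:) are skipped instead of dying.
sub link_hosts {
    my ($html, $base) = @_;
    my @hosts;
    my $p = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' && defined $attr{href};
        my $uri = URI->new_abs($attr{href}, $base);
        return unless $uri->can('host');    # mailto: URIs have no host
        my $host = $uri->host;
        push @hosts, $host if defined $host && length $host;
    });
    $p->parse($html);
    $p->eof;
    return @hosts;
}

my @hosts = link_hosts(<<'HTML', 'http://example.org/');
<a href="http://www.foo.com">description</a>
<a href="/about.html">relative</a>
<a href="mailto:someone@example.com">mail</a>
HTML
print "$_\n" for @hosts;
```

The relative link comes back as example.org (the placeholder base) and the mailto: link is dropped; the last-two-labels step from the code above could then be applied to each host.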

gav^

(crazyinsomniac) Re^2: how do i split a link
by crazyinsomniac (Prior) on Apr 14, 2002 at 10:16 UTC
    You might find reading the module, as well as its documentation, saves typing ;)
    use HTML::LinkExtor;

    my @links = ();
    my $html = join '', <DATA>;    # much more elegant than => do { local $/; <DATA> };

    sub extract_links {
        my ($tag, undef, $url) = @_;    # (tag name, attribute name, link URL)
        if ($tag eq 'a') {
            push @links, $url->host;
        }
    }

    # Second argument is a base URL: with it, every link reaches the callback
    # as an absolute URI object, so ->host can be called on it directly.
    my $p = HTML::LinkExtor->new(\&extract_links, 'http://foobar.com');
    $p->parse($html);
    print join "\n", @links;

    __DATA__
    <a href="http://www.foo.com">description</a>
    <a href='http://www.foo.com'>image here</a>
    <A href='http://foo-bar-publishers.co.uk'>image here</a>
    Also, this "foo.com" request is rather silly, considering all the weirdo naming conventions out there (city.county.state.us ...)
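To make the naming-convention problem concrete: keeping only the last two labels works for www.foo.com but turns foo-bar-publishers.co.uk into co.uk. A minimal sketch of a suffix-aware alternative follows. `naive_domain` and `registered_domain` are hypothetical names, and the three-entry `%SUFFIX` table is a deliberately tiny stand-in; a real program would consult the full Public Suffix List maintained by Mozilla.

```perl
use strict;
use warnings;

# Naive approach from the first reply: keep only the last two labels.
sub naive_domain {
    my @parts = split /\./, shift;
    return join '.', @parts > 2 ? @parts[-2, -1] : @parts;
}

# Hypothetical, deliberately incomplete suffix table (assumption).
my %SUFFIX = map { $_ => 1 } qw(com org co.uk);

# Return the longest known suffix plus the one label in front of it,
# or undef if the host ends in a suffix the table does not know.
sub registered_domain {
    my @parts = split /\./, lc shift;
    for my $i (1 .. $#parts) {
        my $suffix = join '.', @parts[$i .. $#parts];
        return join '.', @parts[$i - 1 .. $#parts] if $SUFFIX{$suffix};
    }
    return undef;
}

for my $host (qw(www.foo.com foo-bar-publishers.co.uk)) {
    printf "%-26s naive: %-8s suffix-aware: %s\n",
        $host, naive_domain($host), registered_domain($host);
}
```

The loop tries the longest candidate suffix first, so co.uk is matched before a shorter entry could be. Hosts such as city.county.state.us only come out right if the suffix table actually lists those patterns.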

    update: no need for a patch, it's already in the documentation (at least in the version with $VERSION = sprintf("%d.%02d", q$Revision: 1.31 $ =~ /(\d+)\.(\d+)/);).
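    The base-URL argument to HTML::LinkExtor->new used in the reply above can also be combined with the no-callback mode, in which links are accumulated and fetched with $p->links. A minimal sketch, assuming a hypothetical base of http://foobar.com/docs/ and made-up HTML; it shows a relative href coming back as an absolute URI object.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# No callback: links are collected internally and returned by $p->links.
# Because a base URL is given, each link value is an absolute URI object.
my $base = 'http://foobar.com/docs/';    # hypothetical base URL
my $p = HTML::LinkExtor->new(undef, $base);
$p->parse('<a href="intro.html">intro</a> <a href="http://www.foo.com">foo</a>');
$p->eof;

my @absolute;
for my $link ($p->links) {               # each entry: [$tag, $attr => $url, ...]
    my ($tag, %attr) = @$link;
    push @absolute, $attr{href} if $tag eq 'a' && $attr{href};
}
print $_->host, "\t$_\n" for @absolute;
```

    Here intro.html is resolved to http://foobar.com/docs/intro.html, so ->host works even on links that were relative in the source.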

     
    ______crazyinsomniac_____________________________
    Of all the things I've lost, I miss my mind the most.
    perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

      Thanks, I never knew about that 3rd parameter. Perhaps you can suggest a patch to the documentation?

      gav^
