Re: Using Web::Scraper to extract content from an HTML page

by tangent (Parson)
on Apr 04, 2017 at 01:02 UTC ( [id://1186929] )


in reply to Using Web::Scraper to extract content from an HTML page

As beech points out, the 'title' attribute is on the 'img' tag, not the 'a' tag, so you need to account for that. Also, process_first would only help if there were multiple matching tags within the cell itself, not within the row. But you can skip the empty ones while looping through the results:
use URI;
use Web::Scraper;
use Encode;

my $p1 = scraper {
    process 'table[class="dextable"] td[class="cen"]', "list[]" => scraper {
        process "a",   uri  => '@href';
        process "img", name => '@title';
    };
};

my $res = $p1->scrape( URI->new("http://serebii.net/duel/figures.shtml") );

for my $p (@{$res->{list}}) {
    # skip cells that did not yield both a title and a link
    next unless ($p->{name} and $p->{uri});
    print Encode::encode("utf8", "$p->{name}\t$p->{uri}\n");
}
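If it helps to see why the 'next unless' line is needed, here is a rough sketch of the filtering step on its own. The $res structure below is a made-up stand-in for what I would expect the scrape to return (one hashref per matched cell, with keys simply absent where a cell had no 'a' or 'img'); the names and example.com URLs are invented, not taken from the real page.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the scrape result: one hashref per matched
# cell. The entries and URLs are invented for illustration only.
my $res = {
    list => [
        {},                                                      # empty cell
        { name => 'Figure A', uri => 'http://example.com/a' },   # complete
        { uri  => 'http://example.com/b' },                      # link, no image title
        { name => 'Figure D', uri => 'http://example.com/d' },   # complete
    ],
};

# Keep only the entries that have both a name and a uri.
my @rows = grep { $_->{name} and $_->{uri} } @{ $res->{list} };

print "$_->{name}\t$_->{uri}\n" for @rows;
```

Using grep up front instead of 'next unless' inside the loop is only a matter of taste; both drop the same entries.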
