Re: Using Web::Scraper to extract content from an HTML page


by tangent (Parson)

on Apr 04, 2017 at 01:02 UTC ( [id://1186929] )

in reply to Using Web::Scraper to extract content from an HTML page

As beech points out, the 'title' attribute is on the 'img' tag, not the 'a' tag, so you need to account for that. Also, process_first would only work if there were multiple tags within the cell itself, not within the row. You can instead collect every cell and skip the empty results while looping through them:

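use strict;
use warnings;
use Web::Scraper;
use URI;
use Encode;

# One result per cell: the link comes from 'a', the name from the 'title' of 'img'.
my $p1 = scraper {
  process 'table[class="dextable"] td[class="cen"]', "list[]" => scraper {
    process "a", uri => '@href';
    process "img", name => '@title';
  };
};

my $res = $p1->scrape( URI->new("http://serebii.net/duel/figures.shtml") );

# Skip cells that did not yield both a name and a link.
for my $p (@{$res->{list}}) {
  next unless ($p->{name} and $p->{uri});
  print Encode::encode("utf8", "$p->{name}\t$p->{uri}\n");
}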
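
If you want to try the skipping logic without hitting the site, here is a minimal offline sketch. The HTML fragment, the example.com URL and the figure name are made up for illustration and are not taken from the real page. It assumes, as I understand Web::Scraper, that scrape() also accepts an HTML string and that cells with no matches simply come back without the corresponding keys:

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical fragment: only the first cell has both a link and a titled image.
my $html = <<'HTML';
<table class="dextable">
  <tr>
    <td class="cen"><a href="http://example.com/bulbasaur.shtml"><img src="b.png" title="Bulbasaur"></a></td>
    <td class="cen">ID-01</td>
    <td class="cen"><img src="x.png"></td>
  </tr>
</table>
HTML

# Same scraper shape as above: one result per matching cell.
my $p1 = scraper {
  process 'table[class="dextable"] td[class="cen"]', "list[]" => scraper {
    process "a", uri => '@href';
    process "img", name => '@title';
  };
};

# Scrape the string instead of a URI.
my $res = $p1->scrape($html);

# Keep only the cells that produced both values.
my @kept = grep { $_->{name} and $_->{uri} } @{ $res->{list} || [] };

print "$_->{name}\t$_->{uri}\n" for @kept;
```

The grep does the same job as the "next unless" in the loop; use whichever reads better to you.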