Re: Using Web::Scraper to extract content from an HTML page

in reply to Using Web::Scraper to extract content from an HTML page

As beech points out, the 'title' attribute is on the 'img' tag, not the 'a' tag, so you need to account for that. Also, process_first only helps when there are multiple matching tags within the cell itself, not within the row. Instead, you can skip the empty entries while looping through the results:

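use strict;
use warnings;
use Web::Scraper;
use URI;
use Encode;

my $p1 = scraper {
  # Each "cen" cell in the dextable becomes one hash in @{ $res->{list} }
  process 'table[class="dextable"] td[class="cen"]', "list[]" => scraper {
    process "a",   uri  => '@href';   # the link target is on the <a>
    process "img", name => '@title';  # the title is on the <img>
  };
};

my $res = $p1->scrape( URI->new("http://serebii.net/duel/figures.shtml") );

for my $p (@{$res->{list}}) {
  # Cells that lack either a link or an image title are skipped
  next unless ($p->{name} and $p->{uri});
  print Encode::encode("utf8", "$p->{name}\t$p->{uri}\n");
}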
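If you want to try the skip-the-empty-cells approach without fetching the live page, here is a minimal offline sketch. The HTML below is a hypothetical stand-in for the serebii.net table (the real markup may differ), the names and example.com URLs are made up, and it assumes Web::Scraper's scrape() accepts a reference to an HTML string. The text-only cells still produce entries in the list, which is exactly why the loop filters them out.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical markup: one "cen" cell per row wraps an <img title> in an <a>,
# the other "cen" cell holds only text (these are the "empty" results).
my $html = <<'HTML';
<html><body>
<table class="dextable">
  <tr>
    <td class="cen"><a href="http://example.com/pikachu.shtml"><img src="p.png" title="Pikachu"></a></td>
    <td class="cen">Series 1</td>
  </tr>
  <tr>
    <td class="cen"><a href="http://example.com/eevee.shtml"><img src="e.png" title="Eevee"></a></td>
    <td class="cen">Series 2</td>
  </tr>
</table>
</body></html>
HTML

my $p1 = scraper {
  process 'table[class="dextable"] td[class="cen"]', "list[]" => scraper {
    process "a",   uri  => '@href';
    process "img", name => '@title';
  };
};

# Scrape from an in-memory string instead of a URI
my $res = $p1->scrape(\$html);

my @pairs;
for my $p (@{ $res->{list} }) {
  # Text-only cells have neither key set, so they are skipped here
  next unless $p->{name} and $p->{uri};
  push @pairs, [ $p->{name}, "$p->{uri}" ];
}

binmode STDOUT, ':encoding(UTF-8)';
print "$_->[0]\t$_->[1]\n" for @pairs;
```

Four cells match the selector, but only the two that contain both a link and an image title survive the filter.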

In Section Seekers of Perl Wisdom