PerlMonks  

HTML::TreeBuilder::XPath finding attribute values

by mldvx4 (Friar)
on Aug 05, 2019 at 16:35 UTC (id://11103966)

mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to extract the values of specific attributes from various HTML elements using XPath and HTML::TreeBuilder::XPath. Say I have an anchor, <a href="foobar.html">One Link</a>, and I would like to extract the value of its "href" attribute, which is "foobar.html". Or, given metadata such as <meta name="description" content="foobar" />, I would like the value of the "content" attribute ("foobar") of the meta element whose "name" attribute is "description". I think my XPath is right, since it works in other tools, but instead of returning the value of the "content" attribute, my script dies with this error:

Can't locate object method "as_text" via package "HTML::TreeBuilder::XPath::Attribute" at ./x1.pl line 15.

What have I missed in the code below, and how should I change it?

```perl
#!/usr/bin/perl

use HTML::TreeBuilder::XPath;

use strict;
use warnings;

my $root = HTML::TreeBuilder::XPath->new;

$root->parse_file(\*DATA);
$root->eof();

for my $d ($root->findnodes('//html/head/meta[@name="description"]/@content')) {
    print qq(D=\n);
    print $d->as_text;
}

$root->delete;

exit(0);

__DATA__
<html>
<head>
<meta name="description" content="foobar" />
</head>
<body>
<h1>FOO</h1>
<p>Bar</p>
</body>
</html>
```

Re: HTML::TreeBuilder::XPath finding attribute values
by Corion (Patriarch) on Aug 05, 2019 at 18:00 UTC

    If your query returns an attribute node, you can only call attribute methods on it. If it returns an element node, you can only call element methods (such as as_text) on it.
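To make the distinction concrete, here is a small sketch (not from the thread) that runs the same path with and without the trailing /@content step, then reports which class each result belongs to and whether it has an as_text method. The HTML string is a trimmed version of the question's document. The exact class printed for the element may depend on the installed HTML::Tree version; the attribute result should be the HTML::TreeBuilder::XPath::Attribute class named in the error message.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse('<html><head><meta name="description" content="foobar" /></head><body></body></html>');
$tree->eof;

# Same path, once selecting the element and once selecting its attribute.
my ($element) = $tree->findnodes('//head/meta[@name="description"]');
my ($attr)    = $tree->findnodes('//head/meta[@name="description"]/@content');

# Report the class of each result and whether as_text can be called on it.
for my $node ($element, $attr) {
    printf "%-40s as_text: %s\n", ref($node), ($node->can('as_text') ? 'yes' : 'no');
}

$tree->delete;
```

Only the element result should report as_text: yes, which is why calling as_text on the attribute result fails.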

    In your case, you can either check the type of each returned node, or look at your query and fetch the appropriate thing: element text for element queries, attribute values for attribute queries. The latter is what I do in HTML::Selector::XPath:

```perl
my $attr;
if ($selector =~ s!/?\@(\w+)$!!) {
    $attr = $1;
};
...
my @nodes;
if (! defined $attr) {
    @nodes = map { $_->as_trimmed_text } $tree->findnodes($selector);
} else {
    @nodes = $tree->findvalues("$selector/\@$attr");
};
```
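Corion's approach can be sketched as a standalone script. This is a minimal sketch, not his module's actual code: parse_html and scrape_values are hypothetical helper names, and the HTML string is a made-up sample combining the question's meta element with its anchor. As in Corion's snippet, it relies on findvalues returning the string value of each matched attribute.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: build an XPath-capable tree from a string.
sub parse_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);
    $tree->eof;
    return $tree;
}

# Hypothetical helper modelled on Corion's snippet: if the query ends in
# /@name, strip that step and return the attribute values via findvalues;
# otherwise return the trimmed text of each matching element.
sub scrape_values {
    my ($tree, $selector) = @_;
    my $attr;
    if ($selector =~ s!/?\@(\w+)$!!) {
        $attr = $1;
    }
    if (defined $attr) {
        return $tree->findvalues("$selector/\@$attr");
    }
    return map { $_->as_trimmed_text } $tree->findnodes($selector);
}

my $html = <<'HTML';
<html>
<head><meta name="description" content="foobar" /></head>
<body><p><a href="foobar.html">One Link</a></p></body>
</html>
HTML

my $tree = parse_html($html);
for my $query ('//head/meta[@name="description"]/@content', '//a/@href', '//a') {
    print "$query => $_\n" for scrape_values($tree, $query);
}
$tree->delete;
```

Run as-is, it should print the meta description (foobar), the anchor's href (foobar.html) and, for the query without an attribute step, the link text (One Link). Because the attribute step is stripped and requested through findvalues, the caller never has to inspect the node type.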
Re: HTML::TreeBuilder::XPath finding attribute values
by tangent (Parson) on Aug 06, 2019 at 14:57 UTC
    As Corion points out, your query returns an attribute node, so you need to replace as_text with string_value. You can also test the node type like this:
```perl
if ( $d->isAttributeNode ) {
    print qq(D=\n);
    print $d->string_value;
}
```
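Putting tangent's fix back into the question's script gives something like the following. It is a sketch: the substantive changes are the isAttributeNode check and string_value. The as_text fallback for element nodes and the heredoc used in place of __DATA__ are illustrative choices, not something the thread requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# The question's document, inlined as a heredoc instead of __DATA__.
my $html = <<'HTML';
<html>
<head>
<meta name="description" content="foobar" />
</head>
<body>
<h1>FOO</h1>
<p>Bar</p>
</body>
</html>
HTML

my $root = HTML::TreeBuilder::XPath->new;
$root->parse($html);
$root->eof;

my @values;
for my $d ($root->findnodes('//html/head/meta[@name="description"]/@content')) {
    # Attribute nodes (HTML::TreeBuilder::XPath::Attribute) have no as_text
    # method; string_value returns the attribute's value. Element nodes
    # fall back to as_text.
    my $value = $d->isAttributeNode ? $d->string_value : $d->as_text;
    push @values, $value;
    print "D=$value\n";
}
$root->delete;
```

With the sample document this should print D=foobar once. Keeping the original __DATA__ section and parse_file(\*DATA) works just as well.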

      Thanks. That 'isAttributeNode' was very useful to learn about.

Node Type: perlquestion [id://11103966]
Approved by GrandFather