HTML::TreeBuilder::XPath finding attribute values

mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to extract the values of specific attributes from various HTML elements using XPath and HTML::TreeBuilder::XPath. Say I have an anchor, <a href="foobar.html">One Link</a>, and I would like to extract the value of its "href" attribute, which would be "foobar.html". Or, if I have metadata such as <meta name="description" content="foobar" />, I would like to find the value of the "content" attribute ("foobar") on the element whose "name" attribute is "description". I think I have the right XPath, since it works in other tools, but instead of giving me the value of the "content" attribute, it gives me this error:

Can't locate object method "as_text" via package "HTML::TreeBuilder::XPath::Attribute" at ./x1.pl line 15.

What have I missed in the code below, and how can I fix it?

#!/usr/bin/perl

use HTML::TreeBuilder::XPath;

use strict;
use warnings;

my $root = HTML::TreeBuilder::XPath->new;
$root->parse_file(\*DATA);
$root->eof();

for my $d ($root->findnodes('//html/head/meta[@name="description"]/@content'))
{
    print qq(D=\n);
    print $d->as_text;
}

$root->delete;

exit(0);

__DATA__

<html>
<head>
 <meta name="description" content="foobar" />
</head>
<body>
 <h1>FOO</h1>
 <p>Bar</p>
</body>
</html>


Re: HTML::TreeBuilder::XPath finding attribute values by Corion (Patriarch) on Aug 05, 2019 at 18:00 UTC
If your query returns an attribute, you can only call attribute methods on it. If your query returns a node, you can only call node methods on it. In your case, you can either check the type of the returned value or you can look at your query and then fetch the appropriate thing. The latter is what I do in HTML::Selector::XPath:

my $attr;
if ($selector =~ s!/?\@(\w+)$!!) {
    $attr = $1;
};
...
my @nodes;
if (! defined $attr) {
    @nodes = map { $_->as_trimmed_text } $tree->findnodes($selector);
} else {
    @nodes = $tree->findvalues("$selector/\@$attr");
};
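A self-contained sketch of the approach Corion describes: look at the query first, and if it ends in an attribute step such as /@content, strip that step and ask for the values with findvalues; otherwise fetch the elements with findnodes and take their trimmed text. The sub name extract and the sample HTML are hypothetical, chosen for illustration; the regex and the findnodes/findvalues calls follow Corion's snippet.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: return a list of strings for an XPath query,
# choosing the lookup method from the shape of the query itself.
sub extract {
    my ($tree, $selector) = @_;

    # If the query ends in an attribute step (e.g. /@content),
    # remove that step and remember the attribute name.
    my $attr;
    if ($selector =~ s!/?\@(\w+)$!!) {
        $attr = $1;
    }

    if (!defined $attr) {
        # Element query: results are HTML::Element objects,
        # so element methods such as as_trimmed_text are available.
        return map { $_->as_trimmed_text } $tree->findnodes($selector);
    }

    # Attribute query: findvalues returns the attribute values as strings.
    return $tree->findvalues("$selector/\@$attr");
}

# Sample document (hypothetical), passed as a string instead of __DATA__.
my $html = <<'HTML';
<html>
<head>
 <meta name="description" content="foobar" />
</head>
<body>
 <h1>FOO</h1>
 <p>Bar <a href="foobar.html">One Link</a></p>
</body>
</html>
HTML

my $root = HTML::TreeBuilder::XPath->new;
$root->parse($html);
$root->eof;

print "$_\n" for extract($root, '//meta[@name="description"]/@content');  # foobar
print "$_\n" for extract($root, '//a/@href');                             # foobar.html
print "$_\n" for extract($root, '//h1');                                  # FOO
```

Deciding from the query means the caller never holds an attribute object at all, so there is no need to know which methods the attribute class supports.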
Re: HTML::TreeBuilder::XPath finding attribute values by tangent (Parson) on Aug 06, 2019 at 14:57 UTC
As Corion points out, your query is returning an Attribute node, so you need to replace `as_text` with `string_value`. You can also test the node like this:

if ( $d->isAttributeNode ) {
    print qq(D=\n);
    print $d->string_value;
}
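Applied to the script in the question, tangent's suggestion might look like the sketch below. It keeps the original query, checks each result with isAttributeNode, and uses string_value for attributes. The helper name values_for, the fallback to as_text for element results, and the added anchor in the sample HTML are illustrative additions, not part of the original posts; the HTML is passed as a string rather than via __DATA__ only to keep the block self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: string value of each result of an XPath query,
# whether the query selects attributes or elements.
sub values_for {
    my ($root, $xpath) = @_;
    my @out;
    for my $d ($root->findnodes($xpath)) {
        if ($d->isAttributeNode) {
            # HTML::TreeBuilder::XPath::Attribute has no as_text;
            # string_value returns the attribute's value.
            push @out, $d->string_value;
        }
        else {
            # Element results are HTML::Element objects, which do have as_text.
            push @out, $d->as_text;
        }
    }
    return @out;
}

my $html = <<'HTML';
<html>
<head>
 <meta name="description" content="foobar" />
</head>
<body>
 <h1>FOO</h1>
 <p>Bar <a href="foobar.html">One Link</a></p>
</body>
</html>
HTML

my $root = HTML::TreeBuilder::XPath->new;
$root->parse($html);
$root->eof;

for my $v (values_for($root, '//html/head/meta[@name="description"]/@content')) {
    print qq(D=$v\n);    # D=foobar
}
print "$_\n" for values_for($root, '//a/@href');    # foobar.html
```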
Re^2: HTML::TreeBuilder::XPath finding attribute values by mldvx4 (Friar) on Aug 11, 2019 at 12:32 UTC
Thanks. That 'isAttributeNode' was very useful to learn about.

