PerlMonks  

Data structure question from XML::XPath::XMLParser

by mldvx4 (Friar)
on Mar 28, 2021 at 18:48 UTC ([id://11130522])

mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:

I am wondering how to find the values of an attribute of an element which has been extracted using XML::XPath::XMLParser.

If I run the following program, I can get the element name with ${$node}->[5]. But what expression would I need to get the values of the 'href' attributes of the 'a' elements?

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Tidy;
use XML::XPath;
use XML::XPath::XMLParser;
use Data::Dumper;

# Slurp the HTML from the __DATA__ section
my $body;
while (my $line = <DATA>) {
    $body .= $line;
}

# Let HTML::Tidy turn the HTML5 into well-formed XML that XML::XPath can parse
my $tidy  = HTML::Tidy->new({ output_xml => 1, numeric_entities => 1 });
my $clean = $tidy->clean($body);

my $parser = XML::XPath->new(xml => $clean);
my $set    = '//p/a';
my $nodes  = $parser->find($set);

foreach my $node ($nodes->get_nodelist) {
    print "\n";
    print "FOUND\n\n", XML::XPath::XMLParser::as_string($node), "\n";
    print Dumper($node);
    # print ${$node}->[5], "\n";   # element name
}

exit(0);

__DATA__
<!doctype html>
<html class="no-focus-outline no-js" lang="en-US" data-modal-active="true">
<head>
<title>test</title>
</head>
<body>
<h1>test&nbsp;heading</h1>
<div>
<p>paragraph one <a href="https://example.com/one/two.html">one</a> example.</p>
<p>paragraph two <a href="https://example.com/two/three.html">another</a> example.</p>
</div>
</body>
</html>

An excerpt from the XML::XPath::XMLParser documentation describes the element and attribute node structures like this:

...
Element Node

  [   $parent,                     # node_parent
      <position in current array>, # node_pos
      'xxx',                       # node_prefix - namespace prefix on this element
      [ ... ],                     # node_children
      'yyy',                       # node_name - element tag name
      [ ... ],                     # node_attribs - attributes on this element
      [ ... ],                     # node_namespaces - namespaces currently in scope
  ]

Attribute Node

  [   $parent,                     # node_parent - the element node
      <position in current array>, # node_pos
      'xxx',                       # node_prefix - namespace prefix on this element
      'href',                      # node_key - attribute name
      'ftp://ftp.com/',            # node_value - value in the node
  ]
...

The output from Data::Dumper for $node variables is like this:

FOUND

<a href="https://example.com/one/two.html">one</a>
$VAR1 = bless( do{\(my $o = bless( [
    bless( [ ... ], 'XML::XPath::Node::ElementImpl' ),  # parent <p>; its ancestors (div, body, html, root) are elided here
    1,
    135,
    '',
    [
      bless( do{\(my $o = bless( [ ${$VAR1}, 0, 145, 'one' ], 'XML::XPath::Node::TextImpl' ))}, 'XML::XPath::Node::Text' )
    ],
    'a',
    [
      bless( do{\(my $o = bless( [ ${$VAR1}, 0, 140, '', 'href', 'https://example.com/one/two.html' ], 'XML::XPath::Node::AttributeImpl' ))}, 'XML::XPath::Node::Attribute' )
    ],
    []
  ], 'XML::XPath::Node::ElementImpl' ))}, 'XML::XPath::Node::Element' );

FOUND

<a href="https://example.com/two/three.html">another</a>
$VAR1 = bless( do{\(my $o = bless( [
    bless( [ ... ], 'XML::XPath::Node::ElementImpl' ),  # parent <p>; its ancestors (div, body, html, root) are elided here
    1,
    170,
    '',
    [
      bless( do{\(my $o = bless( [ ${$VAR1}, 0, 180, 'another' ], 'XML::XPath::Node::TextImpl' ))}, 'XML::XPath::Node::Text' )
    ],
    'a',
    [
      bless( do{\(my $o = bless( [ ${$VAR1}, 0, 175, '', 'href', 'https://example.com/two/three.html' ], 'XML::XPath::Node::AttributeImpl' ))}, 'XML::XPath::Node::Attribute' )
    ],
    []
  ], 'XML::XPath::Node::ElementImpl' ))}, 'XML::XPath::Node::Element' );
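Comparing the dump with the documentation excerpt: the arrays carry one more slot than the excerpt lists, a document-wide position number (135 and 140 above) right after node_pos, so the tag name sits at index 5 (which is why ${$node}->[5] works) and the attribute list at index 6. Each Element or Attribute object is a blessed scalar reference holding a reference to the underlying ...Impl array, hence the extra dereference. The sketch below walks that layout. It assumes the index positions seen in this particular dump, which are XML::XPath internals and may differ between versions, and it feeds XML::XPath a hand-written well-formed string instead of HTML::Tidy output. The documented accessor methods are the safer route for real code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Assumption: a well-formed stand-in for the HTML::Tidy output used in the question
my $xml = <<'XML';
<html><body><div>
<p>paragraph one <a href="https://example.com/one/two.html">one</a> example.</p>
<p>paragraph two <a href="https://example.com/two/three.html">another</a> example.</p>
</div></body></html>
XML

my $xp = XML::XPath->new(xml => $xml);

my @found;
foreach my $node ($xp->find('//p/a')->get_nodelist) {
    my $impl = $$node;                       # underlying XML::XPath::Node::ElementImpl array
    my $name = $impl->[5];                   # tag name: index 5 in the dump above
    foreach my $attr (@{ $impl->[6] }) {     # attribute list: index 6 in the dump above
        my $attr_impl = $$attr;              # underlying XML::XPath::Node::AttributeImpl array
        # attribute name at index 4, value at index 5, as in the dump
        push @found, "$name $attr_impl->[4]=$attr_impl->[5]";
    }
}
print "$_\n" for @found;
```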

Re: Data structure question from XML::XPath::XMLParser
by perlfan (Vicar) on Mar 28, 2021 at 20:08 UTC

      Thanks. That was it. The manual page for XML::XPath::Node::Element had most of what was needed.

      I'm still puzzled about the data structures, however.

      I'm getting XML::XPath::Node::Attribute=REF(0x55bcf3d63f30) instead of 'href' below. The manual page says it should return a list.

      #!/usr/bin/perl
      use HTML::Tidy;
      use XML::XPath;
      use strict;
      use warnings;

      my $body;
      while (my $line = <DATA>) {
          $body .= $line;
      }

      my $tidy  = HTML::Tidy->new({ output_xml => 1, numeric_entities => 1 });
      my $clean = $tidy->clean($body);

      my $parser = XML::XPath->new(xml => $clean);
      my $set    = '//p/a';
      my $nodes  = $parser->find($set);

      foreach my $node ($nodes->get_nodelist) {
          print "\n";
          print $node->getName(), "\n";
          # this next line is wrong
          print join(", ", $node->getAttributeNodes), "\n";
      }

      exit(0);

      __DATA__
      <!doctype html>
      <html class="no-focus-outline no-js " lang="en-US" data-modal-active="true">
      <head>
      <title>test</title>
      </head>
      <body>
      <h1>test&nbsp;heading</h1>
      <div>
      <p>paragraph one <a href="https://example.com/one/two.html">one</a> example.</p>
      <p>paragraph two <a href="https://example.com/two/three.html">another</a> example.</p>
      </div>
      </body>
      </html>
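The manual page is right that getAttributeNodes returns a list, but the list holds XML::XPath::Node::Attribute objects, and join stringifies each object as XML::XPath::Node::Attribute=REF(...). Each attribute has to be asked for its name and value. A minimal sketch, assuming a well-formed string in place of the Tidy output; it calls getName and string_value on each attribute node and, as an alternative for a single known attribute, evaluates findvalue with the element as the context node.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Assumption: a well-formed stand-in for the HTML::Tidy output
my $xml = '<html><body><div>'
        . '<p>paragraph one <a href="https://example.com/one/two.html">one</a> example.</p>'
        . '<p>paragraph two <a href="https://example.com/two/three.html">another</a> example.</p>'
        . '</div></body></html>';

my $xp = XML::XPath->new(xml => $xml);

my (@pairs, @hrefs);
foreach my $node ($xp->find('//p/a')->get_nodelist) {
    # getAttributeNodes returns Attribute objects, so ask each one for its name and value
    push @pairs, join ', ',
        map { $_->getName . '=' . $_->string_value } $node->getAttributeNodes;

    # Alternative: a relative XPath evaluated with the <a> element as context node;
    # findvalue returns an XML::XPath::Literal, and ->value gives the plain string
    push @hrefs, $xp->findvalue('@href', $node)->value;
}
print "$_\n" for @pairs;
print "$_\n" for @hrefs;
```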
Re: Data structure question from XML::XPath::XMLParser
by marto (Cardinal) on Mar 30, 2021 at 08:26 UTC

    A short Mojo::DOM example to provide an alternative viewpoint:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature 'say';
    use Mojo::DOM;

    my $html = '<!doctype html>
    <html class="no-focus-outline no-js" lang="en-US" data-modal-active="true">
    <head>
    <title>test</title>
    </head>
    <body>
    <h1>test&nbsp;heading</h1>
    <div>
    <p>paragraph one <a href="https://example.com/one/two.html">one</a> example.</p>
    <p>paragraph two <a href="https://example.com/two/three.html">another</a> example.</p>
    </div>
    </body>
    </html>';

    my $dom = Mojo::DOM->new( $html );

    foreach my $e ( $dom->find('p > a')->each ){
        say $e->{'href'};
    }

    # or
    $dom->find('p > a')->each(sub { say $_->{'href'} } );

    If the HTML comes from a live site, you can fetch it and do all of the above with Mojo::UserAgent; see this example.
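    The same extraction can also be written without an explicit loop: Mojo::DOM's attr method reads an attribute, and Mojo::Collection's map accepts a method name plus arguments and calls that method on every element. A minimal sketch, assuming a reasonably recent Mojolicious; the HTML is a trimmed copy of the question's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;

# Trimmed copy of the question's HTML
my $html = <<'HTML';
<!doctype html>
<html lang="en-US">
<body>
<h1>test&nbsp;heading</h1>
<div>
<p>paragraph one <a href="https://example.com/one/two.html">one</a> example.</p>
<p>paragraph two <a href="https://example.com/two/three.html">another</a> example.</p>
</div>
</body>
</html>
HTML

my $dom = Mojo::DOM->new($html);

# map(attr => 'href') calls $_->attr('href') on each <a> that is a direct child of a <p>;
# each() with no callback returns the collected values as a plain list
my @hrefs = $dom->find('p > a')->map(attr => 'href')->each;
print "$_\n" for @hrefs;
```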

      Would Mojo::DOM be able to handle the new HTML5 elements such as <article>, <footer>, <section> and so on? Their presence is what is choking the old HTML::TreeBuilder::XPath, and it is my reason for using Tidy to convert the HTML to valid, well-formed XML.

        Yes, it does.
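        One way to confirm this is to feed Mojo::DOM a fragment that uses HTML5 sectioning elements directly, with no Tidy step, and select through them with CSS selectors. A minimal sketch; the article, section and footer markup is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical HTML5 markup using the elements that tripped up the older toolchain
my $html = <<'HTML';
<!doctype html>
<html><body>
<article>
  <section><p>see <a href="https://example.com/one/two.html">one</a></p></section>
  <footer><p>and <a href="https://example.com/two/three.html">another</a></p></footer>
</article>
</body></html>
HTML

my $dom = Mojo::DOM->new($html);

# CSS selectors can name the HTML5 elements directly
my @section_links = $dom->find('article section a')->map(attr => 'href')->each;
my @footer_links  = $dom->find('article > footer a')->map(attr => 'href')->each;

print "section: @section_links\n";
print "footer:  @footer_links\n";
```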

Re: Data structure question from XML::XPath::XMLParser
by Anonymous Monk on Mar 29, 2021 at 09:59 UTC
    Xpather learn xpaths

    my $set = '//p/a/@href';
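    With the attribute step in the path, the node set holds the href attribute nodes themselves, so each URL is just the node's string value and no per-element digging is needed. A minimal sketch, assuming a well-formed stand-in string rather than the HTML::Tidy output from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Assumption: a well-formed stand-in for the HTML::Tidy output
my $xml = '<html><body><div>'
        . '<p>paragraph one <a href="https://example.com/one/two.html">one</a> example.</p>'
        . '<p>paragraph two <a href="https://example.com/two/three.html">another</a> example.</p>'
        . '</div></body></html>';

my $parser = XML::XPath->new(xml => $xml);
my $set    = '//p/a/@href';

# Each node in the set is an attribute node; string_value returns its value
my @hrefs = map { $_->string_value } $parser->find($set)->get_nodelist;
print "$_\n" for @hrefs;
```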

Node Type: perlquestion [id://11130522]
Approved by Corion