Actually, I *do* believe that the text "World" belongs to the b element and, therefore, not to the p element.
Sure, that's up to you. I can't speak to how other modules implement it, but for all the "official" details I'd refer you to the libxml2 documentation and the Document Object Model specification.
Anyway, I described two ways you can get the text nodes of the current node. Using the XPath expression I showed is probably easiest. I can't really say more since you haven't described what it is you're trying to do with the document.
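For comparison with the XPath route, here is a minimal sketch of a DOM-only alternative: filter a node's direct children by node type. The one-line sample document is made up for illustration, and this is not necessarily the second approach referred to above. It also shows how textContent differs, since that method concatenates the text of all descendants.

```perl
use warnings;
use strict;
use XML::LibXML qw(:libxml);   # explicitly import node-type constants such as XML_TEXT_NODE

# Hypothetical sample document, chosen only to illustrate the point.
my $doc = XML::LibXML->load_xml( string => '<p>Hello, <b>World</b>!</p>' );
my ($p) = $doc->findnodes('/p');

# Keep only the direct children of <p> that are text nodes.
my @own_texts = map  { $_->data }
                grep { $_->nodeType == XML_TEXT_NODE } $p->childNodes;

# textContent, by contrast, includes the text of descendant elements.
my $all_text = $p->textContent;

print join('|', @own_texts), "\n";   # prints "Hello, |!"
print $all_text, "\n";               # prints "Hello, World!"
```

In this sketch "World" shows up only in textContent, because its text node is a child of the b element rather than of p.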
use warnings;
use strict;
use XML::LibXML;
my $doc = XML::LibXML->load_xml( string => <<'EOT' );
<html>
<head> <title>Title_Text</title> </head>
<body>
<p>paragraph_text</p>
<div>
<div>
innnermost_text
</div>
</div>
</body> </html>
EOT
for my $node ($doc->findnodes('//*')) {
    print "<<<", $node->nodeName, ">>>\n";
    my @texts = map { $_->data } $node->findnodes('./text()');
    use Data::Dump; dd @texts; # Debug
}
__END__
<<<html>>>
(" \n ", " \n ", " ")
<<<head>>>
(" ", " ")
<<<title>>>
"Title_Text"
<<<body>>>
(" \n ", "\n ", " \n ")
<<<p>>>
"paragraph_text"
<<<div>>>
(" \n ", "\n ")
<<<div>>>
" \n innnermost_text\n "
You could also use XML::LibXML::SAX to get an event-based parser.
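A rough sketch of what the event-based route might look like, assuming XML::LibXML::SAX is driven directly with a handler derived from XML::SAX::Base. The TextCollector package and the sample document are hypothetical names made up for this example. Note that a SAX parser may deliver a single text node in several characters events, so the chunks are collected rather than assumed to be whole.

```perl
use warnings;
use strict;
use XML::LibXML::SAX;

# Hypothetical handler: records each chunk of character data
# together with the name of the element that directly encloses it.
package TextCollector {
    use parent 'XML::SAX::Base';

    sub start_element {
        my ($self, $el) = @_;
        push @{ $self->{stack} }, $el->{Name};   # entering an element
    }

    sub end_element {
        my ($self, $el) = @_;
        pop @{ $self->{stack} };                 # leaving an element
    }

    sub characters {
        my ($self, $chars) = @_;
        # One text node may arrive in more than one chunk.
        push @{ $self->{texts} }, [ $self->{stack}[-1], $chars->{Data} ];
    }
}

my $handler = TextCollector->new;
my $parser  = XML::LibXML::SAX->new( Handler => $handler );

# Hypothetical sample document.
$parser->parse_string('<p>Hello, <b>World</b>!</p>');

# Print "element: text" for every chunk that was seen.
printf "%s: %s\n", @$_ for @{ $handler->{texts} };
```

Because each chunk is attributed to the innermost open element, "World" would be reported under b and not under p, which matches the DOM view described above.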