PerlMonks  

Mojo::DOM doesn't include marked-up text in an element's text

by Cody Fendant (Hermit)
on Apr 22, 2020 at 22:31 UTC ( [id://11115910]=perlquestion )

Cody Fendant has asked for the wisdom of the Perl Monks concerning the following question:

Here's a minimal example: say I have two paragraphs:

<p> Paragraph one here. </p>
<p> Paragraph <b>two</b> here. </p>

And I use Mojo::DOM to grab their text:

use Mojo::DOM;
my $dom = Mojo::DOM->new('<p>Paragraph one here.</p><p>Paragraph <b>two</b> here.');
for my $e ( $dom->find('p')->each ) {
    print $e->text,$/;
}
### Output:
# Paragraph one here.
# Paragraph here.
#

How do I access that paragraph's complete text, including the text inside that second level of markup? And is this a bug or a feature?

Replies are listed 'Best First'.
Re: Mojo::DOM doesn't include marked-up text in an element's text
by choroba (Cardinal) on Apr 22, 2020 at 22:51 UTC
    It's a feature. XML::LibXML behaves similarly:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature qw{ say };

    use XML::LibXML;

    my $xml = '<r><p>Paragraph one here.</p><p>Paragraph <b>two</b> here.</p></r>';
    my $dom = 'XML::LibXML'->load_xml(string => $xml);
    print $dom->findvalue('/r/p[2]');  # Same as $dom->findnodes('/r/p[2]//text()')
    # Paragraph two here.

    print $dom->findnodes('/r/p[2]');  # Same as map $_->toString, $dom->findnodes('/r/p[2]')
    # <p>Paragraph <b>two</b> here.</p>

    print $dom->findnodes('/r/p[2]/text()');
    # Paragraph here

    What do you mean by "complete text"?

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      "What do you mean by "complete text"?"

      As Anonymous Monk says below, the method Cody Fendant should have used to get the combined text of all descendant nodes is all_text rather than text, so

      print $e->text,$/;

      becomes

      print $e->all_text,$/;

      Very handy, even in one-liners with ojo.

Re: Mojo::DOM doesn't include marked-up text in an element's text (all_text)
by Anonymous Monk on Apr 23, 2020 at 02:06 UTC

    Always check assumptions against docs

    From the Mojo::DOM docs:

    all_text
      my $text = $dom->all_text;
    Extract text content from all descendant nodes of this element.

    text
      my $text = $dom->text;
    Extract text content from this element only (not including child elements).

      Thanks! And damn, I can't believe I didn't spot that in the documentation.

      To be fair, a simple "see all_text" in the documentation next to text would have saved me a lot of frustration!

        That's just one of the reasons why I miss Annocpan so much.
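To tie the thread together, here is a minimal sketch that runs text and all_text side by side on the markup from the question. It assumes Mojolicious is installed; the variable names ($p, @paragraphs) are only illustrative, and the closing </p> on the second paragraph was added for tidiness.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Same markup as in the question, with the second </p> closed explicitly.
my $dom = Mojo::DOM->new(
    '<p>Paragraph one here.</p><p>Paragraph <b>two</b> here.</p>'
);

for my $p ( $dom->find('p')->each ) {
    # text:     only the text nodes that are direct children of <p>
    # all_text: text from every descendant node, including the <b>
    printf "text:     %s\nall_text: %s\n", $p->text, $p->all_text;
}

# Collect the complete text of each paragraph into a plain Perl list.
my @paragraphs = $dom->find('p')->map('all_text')->each;
```

For the second paragraph, all_text should give "Paragraph two here.", while text leaves out the word inside the b element. The exact whitespace that text returns around the gap may differ between Mojolicious versions, so it is probably best not to rely on it.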
