PerlMonks  


Thanks for the previous.

I have a question about HTML::TreeBuilder::XPath or HTML::Element, and about how the two interact. I would like to manipulate the content of an element while leaving all of its child elements in place. I haven't found a way to do that, because replace_with() appears to escape the < and > signs automatically and unavoidably. The example below uses a ~literal element, but I have also tried creating a new element. Either way, the child elements within the selected element get escaped despite my best efforts. How can I do something like the following (using a different workflow if necessary) so that the tags of the child elements remain intact and unescaped?

#!/usr/bin/perl

use HTML::TreeBuilder::XPath;
use HTML::Element;

use warnings;
use strict;

my $xhtml = HTML::TreeBuilder::XPath->new;
$xhtml->implicit_tags(1);
$xhtml->no_space_compacting(1);
$xhtml->parse_file(\*DATA)
    or die("Could not parse file handle for 'DATA' : $!\n");

for my $item ($xhtml->findnodes('//div/ul/li')) {
    my $li = $item->as_XML;
    $li =~ s/^\s+//;
    # ... omitting rest of the stuff which happens to $li ...
    my $new = HTML::Element->new('~literal', 'text' => $li);
    $item->replace_with($new);
}

print $xhtml->as_XML_indented;

$xhtml->delete;

exit(0);

__DATA__
<html>
<head>
<title>Foo Bar</title>
</head>
<body>
<div><a href=" http://foo.example.com/ ">Foo Bar</a>
<ul>
<li>
foo foo foo foo <em>bar</em> foo foo foo foo foo
</li></ul></div>
<div><a href=" http://bar.example.com/ ">Bar Foo</a>
<ul>
<li>
foo foo foo foo <em>bar</em> foo foo foo foo foo
<ul>
<li>alpha</li>
<li>b<em>et</em>a</li>
<li>gamma</li>
</ul>
</li></ul></div>
</body>
</html>

The output I get is as follows:

<html> <head> <title>Foo Bar</title> </head> <body> <div><a href=" http://foo.example.com/ ">Foo Bar</a> <ul>&lt;li&gt; foo foo foo foo &lt;em&gt;bar&lt;/em&gt; foo foo foo foo foo &lt;/li&gt; </ul> </div> <div><a href=" http://bar.example.com/ ">Bar Foo</a> <ul>&lt;li&gt; foo foo foo foo &lt;em&gt;bar&lt;/em&gt; foo foo foo foo foo &lt;ul&gt;&lt;li&gt;alpha&lt;/li&gt;&lt;li&gt;b&lt;em&gt;et&lt;/em&gt;a&lt;/li&gt;&lt;li&gt;gamma&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt; </ul> </div> </body> </html>
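The escaping in the output above can be reproduced with HTML::Element on its own. This is a minimal sketch, assuming only the HTML-Tree distribution (which provides HTML::Element) is installed. as_XML treats a text segment as character data and escapes its < and >, while a genuine child element is written out as tags. The ~literal node in the question holds the already-serialized <li> as text, which appears to be why its markup comes back escaped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Element;

# A <p> holding a single *text* segment that merely looks like markup.
my $p = HTML::Element->new('p');
$p->push_content('<em>bar</em>');

# A <p> holding a real <em> child element; push_content turns the
# array ref into an element via new_from_lol.
my $q = HTML::Element->new('p');
$q->push_content( [ 'em', 'bar' ] );

# as_XML escapes text segments but serializes child elements as tags.
my $escaped = $p->as_XML;    # the "<em>" here comes out escaped
my $intact  = $q->as_XML;    # the <em> child comes out as real tags
print $escaped, $intact;
```

The point is that markup survives serialization only when it lives in the tree as elements, not inside a string.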

The output I would like instead looks like this:

<html> <head> <title>Foo Bar</title> </head> <body> <div><a href=" http://foo.example.com/ ">Foo Bar</a> <ul><li>foo foo foo foo <em>bar</em> foo foo foo foo foo </li> </ul> </div> <div><a href=" http://bar.example.com/ ">Bar Foo</a> <ul><li>foo foo foo foo <em>bar</em> foo foo foo foo foo <ul><li>alpha</li><li>b<em>et</em>a</li><li>gamma</li></ul></li> </ul> </div> </body> </html>

I'm not sure if HTML::TreeBuilder::XPath can be made to work like that. If it can, what has to change?
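One possible workflow is sketched below, under the assumption that the omitted processing of $li only needs to change text (the only step shown is s/^\s+//). It leaves the <li> in the tree and edits its text segments in place instead of serializing it and replacing it with a ~literal. HTML::Element stores text as plain strings in an element's content list, and content_refs_list returns references to those items, so the leading whitespace can be stripped without touching <em> or the nested <ul>. The HTML is a shortened version of the question's sample, and the result is serialized with as_XML from HTML::Element.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Shortened sample input, adapted from the question's __DATA__ section.
my $html = <<'HTML';
<html><head><title>Foo Bar</title></head><body>
<div><a href="http://bar.example.com/">Bar Foo</a>
<ul><li> foo foo <em>bar</em> foo <ul><li>alpha</li><li>b<em>et</em>a</li></ul></li></ul></div>
</body></html>
HTML

my $xhtml = HTML::TreeBuilder::XPath->new;
$xhtml->implicit_tags(1);
$xhtml->no_space_compacting(1);
$xhtml->parse_content($html);    # parse_content also signals eof

for my $item ( $xhtml->findnodes('//div/ul/li') ) {
    # References to the <li>'s content items: text segments are plain
    # strings, child elements are HTML::Element objects.
    my ($first) = $item->content_refs_list;

    # Edit the leading text segment in place; <em>, the nested <ul> and
    # every other child stay in the tree as real elements.
    ${$first} =~ s/^\s+// if $first && !ref ${$first};
}

# Only genuine text gets escaped, so the child tags come out intact.
my $out = $xhtml->as_XML;
print $out;

$xhtml->delete;
```

If the real processing must rewrite markup rather than text, the same principle would apply: build the replacement as HTML::Element objects and pass those to replace_with(), rather than a string of tags wrapped in a ~literal.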


In reply to Avoiding escaped child elements with HTML::TreeBuilder::XPath or HTML::Element by mldvx4

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients