Thanks for the previous.
I have a question about HTML::TreeBuilder::XPath and HTML::Element, and the interaction between them. I would like to manipulate the content of an element while leaving all its children in place. I have not found a way to do that, because replace_with() appears to automatically and unavoidably escape the < and > signs. The example below uses ~literal, but I've also tried creating a new element. Either way, the child elements within the selected element get escaped despite my best efforts. How would it be possible to do something like the following (using a different workflow if necessary) such that the tags for the child elements remain intact and unescaped?
#!/usr/bin/perl
use HTML::TreeBuilder::XPath;
use HTML::Element;
use warnings;
use strict;
my $xhtml = HTML::TreeBuilder::XPath->new;
$xhtml->implicit_tags(1);
$xhtml->no_space_compacting(1);
$xhtml->parse_file(\*DATA)
    or die("Could not parse file handle for 'DATA' : $!\n");
for my $item ($xhtml->findnodes('//div/ul/li')) {
    my $li = $item->as_XML;
    $li =~ s/^\s+//;
    # ... omitting rest of the stuff which happens to $li ...
    my $new = HTML::Element->new('~literal', 'text' => $li);
    $item->replace_with($new);
}
print $xhtml->as_XML_indented;
$xhtml->delete;
exit(0);
__DATA__
<html>
<head>
<title>Foo Bar</title>
</head>
<body>
<div><a href=" http://foo.example.com/ ">Foo Bar</a>
<ul>
<li> foo foo foo
foo <em>bar</em> foo
foo foo foo foo
</li></ul></div>
<div><a href=" http://bar.example.com/ ">Bar Foo</a>
<ul>
<li> foo foo foo
foo <em>bar</em> foo
foo foo foo foo
<ul>
<li>alpha</li>
<li>b<em>et</em>a</li>
<li>gamma</li>
</ul>
</li></ul></div>
</body>
</html>
The output I get is as follows:
<html>
<head>
<title>Foo Bar</title>
</head>
<body>
<div><a href=" http://foo.example.com/ ">Foo Bar</a>
<ul>&lt;li&gt; foo foo foo
foo &lt;em&gt;bar&lt;/em&gt; foo
foo foo foo foo
&lt;/li&gt;
</ul>
</div>
<div><a href=" http://bar.example.com/ ">Bar Foo</a>
<ul>&lt;li&gt; foo foo foo
foo &lt;em&gt;bar&lt;/em&gt; foo
foo foo foo foo
&lt;ul&gt;&lt;li&gt;alpha&lt;/li&gt;&lt;li&gt;b&lt;em&gt;et&lt;/em&gt;a&lt;/li&gt;&lt;li&gt;gamma&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
</ul>
</div>
</ul>
</div>
</body>
</html>
The output I would like to get instead would look like this:
<html>
<head>
<title>Foo Bar</title>
</head>
<body>
<div><a href=" http://foo.example.com/ ">Foo Bar</a>
<ul><li>foo foo foo
foo <em>bar</em> foo
foo foo foo foo
</li>
</ul>
</div>
<div><a href=" http://bar.example.com/ ">Bar Foo</a>
<ul><li>foo foo foo
foo <em>bar</em> foo
foo foo foo foo
<ul><li>alpha</li><li>b<em>et</em>a</li><li>gamma</li></ul></li>
</ul>
</div>
</body>
</html>
I'm not sure if HTML::TreeBuilder::XPath can be made to work like that. If it can, what has to change?
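One direction that might sidestep the escaping altogether, offered as a sketch under assumptions rather than a confirmed fix: instead of serializing the <li> with as_XML and putting the string back, edit the element's text children in place through content_refs_list (documented in HTML::Element), so the child elements never leave the tree. The helper names trim_leading_text and trim_list_items are hypothetical, and the leading-whitespace trim only stands in for whatever else needs to happen to the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: trim leading whitespace from the first text child
# of an element, editing the tree in place. Child elements are untouched.
sub trim_leading_text {
    my ($elem) = @_;
    my ($first_r) = $elem->content_refs_list;  # refs to each content item
    return unless defined $first_r;            # element has no content
    return if ref $$first_r;                   # first child is an element
    $$first_r =~ s/^\s+//;
    return;
}

# Hypothetical wrapper: parse a document string, trim every matching <li>,
# and return the serialized result.
sub trim_list_items {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->no_space_compacting(1);
    $tree->parse($html);
    $tree->eof;
    trim_leading_text($_) for $tree->findnodes('//div/ul/li');
    my $out = $tree->as_XML;
    $tree->delete;
    return $out;
}

print trim_list_items(
    '<html><body><div><ul><li>  foo <em>bar</em> foo</li></ul></div></body></html>'
);
```

Because only plain-text nodes are modified, nothing is re-inserted as a string, so there is nothing for the serializer to escape. If the manipulation really does need the serialized markup, this approach would not be enough on its own.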