Thanks for the previous.
I have a question about HTML::TreeBuilder::XPath and HTML::Element, and the interaction between them. I would like to manipulate the content of an element while leaving all of its children in place. I have not found a way to do that, because it appears that replace_with() automatically and unavoidably escapes the < and > signs. The example below uses ~literal, but I have also tried creating a new element. Either way, the child elements within the selected element get escaped despite my best efforts. How could I do something like the following (using a different workflow if necessary) so that the tags for the child elements remain intact and unescaped?
#!/usr/bin/perl
use HTML::TreeBuilder::XPath;
use HTML::Element;
use warnings;
use strict;
my $xhtml = HTML::TreeBuilder::XPath->new;
$xhtml->implicit_tags(1);
$xhtml->no_space_compacting(1);
$xhtml->parse_file(\*DATA)
    or die("Could not parse file handle for 'DATA' : $!\n");
for my $item ($xhtml->findnodes('//div/ul/li')) {
    my $li = $item->as_XML;
    $li =~ s/^\s+//;
    # ... omitting rest of the stuff which happens to $li ...
    my $new = HTML::Element->new('~literal', 'text' => $li);
    $item->replace_with($new);
}
print $xhtml->as_XML_indented;
$xhtml->delete;
exit(0);
__DATA__
<html>
<head>
<title>Foo Bar</title>
</head>
<body>
<div><a href=" http://foo.example.com/ ">Foo Bar</a>
<ul>
<li> foo foo foo
foo <em>bar</em> foo
foo foo foo foo
</li></ul></div>
<div><a href=" http://bar.example.com/ ">Bar Foo</a>
<ul>
<li> foo foo foo
foo <em>bar</em> foo
foo foo foo foo
<ul>
<li>alpha</li>
<li>b<em>et</em>a</li>
<li>gamma</li>
</ul>
</li></ul></div>
</body>
</html>
The output I get is as follows:
<html>
<head>
<title>Foo Bar</title>
</head>
<body>
<div><a href=" http://foo.example.com/ ">Foo Bar</a>
<ul><li> foo foo foo
foo <em>bar</em> foo
foo foo foo foo
</li>
</ul>
</div>
<div><a href=" http://bar.example.com/ ">Bar Foo</a>
<ul><li> foo foo foo
foo <em>bar</em> foo
foo foo foo foo
<ul><li>alpha</li><li>b<em>et</em>a</li><li>gamma</li></ul></li>
</ul>
</div>
</body>
</html>
The output I would like to get instead would look like this:
<html>
<head>
<title>Foo Bar</title>
</head>
<body>
<div><a href=" http://foo.example.com/ ">Foo Bar</a>
<ul><li>foo foo foo
foo <em>bar</em> foo
foo foo foo foo
</li>
</ul>
</div>
<div><a href=" http://bar.example.com/ ">Bar Foo</a>
<ul><li>foo foo foo
foo <em>bar</em> foo
foo foo foo foo
<ul><li>alpha</li><li>b<em>et</em>a</li><li>gamma</li></ul></li>
</ul>
</div>
</body>
</html>
I'm not sure if HTML::TreeBuilder::XPath can be made to work like that. If it can, what has to change?
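For reference, one different workflow that might avoid the round trip through as_XML and ~literal is to edit the text children of each <li> in place, so the child elements never leave the tree. The following is only a minimal sketch, not a confirmed fix. It assumes that content_refs_list (documented in HTML::Element) returns references to the same strings the tree holds, and that the only change wanted is trimming leading whitespace from the first text child. The inline HTML string is made-up test input standing in for the DATA section.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $xhtml = HTML::TreeBuilder::XPath->new;
$xhtml->implicit_tags(1);
$xhtml->no_space_compacting(1);

# Made-up sample input (an assumption, standing in for the DATA section).
$xhtml->parse_content(
    '<div><ul><li>  foo <em>bar</em> foo<ul><li>alpha</li></ul></li></ul></div>'
);

for my $item ($xhtml->findnodes('//div/ul/li')) {
    # content_refs_list gives references to each child: plain strings
    # for text, HTML::Element objects for child elements.
    my ($first) = $item->content_refs_list;

    # Skip items that are empty or whose first child is an element.
    next unless $first && !ref $$first;

    # Edit the text in place; the <em> and nested <ul> are not touched.
    $$first =~ s/^\s+//;
}

print $xhtml->as_XML_indented;
```

Because the elements stay in the tree as elements, nothing is handed to the serializer as text that it could escape. Whether this fits depends on what the omitted processing of $li has to do; anything that needs the markup as a single string would still need another approach.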