I'd like to replace selected paragraphs in place, but I can't figure out the right way to do it. Specifically, I'd like to replace a single P element with several P elements at the same point in the tree. I've tried dozens of variations of the code below, but the version I have now dies with the error "the target node's parent has no content!?"
I don't understand why. I expected replace_with_content() or push_content() to leave something in place for postinsert() to add to. Clearly I've missed something, but what?
#!/usr/bin/perl
use HTML::TreeBuilder::XPath;
use warnings;
use strict;

readfile();
exit(0);

sub readfile {
    my ($file) = (@_);
    my $xhtml = HTML::TreeBuilder::XPath->new;
    $xhtml->implicit_tags(1);
    $xhtml->no_space_compacting(1);
    $xhtml->parse_file(\*DATA) or die();

    # find double-spaced paragraphs inside blockquotes and expand them
    for my $p ($xhtml->findnodes('//blockquote/p')) {
        my $text = $p->as_text();
        $text =~ s/^\s+//;
        $text =~ s/\s+$//;
        next unless ($text =~ /\n\s*\n\s*/);
        my @paragraphs = split(/\s*\n\s*/, $text);
        print qq(\t\@paragraphs=), join(',', @paragraphs), qq(\n);
        if ($#paragraphs >= 0) {
            my $pp = shift(@paragraphs);
            print qq(\t\tpp1=$pp\n);
            $p->replace_with_content();
            $p->push_content(['p', $pp]);
            print qq(Identified :\n);
            print qq(«), $p->as_XML_indented, qq(»\n);
            foreach $pp (@paragraphs) {
                print qq(\t\tpp2=$pp\n);
                $p->postinsert(['p', $pp]);
            }
        }
    }
    print qq(\n), qq(-) x 30, qq(\n);
    my ($body) = $xhtml->findnodes('//body');
    print qq(\n);
    print $body->as_XML_indented;
    $xhtml->delete;
    return (1);
}
__DATA__
<body>
<blockquote id="one">
aaa
bbb
ccc
</blockquote>
<blockquote id="two">
<p>
ddd

eee

fff
</p>
</blockquote>
<blockquote id="three">
<p>
ggg
</p> <p>
hhh
</p> <p>
iii
</p>
</blockquote>
<blockquote id="four">
<p>
jjj
</p>
</blockquote>
</body>
The expected output is for BLOCKQUOTE "two" to contain three separate paragraphs instead of one (or four). The P elements in the other BLOCKQUOTE elements should be left alone, as the script currently does.
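HTML::Element's documentation describes `replace_with_content()` as returning the element with no parent and no content of its own, which would explain why the later `postinsert()` calls have nothing to insert next to. Below is a minimal sketch of one alternative, assuming HTML::TreeBuilder::XPath and HTML::Element behave as documented: build all the new `<p>` elements first, then call `replace_with()` on the original `<p>` while it is still attached to its blockquote. The helper name `expand_double_spaced` and the sample markup are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use HTML::Element;

# Hypothetical helper: parse an HTML string, split every double-spaced
# <p> directly inside a <blockquote> into one <p> per chunk of text,
# and return the resulting <body> serialized as XML.
sub expand_double_spaced {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->no_space_compacting(1);    # keep newlines so blank lines survive
    $tree->parse_content($html);

    for my $p ($tree->findnodes('//blockquote/p')) {
        my $text = $p->as_text;
        $text =~ s/^\s+//;
        $text =~ s/\s+$//;
        next unless $text =~ /\n\s*\n/;    # only touch double-spaced paragraphs

        # Build the replacement <p> elements before touching the tree.
        my @new_ps = map {
            my $new_p = HTML::Element->new('p');
            $new_p->push_content($_);
            $new_p;
        } split /\s*\n\s*/, $text;

        # $p still has its <blockquote> parent here, so replace_with() can
        # splice the new elements into its slot. It returns the now-detached
        # $p, which is then destroyed.
        $p->replace_with(@new_ps)->delete;
    }

    my ($body) = $tree->findnodes('//body');
    my $xml = $body->as_XML;
    $tree->delete;
    return $xml;
}

# Usage: the double-spaced case from the question.
my $sample = "<body><blockquote id=\"two\"><p>\nddd\n\neee\n\nfff\n</p></blockquote></body>";
print expand_double_spaced($sample);
```

Splitting on `/\s*\n\s*/` mirrors the question's own split, so any single line break inside a double-spaced paragraph also starts a new `<p>`. Splitting on `/\s*\n\s*\n\s*/` instead would keep soft-wrapped lines together and break only at blank lines.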