I have some paragraph nodes that I'd like to truncate at the first <br /> element inside each one. Given the script at the bottom of this post, I'd like it to print only what comes before the break in each paragraph and leave out (or delete) everything from the break onward. The desired output is:
foo01
foo02
foo03
I can find the break itself with '//div/p/br[1]', but how can I then have the script delete it and everything that follows it within the parent paragraph element? I imagine something along these lines, where 'something' stands for the expression I'm missing:
for my $d ($root->findnodes($xpath))
{
    for my $dd ($d->findnodes('something'))
    {
        $dd->delete;
    }
    print $d->as_trimmed_text,qq(\n) if (defined($d->as_text));
}
Here is the script so far, with data.
#!/usr/bin/perl
use HTML::TreeBuilder::XPath;
use strict;
use warnings;
my $root = HTML::TreeBuilder::XPath->new;
$root->parse_file(\*DATA)
    or die("Could not parse the data: $!\n");
$root->eof();
my $xpath = '//div/p';
for my $d ($root->findnodes($xpath))
{
    print $d->as_trimmed_text,qq(\n) if (defined($d->as_text));
}
$root->delete;
exit(0);
__DATA__
<div><p>foo01<br />bar01</p></div>
<div>
<p>
foo02
<br />
bar02
</p>
</div>
<div>
<p>
foo03
<br />
bar03
<br />
baz03
</p>
</div>
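
For illustration, here is a minimal sketch of one way the truncation described above might be done. It does not use an XPath expression for the deletion. Instead it assumes that text children of an element are stored as plain strings (the HTML::TreeBuilder default) and uses HTML::Element's content_list and splice_content to cut the paragraph at the first <br>. The helper name text_before_first_br is hypothetical, not part of any module, and the sample data is inlined as a string rather than read from DATA.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper (not part of HTML::TreeBuilder::XPath): detach the first
# <br> child of $p together with every sibling after it, then return the
# trimmed text that remains in the paragraph.
sub text_before_first_br {
    my ($p) = @_;
    my @content = $p->content_list;    # child elements and plain text strings
    for my $i (0 .. $#content) {
        # Assumption: text children are plain strings, so only elements are refs.
        next unless ref $content[$i] && $content[$i]->tag eq 'br';
        # Remove the <br> and everything after it from the paragraph.
        my @removed = $p->splice_content($i, @content - $i);
        $_->delete for grep { ref } @removed;    # free the detached elements
        last;
    }
    return $p->as_trimmed_text;
}

# Same sample data as in the question, inlined for a self-contained script.
my $html = <<'HTML';
<div><p>foo01<br />bar01</p></div>
<div>
<p>
foo02
<br />
bar02
</p>
</div>
<div>
<p>
foo03
<br />
bar03
<br />
baz03
</p>
</div>
HTML

my $root = HTML::TreeBuilder::XPath->new;
$root->parse($html);
$root->eof;

my @texts = map { text_before_first_br($_) } $root->findnodes('//div/p');
print "$_\n" for @texts;

$root->delete;
```

The sketch only looks at direct children of each paragraph, so a <br> nested inside another inline element (for example inside a <span>) would not be found; that case would need a different traversal.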