Neat. That's close, but it takes the first part of the node, and only the first part, not necessarily all of the pieces preceding a <br /> element. Consider foo02a in the following example:
#!/usr/bin/perl
use HTML::TreeBuilder::XPath;
use strict;
use warnings;
my $root = HTML::TreeBuilder::XPath->new;
$root->parse_file(\*DATA)
    or die("Could not parse the data: $!\n");
$root->eof();
my $xpath = '//div/p';
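for my $d ($root->findnodes($xpath))
{
    # content_list returns the child nodes in order: plain strings for
    # text segments, HTML::Element objects for child elements.
    my @line = $d->content_list;
    s/^\s+|\s+$//g for @line;
    # Only the first child is kept here, which is why foo02a is lost:
    # the first child of that <p> is the <a> element.
    $d->replace_with($line[0],qq(\n));
}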
print $root->as_trimmed_text,qq(\n);
$root->delete;
exit(0);
__DATA__
<div><p>foo00
bar00</p></div>
<div><p>foo01<br />bar01</p></div>
<div>
<p>
foo02
<br />
bar02
</p>
</div>
<div>
<p>
<a href="foobar01">foobar02</a>
foo02a
<br />
bar02a
</p>
</div>
<div>
<p>
foo03
<br />
bar03
<br />
baz03
</p>
</div>
<div>
<p>
<em>foo04</em>
<br />
<strong>bar04</strong>
<br />
<em>baz04</em>
</p>
</div>
<div>
<p>
<em>foo05</em>
</p>
<p>bar05
<br />
<em>baz05</em>
</p>
</div>
I am still experimenting with $d->content_list. I suppose it would be possible to extract the node as a hash and then loop through it, eliminating the <br /> element and everything after it.
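For what it's worth, here is a rough sketch of that loop idea. It walks the list returned by content_list instead of a hash, and it stops at the first <br /> child. The helper name text_before_first_br is made up for illustration, and I am assuming that "everything before the first <br />" means the concatenated text of the preceding children, with whitespace collapsed. It has only been tried against the foo02a case, so treat it as a starting point rather than a finished answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: return the text of everything in $node that comes
# before its first <br /> child, or all of its text if there is no <br />.
sub text_before_first_br {
    my ($node) = @_;
    my @parts;
    for my $child ($node->content_list) {
        if (ref $child) {
            # Child elements are HTML::Element objects; stop at the first <br />.
            last if $child->tag eq 'br';
            push @parts, $child->as_text;
        }
        else {
            # Text segments come back as plain strings.
            push @parts, $child;
        }
    }
    my $text = join '', @parts;
    $text =~ s/\s+/ /g;         # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;    # trim both ends
    return $text;
}

my $root = HTML::TreeBuilder::XPath->new;
$root->parse(<<'HTML');
<div>
<p>
<a href="foobar01">foobar02</a>
foo02a
<br />
bar02a
</p>
</div>
HTML
$root->eof;

# Print the leading text of each matching paragraph, one per line.
print text_before_first_br($_), "\n" for $root->findnodes('//div/p');
$root->delete;
```

Unlike taking $line[0], this keeps both the <a> text and the bare foo02a text, because it collects every child up to the <br /> rather than just the first one. It only looks at direct children of the <p>, so a <br /> nested inside another element would not be noticed.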