Using LibreOffice, I whipped up the following example and edited it down to the minimum needed:
<?xml version="1.0"?>
<office:document-content office:version="1.2"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<office:body><office:text>
<text:p text:style-name="P1">
Fo<text:span text:style-name="T1">o<text:line-break/>
B</text:span><text:span text:style-name="T3">a</text:span>
<text:span text:style-name="T5">r<text:line-break/></text:span>
</text:p>
</office:text></office:body>
</office:document-content>
The transformation I need on the XML document is something like this: <r><p>a<x>b<s/>c</x>d</p></r>
<r>
`-- <p>
    `+- a
     +- <x>
     |  `+- b
     |   +- <s>
     |   `- c
     `- d
To this: <r><p>a<x>b</x></p><p><x>c</x>d</p></r>
<r>
`+- <p>
 |  `+- a
 |   `- <x>
 |      `-- b
 `- <p>
    `+- <x>
     |  `-- c
     `- d
I think the elements surrounding <s/>, represented here as <x>, could even be nested more than one level deep, e.g. <r><p>a<x>b<y>c<x>d<s/>e</x>f</y>g</x>h</p></r>
A very interesting problem; unfortunately, I don't have enough time to invest at the moment. You definitely shouldn't try to do this with regexes. If the files aren't too big, I'd probably approach this with XML::LibXML...
Update 2: On second thought, a stream-based parser might be easier in this case... hmmm...
Update 1: A skeleton...
use warnings;
use strict;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(string => <<'END_XML');
<?xml version="1.0"?>
<office:document-content office:version="1.2"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<office:body><office:text>
<text:p text:style-name="P1">
Fo<text:span text:style-name="T1">o<text:line-break/>
B</text:span><text:span text:style-name="T3">a</text:span>
<text:span text:style-name="T5">r<text:line-break/></text:span>
</text:p>
</office:text></office:body>
</office:document-content>
END_XML
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs('office',
    'urn:oasis:names:tc:opendocument:xmlns:office:1.0');
$xpc->registerNs('text',
    'urn:oasis:names:tc:opendocument:xmlns:text:1.0');
while (1) {
    my ($lb) = $xpc->findnodes('//text:p//text:line-break') or last;
    die "can't handle <text:line-break> with children: $lb"
        if $lb->hasChildNodes;
    my ($p) = $xpc->findnodes('ancestor::text:p[1]',$lb)
        or die "failed to find <text:p> ancestor of $lb";
    print "$p\n"; #DB
    # ... do something useful here ...
    $lb->unbindNode; # so this loop terminates
}
print "#####\n";
print $dom;
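One way the "do something useful here" step might be filled in is sketched below. It is only a sketch, and it works on the simplified <r>/<p>/<x>/<s/> documents from the question rather than on the ODF file. The sub names split_at_break and split_breaks are made up for illustration. The idea is to walk up from the break to its <p> ancestor, shallow-clone each element on the way, and move everything that follows the break into the clones. For the real document, the XPath lookups would presumably have to go through the XPathContext with the registered namespaces, as in the skeleton. The sketch also makes no attempt to prune empty elements left behind, for example when the line break is the last child of its span.

```perl
use strict;
use warnings;
use XML::LibXML;

# Split $top (e.g. a <p>) in two at $break (e.g. an <s/>), which must be a
# descendant of $top. Everything after $break moves into a new sibling of
# $top; each element on the path from $break up to $top is shallow-cloned.
sub split_at_break {
    my ($break, $top) = @_;
    my $node = $break;
    my $tail;    # right-hand copy of the ancestor chain, built bottom-up
    until ($node->isSameNode($top)) {
        my $parent = $node->parentNode
            or die "break node is not inside the given ancestor";
        my $clone = $parent->cloneNode(0);    # shallow: attributes, no children
        $clone->appendChild($tail) if defined $tail;
        # appendChild moves a node, so this loop drains the following siblings
        while (defined(my $sib = $node->nextSibling)) {
            $clone->appendChild($sib);
        }
        ($tail, $node) = ($clone, $parent);
    }
    $top->parentNode->insertAfter($tail, $top);
    $break->unbindNode;    # the break itself is dropped
    return $tail;
}

# Hypothetical driver for the simplified example: split every <p> at
# every <s/> it contains and return the serialized root element.
sub split_breaks {
    my ($xml) = @_;
    my $dom = XML::LibXML->load_xml(string => $xml);
    while (1) {
        my ($break) = $dom->findnodes('//p//s') or last;
        my ($top)   = $break->findnodes('ancestor::p[1]');
        split_at_break($break, $top);
    }
    return $dom->documentElement->toString;
}

print split_breaks('<r><p>a<x>b<s/>c</x>d</p></r>'), "\n";
# <r><p>a<x>b</x></p><p><x>c</x>d</p></r>

print split_breaks('<r><p>a<x>b<y>c<x>d<s/>e</x>f</y>g</x>h</p></r>'), "\n";
# <r><p>a<x>b<y>c<x>d</x></y></x></p><p><x><y><x>e</x>f</y>g</x>h</p></r>
```

Because the loop always picks the first remaining break and the split removes it, several breaks in one paragraph are handled one after another, and the loop terminates once none are left.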