I know nothing about ODT, and the example presented by haukex fails to open, so I can't test whether I broke something, but here's a possible solution using XML::Rules.
use strict;
use XML::Rules;

my $filter = XML::Rules->new(
    style      => 'filter',
    namespaces => {
        'urn:oasis:names:tc:opendocument:xmlns:text:1.0'   => 'text',
        'urn:oasis:names:tc:opendocument:xmlns:office:1.0' => 'office'
    },
    rules => {
        # we do not care what's inside the tags,
        # we just want to preserve everything
        _default => 'raw',

        # this doesn't seem to do anything, but it's necessary.
        # The filter mode sends everything outside tags
        # with special rules directly to output
        'text:p' => sub { return $_[0] => $_[1] },

        'text:line-break' => sub {
            my ($tag, $attrs, $parents, $parentAttrs, $parser) = @_;

            # find the <text:p> tag enclosing this one
            my $idx = $#$parents;
            $idx-- while ($idx >= 0 && $parents->[$idx] ne 'text:p');

            # line break outside paragraph, leave alone
            return $tag => $attrs if ($idx < 0);

            my $level = $#$parents - $idx + 1;

            # output the <text:p> and everything inside we read so far
            print { $parser->{FH} } $parser->parentsToXML($level);

            # close the opened tags all the way to the <text:p>
            print { $parser->{FH} } $parser->closeParentsToXML($level);
            print { $parser->{FH} } "\n";

            # remove the printed content, leaves the attributes intact
            foreach my $i ($idx .. $#$parents) {
                delete $parentAttrs->[$i]->{_content};
            }

            return; # remove the <text:line-break/>
        }
    }
);

$filter->filter( \*DATA, \*STDOUT);
__DATA__
<?xml version="1.0"?>
<office:document-content office:version="1.2"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<office:body><office:text>
<text:p text:style-name="P1">
Fo<text:span text:style-name="T1">o<text:line-break/>
B</text:span><text:span text:style-name="T3">a</text:span>
<text:span text:style-name="T5">r<text:line-break/></text:span>
</text:p>
</office:text></office:body>
</office:document-content>
The code will work correctly (provided I understood the requirements correctly) no matter how many tags are open within the <text:p>.
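To show why the nesting depth doesn't matter, here is a standalone sketch of just the ancestor search from the text:line-break rule above, without XML::Rules. The helper name levels_to_ancestor and the hard-coded parent chains are made up for illustration; in the real handler the chain comes from the $parents argument, outermost tag first.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Rules: given the chain of enclosing
# tag names (outermost first), return how many of them have to be printed
# and closed to split at the nearest $wanted ancestor, or 0 if there is none.
sub levels_to_ancestor {
    my ($parents, $wanted) = @_;
    my $idx = $#$parents;
    # walk from the innermost parent outwards until we hit $wanted
    $idx-- while $idx >= 0 && $parents->[$idx] ne $wanted;
    return 0 if $idx < 0;             # no such ancestor
    return $#$parents - $idx + 1;     # $wanted itself plus everything nested in it
}

# Made-up parent chains resembling the sample document
my @direct = qw(office:document-content office:body office:text text:p);
my @nested = (@direct, 'text:span');
my @none   = qw(office:document-content office:body office:text);

print levels_to_ancestor(\@direct, 'text:p'), "\n";   # 1
print levels_to_ancestor(\@nested, 'text:p'), "\n";   # 2
print levels_to_ancestor(\@none,   'text:p'), "\n";   # 0
```

The returned number plays the role of $level in the filter: it's what gets passed to parentsToXML and closeParentsToXML, so one more open tag inside the paragraph just means one more level to print and close.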
Jenda