As per accepted wisdom we've replaced XMLin with XML::LibXML - there are places where it's made life easier and places where it's made life harder, but overall it's been a positive change (it's also quite a lot faster).
However, we're now looking at XMLout - we currently do XMLout($some_big_hashref, options) and then put the output through XSL. Before anyone suggests JSON, Template::Toolkit etc. as alternatives: we do use those, but in other parts of our system.
So I got a newly recruited developer to knock together a 'quick and dirty' test script to see a) how to do it and b) the performance impact. We started with an XML::Simple section as a benchmark:
package MyXMLSimple;
use base 'XML::Simple';
sub sorted_keys {
    my ( $self, $name, $hashref ) = @_;
    return sort { main::element_order($a) <=> main::element_order($b) } keys %{$hashref};
}
... SNIP ...
return $parser->XMLout($xml,
    KeyAttr       => [],
    RootName      => 'Zymonic',
    NoEscape      => 1,
    SuppressEmpty => 1
);
Then we did a LibXML version...
sub add_nodes_hash
{
    # $xml is the top-level hashref on the first call and a nested hashref
    # on each recursive call; $parent_element is the node that new child
    # elements are appended to.
    my $xml            = shift;
    my $parent_element = shift;
    foreach my $node (sort { element_order($a) <=> element_order($b) } keys %{$xml})
    {
        #next if ($node =~/\//);
        my $value = $xml->{$node};
        if (ref($value) eq 'HASH')
        {
            # nested hash: one child element, then recurse into it
            my $element = $dom->createElement($node);
            $parent_element->appendChild($element);
            add_nodes_hash($value, $element);
        }
        elsif (ref($value) eq 'ARRAY')
        {
            # array: one sibling element named $node per array entry
            foreach my $array_element (@{$value})
            {
                if (ref($array_element) eq 'HASH')
                {
                    my $element = $dom->createElement($node);
                    $parent_element->appendChild($element);
                    add_nodes_hash($array_element, $element);
                }
                # test the array entry itself, not the enclosing arrayref
                elsif (!ref($array_element) && $array_element)
                {
                    my $element = $dom->createElement($node);
                    $element->appendText($array_element);
                    $parent_element->appendChild($element);
                }
            }
        }
        # plain scalar: note that false values such as 0 or '' are skipped
        elsif (!ref($value) && $value)
        {
            my $element = $dom->createElement($node);
            $element->appendText($value);
            $parent_element->appendChild($element);
        }
    }
    return $dom;
}
sub libxml_output {
    my $xml = shift;
    # create the DOM object ($dom is shared with add_nodes_hash)
    $dom = XML::LibXML::Document->new();
    my $root = $dom->createElement('Zymonic');
    $dom->setDocumentElement($root);
    add_nodes_hash($xml, $root);
    return $dom->toString(1);
}
When we generate 3000 lines of XML (approximately) we get the following timings.
====================TIMINGS=======================
LibXML time for 100 rep(s): 5409.750 ms
XML Simple time for 100 rep(s): 3699.437 ms
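For anyone who wants to reproduce numbers of this shape before the full script is posted, a harness along the following lines should do. This is only a minimal sketch, not our actual test script: it assumes Time::HiRes, the helper name time_reps is hypothetical, and the workload is a stand-in for the real call to libxml_output or the XML::Simple generator.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run $code $reps times and return the elapsed
# wall-clock time in milliseconds.
sub time_reps {
    my ( $reps, $code ) = @_;
    my $start = [gettimeofday];
    $code->() for 1 .. $reps;
    return tv_interval($start) * 1000;
}

# Stand-in workload; in a real comparison this sub would call
# libxml_output($data) or the XML::Simple-based generator instead.
my $calls = 0;
my $reps  = 100;
my $ms    = time_reps( $reps, sub { $calls++ } );

printf "%s time for %d rep(s): %.3f ms\n", 'Stand-in', $reps, $ms;
```

Wall-clock timing like this is sensitive to whatever else the machine is doing, so it is worth running each variant a few times before reading much into the difference.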
I'll put the full test script in a reply to this node in case it helps; I've left it out of this post for brevity.
Now before I ask my specific questions, I'll add some caveats / further info:
- I haven't checked the code thoroughly myself and I know it doesn't do anything with attributes yet; but it looks roughly ok (including the output looking ok) and I can't see it getting faster when we add attributes
- If we were starting from scratch there would be a strong argument for assembling the elements as we went along rather than converting a giant hashref at the end. However, we're not starting from scratch, and the 'giant hashref' approach means the same code that generates the hashref can also feed JSON and Storable output, using if/elsif or polymorphism as appropriate. Arguably we could have added a polymorphic output module with methods analogous to adding elements / nodes, and with JSON and XML sub-classes, but again that's too big a rewrite for us to realistically contemplate.
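To make the 'giant hashref' point concrete, the dispatch looks roughly like the sketch below. It is an illustration rather than our production code: the function name serialise, the format names and the sample data are all hypothetical, JSON::PP is used only because it ships with Perl, and the XML branch takes an assumed code ref standing in for XMLout or libxml_output.

```perl
use strict;
use warnings;
use JSON::PP ();
use Storable qw(nfreeze thaw);

# Hypothetical dispatcher: one hashref in, one of several serialisations out.
# $xml_writer is an assumed code ref wrapping XMLout or libxml_output.
sub serialise {
    my ( $format, $data, $xml_writer ) = @_;
    if ( $format eq 'json' ) {
        # canonical() sorts hash keys so the output is stable
        return JSON::PP->new->canonical->encode($data);
    }
    elsif ( $format eq 'storable' ) {
        return nfreeze($data);
    }
    elsif ( $format eq 'xml' ) {
        return $xml_writer->($data);
    }
    die "Unknown format '$format'";
}

# Illustrative data only; the real hashref is far larger.
my $data = {
    Page => {
        Title => 'Home',
        Block => [ { Name => 'a' }, { Name => 'b' } ],
    },
};

print serialise( 'json', $data ), "\n";
```

The attraction is that only the XML branch cares about element ordering or DOM nodes; the code that builds $data stays untouched whichever output is wanted.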
Finally, my actual question(s)... Have we made a fundamental error somewhere, and is there a way of going from XMLout to something 'better' whilst retaining equivalent (or better!) performance and keeping the code nice and simple? Ideally it would be a method that lets us go from nested hashref to XML output in a one-liner.