PerlMonks  

Replacing XML::Simple XMLout with Lib::XML

by amasidlover (Sexton)
on Feb 15, 2018 at 11:49 UTC ([id://1209202])

amasidlover has asked for the wisdom of the Perl Monks concerning the following question:

As per accepted wisdom, we've replaced XMLin with XML::LibXML. There are places where it's made life easier and places where it's made life harder, but overall it's been a positive change (it's also quite a lot faster).

However, we're now looking at XMLout. We currently do XMLout($some_big_hashref, options) and then put the output through XSL. Before anyone suggests JSON, Template::Toolkit etc. as alternatives: we do use those, but in other parts of our system.
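Since the XML goes straight into XSL anyway, one option worth measuring is skipping the stringify-then-reparse round trip and handing an XML::LibXML document directly to XML::LibXSLT. This is a minimal sketch that assumes the XSL step runs on libxslt (the post doesn't say which processor is used). The inline stylesheet and the tiny document are hypothetical stand-ins.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical stylesheet: print every ZName in the document, one per line.
my $xsl = <<'XSL';
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//ZName">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL

# Parse the stylesheet once; it can be reused for every document.
my $stylesheet = XML::LibXSLT->new->parse_stylesheet(
    XML::LibXML->load_xml( string => $xsl )
);

# A small in-memory DOM standing in for the converted hashref.
my $dom  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $dom->createElement('Zymonic');
$dom->setDocumentElement($root);
$root->appendTextChild( 'ZName', 'svz_admin_rp' );

# Transform the DOM object directly, with no toString()/parse in between.
my $result = $stylesheet->transform($dom);
my $text   = $stylesheet->output_as_chars($result);
print $text;
```

If the real pipeline serialises the XML and then parses it again for XSL, this removes one full serialise/parse cycle per document. Whether that outweighs the cost of building the DOM would need measuring on real data.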

So I got a newly recruited developer to knock together a 'quick and dirty' test script to see a) how to do it and b) the performance impact. We started with an XML::Simple section as a benchmark:

package MyXMLSimple;
use base 'XML::Simple';

sub sorted_keys {
    my ( $self, $name, $hashref ) = @_;
    return sort { main::element_order($a) <=> main::element_order($b) }
        keys %{$hashref};
}

... SNIP ...

return $parser->XMLout(
    $xml,
    KeyAttr       => [],
    RootName      => 'Zymonic',
    NoEscape      => 1,
    SuppressEmpty => 1
);

Then we did a LibXML version...

sub add_nodes_hash {
    # On the first call $xml is the top-level hashref; on recursion it is a
    # nested hashref. $parent_element is the element the new nodes go under.
    my $xml            = shift;
    my $parent_element = shift;

    foreach my $node ( sort { element_order($a) <=> element_order($b) } keys %{$xml} ) {
        #next if ($node =~/\//);
        if ( ref( $xml->{$node} ) eq 'HASH' ) {
            my $element = $dom->createElement($node);
            $parent_element->insertAfter( $element, undef );    # undef ref node: append as last child
            add_nodes_hash( $xml->{$node}, $element );
        }
        if ( ref( $xml->{$node} ) eq 'ARRAY' ) {
            foreach my $array_element ( @{ $xml->{$node} } ) {
                if ( ref($array_element) eq 'HASH' ) {
                    my $element = $dom->createElement($node);
                    $parent_element->insertAfter( $element, undef );
                    add_nodes_hash( $array_element, $element );
                }
                elsif ( !ref($array_element) && $array_element ) {
                    # test the array item itself, not the arrayref holding it
                    my $element = $dom->createElement($node);
                    $element->appendText($array_element);
                    $parent_element->insertAfter( $element, undef );
                }
            }
        }
        elsif ( !ref( $xml->{$node} ) && $xml->{$node} ) {
            my $element = $dom->createElement($node);
            $element->appendText( $xml->{$node} );
            $parent_element->insertAfter( $element, undef );
        }
    }
    return $dom;
}

sub libxml_output {
    my $xml = shift;

    # create the DOM object
    $dom = XML::LibXML::Document->new();
    my $root = $dom->createElement('Zymonic');
    $dom->setDocumentElement($root);
    add_nodes_hash( $xml, $root );
    return $dom->toString(1);
}
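One cheap thing to check in both `add_nodes_hash` and `sorted_keys` is the sort itself. `element_order` is a sub call made twice per comparison, so a hash with n keys costs on the order of n log n sub calls. Here is a minimal core-Perl sketch of computing each key's rank once per hash and then sorting on the cached numbers. `sorted_by_sub` and `sorted_cached` are illustrative names, and `%ELEMENTORDER` mirrors the table in the test script.

```perl
use strict;
use warnings;

# Same ordering table as the test script.
our %ELEMENTORDER = ( Process => 1, Filter => 2, Field => 3, ZName => 4 );

sub element_order { return $ELEMENTORDER{ $_[0] } || 9999999 }

# As in the posted code: element_order() is called twice per comparison.
sub sorted_by_sub {
    my ($hash) = @_;
    return sort { element_order($a) <=> element_order($b) } keys %{$hash};
}

# Look each key's rank up once, then sort on the cached numbers.
# The "or $a cmp $b" tie-break also gives a repeatable order for names
# that are not in %ELEMENTORDER (plain hash order is not repeatable).
sub sorted_cached {
    my ($hash) = @_;
    my %rank = map { $_ => ( $ELEMENTORDER{$_} || 9999999 ) } keys %{$hash};
    return sort { $rank{$a} <=> $rank{$b} or $a cmp $b } keys %rank;
}
```

For names listed in `%ELEMENTORDER`, the two subs produce the same order. Only the relative order of unlisted names differs. To see whether the saving matters at your hash sizes, run both through Benchmark's `cmpthese` with a representative hash.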

When we generate approximately 3000 lines of XML, we get the following timings:

====================TIMINGS=======================
LibXML time for 100 rep(s): 5409.750 ms
XML Simple time for 100 rep(s): 3699.437 ms

I'll put the full test script in a reply to this node in case it helps; I didn't want to include it all here, for brevity.

Now before I ask my specific questions, I'll add some caveats / further info:

  • I haven't checked the code thoroughly myself, and I know it doesn't handle attributes yet. But it looks roughly OK (as does the output), and I can't see it getting faster once we add attributes.
  • If we were starting from scratch, there would be a strong argument for assembling the elements as we went along rather than converting a giant hashref at the end. However, we're not starting from scratch, and the 'giant hashref' approach means the same code that generates the hashref can also go straight to JSON and Storable, using if/elsif or polymorphism as appropriate. Arguably we could have added a polymorphic output module with methods analogous to adding elements/nodes, with JSON and XML subclasses, but again that's too big a rewrite for us to realistically contemplate.
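The 'one hashref, several output formats' design described above can be written as a small dispatch table. This is a sketch that uses only core modules (JSON::PP and Storable). `render`, `%SERIALISER` and the format keys are made-up names, and the XML slot is left empty on purpose so that whichever hashref-to-XML routine you settle on can go there.

```perl
use strict;
use warnings;
use JSON::PP ();
use Storable ();

# The structure is built once; only the final serialiser changes.
my %SERIALISER = (
    json     => sub { JSON::PP->new->canonical(1)->encode( $_[0] ) },
    storable => sub { Storable::nfreeze( $_[0] ) },
    # xml    => sub { ... hashref-to-XML conversion of your choice ... },
);

sub render {
    my ( $format, $data ) = @_;
    my $serialiser = $SERIALISER{$format}
        or die "No serialiser for format '$format'\n";
    return $serialiser->($data);
}

my $data = { RelationshipPermissions => { ZName => { content => 'svz_admin_rp' } } };
print render( json => $data ), "\n";
```

Adding a format then only means adding an entry to the table. The code that builds the hashref stays untouched.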

Finally, my actual question(s)... Have we made a fundamental error somewhere? Is there a way of going from XMLout to something 'better' whilst retaining equivalent (or better!) performance and keeping the code nice and simple? Ideally, we'd like a one-liner that takes a nested hashref and produces XML output.
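One way to get close to a one-liner is to wrap a hashref-to-DOM converter in a single function. This sketch is an illustration under stated assumptions, not a drop-in replacement. It assumes the hashref follows XML::Simple's default XMLout conventions: a `content` key becomes the element's text, other scalars become attributes, and arrayrefs become repeated elements. `hash_to_dom`, `hashref_to_xml` and `%ELEMENT_ORDER` are hypothetical names. Unlike add_nodes_hash, it reaches the document through `ownerDocument` rather than a global `$dom`, uses `appendTextChild` for scalar array items, and ranks each hash's keys only once. It has not been benchmarked against XMLout.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical ordering table (mirrors %ELEMENTORDER in the test script);
# unlisted names sort last, alphabetically, so the output is repeatable.
my %ELEMENT_ORDER = ( Process => 1, Filter => 2, Field => 3, ZName => 4 );

sub _rank { return $ELEMENT_ORDER{ $_[0] } || 9_999_999 }

# Append nested hashref $hash under the XML::LibXML::Element $parent,
# following XML::Simple's default XMLout conventions (assumed here):
#   hashref          -> child element
#   arrayref         -> one child element per item, all with the same name
#   'content' scalar -> text of the parent element
#   other scalar     -> attribute of the parent element
sub hash_to_dom {
    my ( $hash, $parent ) = @_;
    my $doc  = $parent->ownerDocument;
    my %rank = map { $_ => _rank($_) } keys %{$hash};
    for my $name ( sort { $rank{$a} <=> $rank{$b} or $a cmp $b } keys %rank ) {
        my $value = $hash->{$name};
        next unless defined $value;    # skip undef values entirely
        if ( ref $value eq 'HASH' ) {
            my $child = $doc->createElement($name);
            $parent->appendChild($child);
            hash_to_dom( $value, $child );
        }
        elsif ( ref $value eq 'ARRAY' ) {
            for my $item ( @{$value} ) {
                if ( ref $item eq 'HASH' ) {
                    my $child = $doc->createElement($name);
                    $parent->appendChild($child);
                    hash_to_dom( $item, $child );
                }
                elsif ( defined $item ) {
                    $parent->appendTextChild( $name, $item );
                }
            }
        }
        elsif ( $name eq 'content' ) {
            $parent->appendText($value);
        }
        else {
            $parent->setAttribute( $name, $value );
        }
    }
    return;
}

# The "one-liner" entry point: nested hashref in, XML string out.
sub hashref_to_xml {
    my ( $hash, $root_name ) = @_;
    my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
    my $root = $doc->createElement($root_name);
    $doc->setDocumentElement($root);
    hash_to_dom( $hash, $root );
    return $doc->toString(1);
}
```

Usage would be `my $xml = hashref_to_xml($big_hashref, 'Zymonic');`. One behavioural difference to watch for: the XMLout call passes `NoEscape => 1`, but libxml2 always escapes `&`, `<` and `>` in text and attribute values when serialising. Any pre-escaped or embedded markup in the hashref will therefore come out differently.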

Re: Replacing XML::Simple XMLout with Lib::XML
by salva (Canon) on Feb 15, 2018 at 12:30 UTC
    Take a look at XML::FromPerl, which provides an easy way to generate XML from Perl using XML::LibXML under the hood. I have no idea how fast it is, though.

      I've just had a quick look at the source, and it's not a million miles away from what we've done. However, it relies on a particular structure of hashrefs/arrayrefs that we don't have, so we couldn't use it as it stands. Still, it's reassuring to know that at least one other person took the same approach.

Re: Replacing XML::Simple XMLout with Lib::XML
by shmem (Chancellor) on Feb 15, 2018 at 13:58 UTC
    Have we made a fundamental error somewhere...

    I think yes. Your libxml_output() restructures the passed $xml and then stringifies it, whereas xmlsimple_output() just stringifies the passed $xml, which was generated by XMLin() beforehand. The subroutines are not comparable.

    Just measuring the performance of stringification ...

    #!/usr/bin/perl -w
    use strict;
    use XML::Simple;
    use XML::LibXML;
    use Benchmark qw(cmpthese);

    my $file = shift or die "usage: $0 xmlfile\n";
    my $xml  = XMLin($file);
    my $dom  = XML::LibXML->load_xml( location => $file );

    cmpthese( -1, {
        simple => sub { XMLout($xml) },
        libxml => sub { $dom->toString() },
    } );
    __END__
              Rate  simple  libxml
    simple  29.4/s      --    -96%
    libxml   658/s   2141%      --

    ... gives a different picture. Are you working internally with $xml or $dom ?

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

      Ah, sorry, I didn't make it clear: in the production code there is no file to load from. Here it's just a handy way of getting a similar hashref to do timings on. So we can't just compare stringification; we have to convert the hashref to $dom and then stringify.
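To see where the LibXML time actually goes (building the DOM versus `toString`), the two phases can be timed separately. This is a rough sketch. `build_dom` is a hypothetical stand-in that creates 3000 flat elements instead of converting the real hashref, so only the shape of the measurement carries over. In practice you would swap in the real conversion routine.

```perl
use strict;
use warnings;
use XML::LibXML;
use Time::HiRes qw(time);

# Hypothetical stand-in for the hashref-to-DOM conversion:
# a flat document with $n text-only child elements.
sub build_dom {
    my ($n) = @_;
    my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
    my $root = $doc->createElement('Zymonic');
    $doc->setDocumentElement($root);
    $root->appendTextChild( 'ZName', "item$_" ) for 1 .. $n;
    return $doc;
}

# Time the build and stringify phases separately; returns milliseconds.
sub time_phases {
    my ( $reps, $n ) = @_;
    my ( $build, $stringify ) = ( 0, 0 );
    for ( 1 .. $reps ) {
        my $t0  = time;
        my $doc = build_dom($n);
        my $t1  = time;
        my $xml = $doc->toString(1);
        $stringify += time - $t1;
        $build     += $t1 - $t0;
    }
    return ( $build * 1000, $stringify * 1000 );
}

my ( $build_ms, $string_ms ) = time_phases( 100, 3000 );
printf "build: %.3f ms, toString: %.3f ms\n", $build_ms, $string_ms;
```

If the build phase dominates, the effort belongs in the conversion walk (sorting, sub calls, node creation). If `toString` dominates, the conversion code is not where the time goes.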

      Internally, in the 'real' system, we have 30-odd classes, some of which have 5-10 polymorphic variants. They all have a method called output, which returns a nested hashref/arrayref structure that can be converted into JSON, XML, Storable etc. These are all assembled into one giant hashref before conversion.
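Here is a toy version of that architecture, to show the shape. Every class exposes `output()` returning plain nested data, and the results are gathered into one hashref. The classes `Field` and `Filter` and the `assemble` helper are all hypothetical stand-ins, not the framework's real classes.

```perl
use strict;
use warnings;

# Hypothetical base class: describes itself as plain nested data.
package Field;
sub new    { my ( $class, %args ) = @_; return bless {%args}, $class }
sub output { my $self = shift; return { ZName => { content => $self->{name} } } }

# Hypothetical polymorphic variant that extends the base output.
package Filter;
our @ISA = ('Field');
sub output {
    my $self = shift;
    my $out  = $self->SUPER::output();
    $out->{FilterType} = { content => $self->{type} };
    return $out;
}

package main;

# Gather each object's output under its class name; the resulting
# hashref can then go to XML, JSON or Storable unchanged.
sub assemble {
    my (@objects) = @_;
    my %doc;
    push @{ $doc{ ref $_ } }, $_->output() for @objects;
    return \%doc;
}

my $doc = assemble(
    Field->new( name => 'customer_id' ),
    Filter->new( name => 'active_only', type => 'boolean' ),
);
```

Because the objects only return data, the choice of output format stays with whatever consumes `$doc`.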

      For the example I simply used XMLin to create a hashref similar to what we already have. Internally (in our real framework) we're working with something that resembles $xml (which is badly named and should perhaps be $nested_hashref_containing_arrayrefs_and_hashrefs). A tiny fragment of it would look like this:

      $VAR1 = {
          'RelationshipPermissions' => {
              'PermissionHolderType' => { 'content' => 'authenticated' },
              'ZName'                => { 'content' => 'svz_admin_rp' },
              'DisplayName'          => { 'content' => 'Admin Permissions' },
              'IPList'               => { 'content' => 'any' },
              'changeable'           => { 'content' => 'true' },
              'readable'             => { 'content' => 'true' },
              'deleteable'           => { 'content' => 'true' },
              'appendable'           => { 'content' => 'true' },
              'secureable'           => { 'content' => 'true' }
          }
      };
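Before replacing XMLout, it helps to check what it actually does with these `content` keys, because a hand-written converter has to reproduce that. With XML::Simple's default ContentKey, `{ content => 'x' }` becomes the text of its parent element rather than a `<content>` child. As posted, add_nodes_hash would instead create a `<content>` element for each one. Here is a small check using a trimmed copy of the fragment above:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLout);

# Trimmed copy of the fragment above.
my $data = {
    RelationshipPermissions => {
        PermissionHolderType => { content => 'authenticated' },
        ZName                => { content => 'svz_admin_rp' },
        DisplayName          => { content => 'Admin Permissions' },
    },
};

# 'content' is XML::Simple's default ContentKey, so each { content => ... }
# hash is written as the text of the element that holds it.
my $xml = XMLout( $data, KeyAttr => [], RootName => 'Zymonic', SuppressEmpty => 1 );
print $xml;
```

The resulting XML is a handy reference output to diff any LibXML-based converter against.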

      So whilst stringification is quicker if we already have a complete DOM structure, we don't have one, and modifying our code to produce it would a) be a massive task and b) make it more difficult (impossible, or slower) to also produce JSON/Storable output.

Re: Replacing XML::Simple XMLout with Lib::XML
by amasidlover (Sexton) on Feb 15, 2018 at 11:50 UTC

    Full code

    #!/usr/bin/perl -w
    use strict;
    use XML::Simple;
    use XML::LibXML;
    use diagnostics;
    use Time::HiRes qw(gettimeofday);

    package MyXMLSimple;
    use base 'XML::Simple';

    sub sorted_keys {
        my ( $self, $name, $hashref ) = @_;
        return sort { main::element_order($a) <=> main::element_order($b) }
            keys %{$hashref};
    }

    package main;

    # load xmlfile using XMLin (creating hashref)
    my $file = $ARGV[0];
    my $reps = $ARGV[1] || 1;
    my $xml  = XMLin($file);
    our $dom;
    our %ELEMENTORDER = ( Process => 1, Filter => 2, Field => 3, ZName => 4 );

    my $before = gettimeofday();
    my $xso;
    foreach my $i ( 1 .. $reps ) {    # run exactly $reps iterations
        $xso = xmlsimple_output($xml);
    }
    my $xso_time = sprintf( "%0.3f", ( gettimeofday() - $before ) * 1000 );
    print "=========================XML Simple output==============================\n";
    print $xso;

    $before = gettimeofday();
    my $lxo;
    foreach my $i ( 1 .. $reps ) {
        $lxo = libxml_output($xml);
    }
    my $lxo_time = sprintf( "%0.3f", ( gettimeofday() - $before ) * 1000 );
    print "=========================Lib XML output=============================\n";
    print $lxo;

    print "====================TIMINGS=======================\n";
    print "LibXML time for $reps rep(s): $lxo_time ms\n";
    print "XML Simple time for $reps rep(s): $xso_time ms\n";

    #print keys %{$xml->{Process}->[0]->{LongDescription}->{content}};

    sub element_order {
        my $element_name = shift;
        return $ELEMENTORDER{$element_name} || 9999999;
    }

    # loads the hashref into an XML::LibXML object using a recursive method
    sub add_nodes_hash {
        # On the first call $xml is the top-level hashref; on recursion it is
        # a nested hashref. $parent_element is the element to append to.
        my $xml            = shift;
        my $parent_element = shift;

        # unsorted alternative: foreach my $node (keys %{$xml})
        foreach my $node ( sort { element_order($a) <=> element_order($b) } keys %{$xml} ) {
            #next if ($node =~/\//);
            if ( ref( $xml->{$node} ) eq 'HASH' ) {
                my $element = $dom->createElement($node);
                $parent_element->insertAfter( $element, undef );    # append as last child
                add_nodes_hash( $xml->{$node}, $element );
            }
            if ( ref( $xml->{$node} ) eq 'ARRAY' ) {
                foreach my $array_element ( @{ $xml->{$node} } ) {
                    if ( ref($array_element) eq 'HASH' ) {
                        my $element = $dom->createElement($node);
                        $parent_element->insertAfter( $element, undef );
                        add_nodes_hash( $array_element, $element );
                    }
                    elsif ( !ref($array_element) && $array_element ) {
                        # test the array item itself, not the arrayref holding it
                        my $element = $dom->createElement($node);
                        $element->appendText($array_element);
                        $parent_element->insertAfter( $element, undef );
                    }
                }
            }
            elsif ( !ref( $xml->{$node} ) && $xml->{$node} ) {
                my $element = $dom->createElement($node);
                $element->appendText( $xml->{$node} );
                $parent_element->insertAfter( $element, undef );
            }
        }
        return $dom;
    }

    #use toString() to print it out
    #print add_nodes_hash($xml,0)->toString;

    sub libxml_output {
        my $xml = shift;

        # create the DOM object
        $dom = XML::LibXML::Document->new();
        my $root = $dom->createElement('Zymonic');
        $dom->setDocumentElement($root);
        add_nodes_hash( $xml, $root );
        return $dom->toString(1);
    }

    sub xmlsimple_output {
        my $xml    = shift;
        my $parser = MyXMLSimple->new( KeyAttr => [], RootName => 'Zymonic' );
        return $parser->XMLout( $xml,
            KeyAttr       => [],
            RootName      => 'Zymonic',
            NoEscape      => 1,
            SuppressEmpty => 1
        );
    }
      Interesting, but what does that have to do with the OP's concern? No downvotes here, obviously, but I do not see your point!

        Apologies, I am the OP... I thought giving the full source code for our script might make the question too long, but that it might be useful to anyone trying to recreate the scenario, so I added it as a reply.

Node Type: perlquestion [id://1209202]
Approved by Discipulus