PerlMonks  

XML::Simple design decisions

by grantm (Parson)
on Nov 09, 2002 at 07:26 UTC ( [id://211626] )


in reply to XML::Simple

I know this is an old thread, but it prompted this question in the chatterbox and my response is probably a bit wordy for a chatterbox reply.

In this node it is mentioned that without forcearray the values of the hash produced by XML::Simple will produce arrayrefs in some cases and scalars in other cases... it was mentioned in the node that it did not seem to be a good design decision. What motivated that decision?

I'll start (uncharacteristically) by answering the question: simplicity was the motivation.

I needed an API that made it very easy to work with common forms of XML. For my purposes, the failing of the existing APIs was complexity: complexity born from the need to provide a comprehensive solution which covered all possible cases. I felt that for the common cases, a module could 'guess' what you wanted instead of forcing you to specify it in excruciating detail. Here's a little background...

One frequently asked question in the XML world is "should I store my data in attributes or nested elements?". For example, the data content of this XML...

<person>
  <firstname>Bob</firstname>
  <surname>Smith</surname>
  <dob>18-Aug-1972</dob>
  <hobby>Fishing</hobby>
</person>

... is equivalent to this XML:

<person firstname="Bob" surname="Smith" dob="18-Aug-1972" hobby="Fishing" />

Some people prefer the first form and some prefer the second - there is no 'right' answer as long as we assume that there will only ever be one first name, one surname, one date of birth and one hobby. If we list multiple hobbies, then they must be represented as child elements since the rules of XML say an element cannot have two attributes with the same name. So we might end up with something like this:

<person firstname="Bob" surname="Smith" dob="18-Aug-1972">
  <hobby>Fishing</hobby>
  <hobby>Trainspotting</hobby>
</person>

To some people, this hybrid form is the obvious and sensible solution. To others, it is ugly and inconsistent. I don't really take a position on that argument and neither does XML::Simple. The XML::Simple API makes it just as easy to access data from nested elements as it is from attributes. It achieves this simplicity by applying simple rules to 'guess' what you want. If you understand the rules then you can provide hints (through options) to ensure the guesses always go your way.

Now to return to our examples, this code

my $person = XMLin($filename);

will read both the first and second XML documents (above) into a structure like this:

{ firstname => "Bob" , surname => "Smith", dob => "18-Aug-1972", hobby => "Fishing", }

and the third XML document into a structure like this:

{ firstname => "Bob" , surname => "Smith", dob => "18-Aug-1972", hobby => [ "Fishing", "Trainspotting" ] }

By default, XML::Simple always represents an element as a scalar - unless it encounters more than one of them, in which case the scalar is 'promoted' to an array. Obviously it would be a bad thing for your code to have to check whether an element was a scalar or an arrayref before processing it - so don't do that.
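
A minimal sketch of the promotion rule, assuming XML::Simple is installed and passing the XML as inline strings rather than a filename (the sample documents are trimmed versions of the ones above):

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# One <hobby> child: the value stays a plain scalar.
my $one = XMLin('<person firstname="Bob"><hobby>Fishing</hobby></person>');

# Two <hobby> children: the value is promoted to an arrayref.
my $two = XMLin(
    '<person firstname="Bob">'
  . '<hobby>Fishing</hobby><hobby>Trainspotting</hobby>'
  . '</person>'
);

# ref() returns '' for a plain scalar and 'ARRAY' for an arrayref.
my $kind_one = ref($one->{hobby}) || 'scalar';
my $kind_two = ref($two->{hobby}) || 'scalar';
print "one hobby:   $kind_one\n";
print "two hobbies: $kind_two\n";
```

Code that dereferences $person->{hobby} as an array works on the second structure but dies under strict refs on the first, which is exactly the inconsistency the forcearray option is there to remove.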

One approach to achieving more consistency is to use the 'forcearray' option like this:

my $person = XMLin($filename, forcearray => 1);

which will read the first XML document into a structure like this:

{ firstname => [ "Bob" ], surname => [ "Smith" ], dob => [ "18-Aug-1972" ], hobby => [ "Fishing" ], }

and the third XML document into a structure like this:

{ firstname => "Bob", surname => "Smith", dob => "18-Aug-1972", hobby => [ "Fishing", "Trainspotting" ], }

But a better alternative is to enable forcearray only for the elements which might occur multiple times (ie: influence the guessing process):

my $person = XMLin($filename, forcearray => [ 'hobby' ]);

which will consistently read any of the example forms into this type of structure regardless of whether there is only one hobby:

{ firstname => "Bob", surname => "Smith", dob => "18-Aug-1972", hobby => [ "Fishing", "Trainspotting" ], }
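
As a rough illustration of why this is convenient, the sketch below (assuming XML::Simple is installed, with shortened inline versions of the example documents) runs the same loop over a one-hobby document and a two-hobby document without checking what kind of value hobby holds:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Attribute form with one hobby, and element form with two hobbies.
my @docs = (
    '<person firstname="Bob"><hobby>Fishing</hobby></person>',
    '<person><firstname>Bob</firstname>'
      . '<hobby>Fishing</hobby><hobby>Trainspotting</hobby></person>',
);

my @lines;
for my $xml (@docs) {
    # Only 'hobby' is forced to an arrayref; firstname is still guessed.
    my $person = XMLin($xml, forcearray => ['hobby']);
    push @lines, "$person->{firstname}: " . join(', ', @{ $person->{hobby} });
}
print "$_\n" for @lines;
```

Because hobby is always an arrayref here, the dereference is safe in both cases, while firstname remains a plain scalar.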

Given the three possible values for the forcearray option ...

  1. 0 (always 'guess')
  2. 1 (always represent child elements as arrayrefs - even if there's only one)
  3. a list of element names (force named elements to arrayrefs, guess for all others)

... you might well ask why I chose the first option. The truth is that I don't know. The third option is clearly the best for most people, but I couldn't use it as the default since I couldn't know in advance what elements people would want to name. The fact that I chose the worse of the two remaining options hopefully means that a few more people have read the documentation and realised option three is the one they want.

The observant reader will have noted that I said I couldn't use a list of element names as a default for the 'forcearray' option and yet that is precisely what I chose to use as the default value for the 'keyattr' option. I could quote Oscar Wilde at this point ("Consistency is the last refuge of the unimaginative") but the truth is, I didn't think people would think to go looking for the 'array folding' feature so I put it somewhere where they could trip over it.
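
For readers who have not met 'array folding', here is a small sketch of what the default keyattr list does, assuming XML::Simple is installed. The servers document and its element and attribute names are made up for illustration:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical document: repeated elements that each carry a 'name' attribute.
my $xml = '<servers>'
        . '<server name="alpha" ip="10.0.0.1"/>'
        . '<server name="beta" ip="10.0.0.2"/>'
        . '</servers>';

# Default keyattr: the list of <server> hashes is folded into a hash
# keyed on the value of each 'name' attribute.
my $folded = XMLin($xml);

# keyattr => [] switches folding off, so <server> stays an arrayref.
my $plain = XMLin($xml, keyattr => []);

print "folded: $folded->{server}{alpha}{ip}\n";
print "plain:  $plain->{server}[1]{name}\n";
```

With folding on, lookups by name are direct, but two elements with the same name value collapse into one hash entry.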

Re: XML::Simple design decisions
by alw (Sexton) on Dec 24, 2007 at 19:52 UTC
    I have a problem with array folding that I could only solve by subclassing XML::Simple and hardcoding a return in the array_to_hash method.
    sub array_to_hash {
      . . .
      # Or assume keyattr => [ .... ]
      else {
        ELEMENT: for($i = 0; $i < @$arrayref; $i++) {
          return ($arrayref) if $arrayref->[$i]{name} eq 'e_im_dev_io_entry';  # this line was added to jump out
          return($arrayref) unless(UNIVERSAL::isa($arrayref->[$i], 'HASH'));
          . . .
    }
    If an attribute called "name" has the same value in multiple nested elements, then only one of those elements will remain after the array folding. This example is only a part of a larger XML file. I don't want to use KeyAttr=>[], which does prevent the folding, since in other parts of the file the array folding is desirable. I only want to prevent array folding if the attribute value is equal to "something". I have tried many options with no success. Am I missing something, or is subclassing the only way?
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <Report address="Address" name="IM Report" productID="INTRFC-MGR01">
      <Entry detail="4" name="e_im_dev_io_entry">
        <Text>Device Handle: 2</Text>
      </Entry>
      <Entry detail="4" name="e_im_dev_io_entry">
        <Text>Device Handle: 5</Text>
      </Entry>
    </Report>
    _______________________________________________
    Options used were: none
    No good, I lost one element
    $VAR1 = {
      'Entry' => {
        'e_im_dev_io_entry' => {
          'detail' => '4',
          'Text' => 'Device Handle: 5'
        }
      },
      'name' => 'IM Report',
      'address' => 'Address',
      'productID' => 'INTRFC-MGR01'
    };
    _______________________________________________
    Options used were: KeyAttr=>[]
    This is what I want.
    $VAR1 = {
      'Entry' => [
        {
          'detail' => '4',
          'name' => 'e_im_dev_io_entry',
          'Text' => 'Device Handle: 2'
        },
        {
          'detail' => '4',
          'name' => 'e_im_dev_io_entry',
          'Text' => 'Device Handle: 5'
        }
      ],
      'name' => 'IM Report',
      'address' => 'Address',
      'productID' => 'INTRFC-MGR01'
    };
    I would like the output to be identical to the last example. I don't want to lose any elements. Again, this is just a small piece of a much larger XML document. There are lots of Entry elements so I can't use KeyAttr => {...}
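
One possible way to get selective folding without subclassing is sketched below. It is an assumption about how the problem could be approached, not a documented XML::Simple feature: switch folding off with KeyAttr => [], then fold in your own code only where the key values turn out to be unique. The helper fold_if_unique is hypothetical, the Other elements are added purely for contrast, and only one level of nesting is handled:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical helper: fold a list of hashes on $key only when every item
# has the key and no value repeats; otherwise return the list unchanged.
sub fold_if_unique {
    my ($list, $key) = @_;
    return $list unless ref($list) eq 'ARRAY';
    my %folded;
    for my $item (@$list) {
        return $list unless ref($item) eq 'HASH' && defined $item->{$key};
        return $list if exists $folded{ $item->{$key} };   # collision: keep the array
        my %copy = %$item;
        my $name = delete $copy{$key};
        $folded{$name} = \%copy;
    }
    return \%folded;
}

my $xml = join '',
    '<Report name="IM Report">',
    '<Entry detail="4" name="e_im_dev_io_entry"><Text>Device Handle: 2</Text></Entry>',
    '<Entry detail="4" name="e_im_dev_io_entry"><Text>Device Handle: 5</Text></Entry>',
    '<Other detail="4" name="first"><Text>Device Handle: 5</Text></Other>',
    '<Other detail="4" name="second"><Text>Device Handle: 5</Text></Other>',
    '</Report>';

# No automatic folding; both element types always arrive as arrayrefs.
my $data = XMLin($xml, KeyAttr => [], ForceArray => ['Entry', 'Other']);

for my $tag ('Entry', 'Other') {
    $data->{$tag} = fold_if_unique($data->{$tag}, 'name');
}

printf "Entry: %s with %d items\n",
    ref($data->{Entry}), scalar @{ $data->{Entry} };
printf "Other: %s with keys %s\n",
    ref($data->{Other}), join(', ', sort keys %{ $data->{Other} });
```

Here the Entry elements share a name, so they stay as a two-item array, while the Other elements have distinct names and are folded into a hash. A larger document would need the same treatment applied recursively.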

      You could also use XML::Rules instead of XML::Simple as it gives you more detailed control over what data structure gets generated.

      Something like:

      use XML::Rules; # at least 0.22 (for the stripspaces)
      # see http://www.perlmonks.org/?node_id=658971
      my $parser = XML::Rules->new(
          rules => [
              Text => 'content',
              Entry => 'as array',
              Report => 'pass',
              Other => sub {return delete($_[1]->{name}) => $_[1]},
          ],
          stripspaces => 3,
      );
      my $data = $parser->parse(\*DATA);
      use Data::Dumper;
      print Dumper($data);
      __DATA__
      <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
      <Report address="Address" name="IM Report" productID="INTRFC-MGR01">
        <Entry detail="4" name="e_im_dev_io_entry">
          <Text>Device Handle: 2</Text>
        </Entry>
        <Entry detail="4" name="e_im_dev_io_entry">
          <Text>Device Handle: 5</Text>
        </Entry>
        <Other detail="4" name="first">
          <Text>Device Handle: 5</Text>
        </Other>
        <Other detail="4" name="second">
          <Text>Device Handle: 5</Text>
        </Other>
      </Report>

      It doesn't try to guess as XML::Simple does, so it's more work. (In the not-yet-released 0.23, the rule for the Other tag will be just Other => 'by name'.)

        Thanks Jenda, that module works very well for me. One more point I would like to make: with XML::Simple and the XML file I am working with, I can't use ForceArray => 'Entry' either for some reason; it hangs the script. I do not have that problem with XML::Rules.
