XML::Simple needs to go!

by Preceptor (Deacon)
on Dec 21, 2015 at 10:18 UTC (perlmeditation, id://1150844)

If there's one module that I think really needs to get thrown out of CPAN, it's this one. Why? Well, generally I'm forgiving - don't like a module; don't use it. Problem is - XML::Simple is false advertising - it commonly gets installed because it's "Simple", and that's simply not the case at all.

I see a steady stream of questions - both here and on Stack Overflow - about this module, because someone's been tripped up again by how this module tries to coerce a more complex data structure into a less complicated one.
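A minimal sketch of the kind of surprise that prompts those questions, assuming XML::Simple is installed and called with its default options. The `<order>` and `<item>` element names are made up for illustration:

```perl
use strict;
use warnings;
use XML::Simple;

# Two documents with the same layout; only the number of <item> children differs.
my $one = XMLin('<order><item>apple</item></order>');
my $two = XMLin('<order><item>apple</item><item>pear</item></order>');

# With default options a lone <item> becomes a plain string,
# while repeated <item> elements become an array reference.
printf "one item:  %s\n", ref( $one->{item} ) || 'plain scalar';
printf "two items: %s\n", ref( $two->{item} ) || 'plain scalar';
```

Code written against the second document (`@{ $order->{item} }`) fails on the first, even though both are valid instances of the same format.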

That's really the root of the problem - just like parsing HTML with regex - the approach is fundamentally flawed.

You can use regex to grab a value out of HTML/XML. It's nasty and brittle, but sometimes a dirty hack is expedient.

And the same is true, I contend, of XML::Simple - it's the 'parsing with a regex' sort of solution. It can sort of work in some scenarios, but fundamentally it's a bad solution to the problem at hand.

But were the module called anything else this wouldn't be a problem. There's a lot of stuff in the CPAN namespace that I've never used or installed - that's fine, that's kind of the point.

But many an unwary newbie has been caught out by it - there are a number of "Simple" modules that offer cut down interfaces for a limited subset of operations. LWP::Simple is a good example - it offers a cut down interface to the basic tasks one might need to accomplish with LWP.

This module is officially discouraged in its own documentation, but it's still picked up as the "Simple" answer.

Is there precedent for renaming a module in CPAN? I know it's not really an option to just delete it, because there's probably some legacy code depending on it (as much as I think they should be rewriting it, that isn't really my call to make!). But it really does suffer from all the things that have given perl a bit of a bad name in the past - it's a sure road to some rather hacky/nasty code.

Replies are listed 'Best First'.
Re: XML::Simple needs to go!
by Corion (Patriarch) on Dec 21, 2015 at 10:32 UTC

    If you can't get people to read the module documentation, I don't think there is a way that you can prevent people from finding old versions of XML::Simple for their use.

    I think the best approach you can take is to write good articles on how to use solutions other than XML::Simple. There have been many times when some solution has been recognized as bad, either by some or by many people active in a part of the Perl community. Looking back, not all of these drives for excising a practice have been beneficial; Perl Best Practices and the Inside Out Objects technique are two older examples. I don't think that it is easy to reach the people who need your advice the most, and the best thing you can do is to make it easy to do the right thing instead of making it hard(er) to do the wrong thing.

      Doing the right thing is IMO pretty easy. I tend to recommend:

      • XML::Twig is my first port of call - because it's fairly straightforward and easy to get to grips with. I think it would serve well as a 'default' module as a result. It's a bit limited in certain areas (like only handling some XPath expressions) but for most scenarios it's fine. And it's excellent if you need to deal with large XML, thanks to being able to `purge` elements you have already processed.
      • XML::LibXML for anything else - it's a bit steeper on the learning curve (than XML::Twig, but still easier than XML::Simple), but extremely comprehensive.
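The XML::Twig approach from the first bullet can be sketched roughly as follows. This is only an illustration: the `<inventory>`, `<host>` and `<ip>` names and the `%ip_for` hash are hypothetical, and the document is inlined as a string rather than read from a large file:

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<inventory>
  <host name="alpha"><ip>10.0.0.1</ip></host>
  <host name="beta"><ip>10.0.0.2</ip></host>
</inventory>
XML

my %ip_for;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a <host> element has been completely parsed.
        host => sub {
            my ( $t, $host ) = @_;
            $ip_for{ $host->att('name') } = $host->first_child_text('ip');
            $t->purge;    # discard what has been parsed so far to keep memory flat
        },
    },
);
$twig->parse($xml);

print "$_ => $ip_for{$_}\n" for sort keys %ip_for;
```

The handler sees real elements with attributes and children, so the code does not change shape when a document has one `<host>` instead of several.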

      I have found that most "I am having this problem with XML::Simple..." questions are solved by "use XML::Twig instead".

Re: XML::Simple needs to go!
by toolic (Bishop) on Dec 21, 2015 at 13:47 UTC
    "there's probably some legacy code depending on it"
    There are 402 "Distributions Which Depend on XML-Simple", according to the Reverse dependencies link on XML::Simple. There is no way to know how many more private codebases rely on XML::Simple.

      Yes, to be expected really. I know it's not realistic that it 'go away'. Practically speaking, once things have legacy 'tails' they tend to keep them.

Re: XML::Simple needs to go!
by Mr. Muskrat (Canon) on Dec 21, 2015 at 15:16 UTC

    Rather than trying to get rid of XML::Simple, why not submit documentation patches that better explain the problems that you see here and on Stack Overflow?

    Despite all of its warts and all of the other fine modules for handling XML, XML::Simple is still my go-to module when I start a new XML related task. Why? Because after using it for so many years, it still is the easiest solution for me. It's the Pareto Principle; I want to get 80% of the results from 20% of the effort. (I still switch over to XML::Twig if needed.)

    I won't ask you to parse HTML with a regex if you don't ask me to give up a misunderstood tool from my toolbox.

      This is where you've lost me. It may be I am misunderstanding XML::Simple - but I'm fairly sure that it's simply not possible to accomplish what XML::Simple is trying to do. XML intrinsically doesn't map to perl data structures.

      I have used XML::Simple - admittedly for the sake of troubleshooting, rather than by choice - and can think of very few positive reasons to use it (except maybe 'I have used it before, so I'm more familiar with it'). Can you give me some positive examples?

        From the documentation: "XML::Simple - An API for simple XML files" which is different than "A simple API" and is where much of the issue arises I suspect. XML::Simple is good (I understand) for configuration files where otherwise you might use .INI files or something similar.

        However XML is a bad choice for configuration files anyway. In fact it's a bad choice for most things. JSON and YAML do a much better job in many of the situations where XML has been bent to fit. XML::Simple is fine in its intended role, but that role doesn't match the advertising on the box nor the expectations derived from similar boxes.

        Premature optimization is the root of all job security

        As GrandFather has already said, XML::Simple is good at dealing with simple XML and XML configuration files. Most of the XML that I have had to deal with was configuration files. Lots of software is guilty of using XML for this (just Google it).

Re: XML::Simple needs to go!
by Your Mother (Archbishop) on Dec 21, 2015 at 14:44 UTC

    As toolic remarked, a large number of CPAN modules depend on it and most, maybe all, of the codebases I've worked on depended on it. It is not going anywhere any time soon.

    The CPAN is not a democracy and appealing to a crowd with pitchforks and torches is no way to influence an author on the matter of his or her code's disposition.

Re: XML::Simple needs to go!
by Laurent_R (Canon) on Dec 22, 2015 at 00:03 UTC
    <Mode Troll On>

    The problem might not be with XML::Simple, but simply with XML, which no sane or even half-sane person would have ever designed, but of course a committee did, and some ignorant management people thought it would be great.

    We should really get rid of XML, and that would solve the issue of XML::Simple. ;-)

    Things like JSON, YAML or CSV are so much better for 99% of the cases. <Mode Troll Off>

      If you need to deal with XML, first, we’re very sorry.

      brian d foy

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      Perhaps true. It is used in a load of places where it's overkill. It's effectively a data structure with an extra 'dimension' at each level - a node has:

      • content
      • child elements
      • attributes

      XML is about the only way you can easily do all three at once, but ... how often is this necessary? XHTML-type documents are perhaps the one example I can think of - and that's a pretty big use case - but actually most such documents use the looser HTML spec rather than XML. (The major difference is that XML is much stricter about tags: ordering, closing and nesting.) That's why I think XML is here to stay, but I would generally agree with your assertion - in most use cases I've seen, JSON is the better choice for data object transfer (API) and YAML for flat files (config).

      But either way - once you have things like xpath and 'directory style' navigation (parent/child/sibling) of your XML doc, it is a lot saner. Certainly more so than trying to flatten part of its dimensionality into a less complicated data structure like the perl native ones.
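A rough sketch of that xpath and parent/child/sibling navigation, assuming XML::LibXML is installed. The `<library>` document and its attribute names are invented for the example:

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => <<'XML' );
<library>
  <book id="b1" lang="en"><title>First</title></book>
  <book id="b2" lang="fr"><title>Second</title></book>
</library>
XML

# XPath selects nodes by structure and attribute value in one expression.
my ($book) = $doc->findnodes('//book[@lang="fr"]');

my $title  = $book->findvalue('./title');       # text of a child element
my $id     = $book->getAttribute('id');         # attribute on the node itself
my $parent = $book->parentNode->nodeName;       # step up to the parent
my $prev   = $book->previousNonBlankSibling;    # step sideways, skipping whitespace

print "$id: $title (in <$parent>, after ", $prev->getAttribute('id'), ")\n";
```

Content, child elements and attributes each keep their own accessor, so nothing has to be squeezed into a single hash key.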

        Yes, I agree with you and, of course, my post was only half serious.

        The main reason for my frustration about XML is that I relatively frequently have to fix incoming data to make it valid XML before I can process it with a state-of-the-art parser. And that is a nuisance especially in view of the fact that this incoming data is in fact simple enough not to require any of the XML heavy artillery.

        Nested hashes and arrays could be forced into all three, albeit perhaps at the expense of some awkwardness. After all, it's all structured data. Below is just a quick example and I'm not writing this node to promote writing code to support this data structure. It's certainly not impossible, though, to denote these things:

        my %data = (
            'attributes' => { '_name' => 'mydata', 'height' => 50, 'width' => '80%', },
            'children'   => [
                { 'attributes' => { '_name' => 'bob', ...} ...},
                { 'attributes' => { '_name' => 'tom', ...} ...},
            ],
            'content' => 'yadda yadda ...',
        );

        Whether that's something you'd want to process later is another issue. Some people like to put a lot of predetermined information about their data into their code. Others like to keep the data as self-describing as possible and keep the code very general to work with that. There are strengths and weaknesses to either approach.
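For what it's worth, a general-purpose walker over that kind of structure might look like the sketch below. The `to_xml` helper is hypothetical, and the values filled in where the example above has elisions are placeholders chosen only so that the code runs:

```perl
use strict;
use warnings;

my %data = (
    attributes => { _name => 'mydata', height => 50, width => '80%' },
    children   => [
        { attributes => { _name => 'bob' }, children => [], content => 'first child' },
        { attributes => { _name => 'tom' }, children => [], content => 'second child' },
    ],
    content => 'yadda yadda',
);

# Render one node as XML; the element name comes from the '_name' attribute.
# Values are not escaped here, so this is only safe for plain text like the above.
sub to_xml {
    my ($node) = @_;
    my %attr  = %{ $node->{attributes} || {} };
    my $name  = delete $attr{_name};
    my $attrs = join '', map { qq{ $_="$attr{$_}"} } sort keys %attr;
    my $kids  = join '', map { to_xml($_) } @{ $node->{children} || [] };
    my $text  = defined $node->{content} ? $node->{content} : '';
    return "<$name$attrs>$text$kids</$name>";
}

print to_xml( \%data ), "\n";
```

One limitation shows up straight away: with a single `content` string per node there is no record of where text sits relative to the child elements, so mixed content cannot round-trip.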

Re: XML::Simple needs to go!
by Anonymous Monk on Dec 21, 2015 at 21:51 UTC

      So does XML::Twig - it even maintains backwards compatibility thanks to simplify

        You could say the same about XML::Rules and inferRulesFromExample/inferRulesFromDTD. One of those will give you a bunch of rules that'll instruct XML::Rules to produce a datastructure equivalent to a well set XML::Simple.

        Enoch was right!
        Enjoy the last years of Rome.

Re: XML::Simple needs to go!
by mr_mischief (Monsignor) on Jan 06, 2016 at 15:33 UTC

    XML::Simple would probably be better named 'XML::Basic' or 'XML::Tiny'. Perhaps even 'Acme::TinyXML' would be a better name. With the right options it does read quite a bit of valid XML into reasonable data structures and will write valid XML from data structures. It's a serializer/deserializer for XML, not really a full-blown XML tool.
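As an illustration of what "the right options" can mean, here is a small sketch using XML::Simple's `ForceArray` and `KeyAttr` options. The `<order>` document and the `sku` attribute are made up:

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<order><item sku="a1">apple</item></order>';

my $order = XMLin(
    $xml,
    ForceArray => ['item'],    # <item> is always an array ref, even when alone
    KeyAttr    => [],          # never fold lists into hashes keyed on an attribute
);

# Each item is a hash of its attributes, with the text under 'content'.
for my $item ( @{ $order->{item} } ) {
    print "$item->{sku}: $item->{content}\n";
}
```

With these two options set, the shape of the result no longer depends on how many `<item>` elements a particular document happens to contain.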

    I've never used it for configuration files for my own work, but I can see where it could be handy if people really like XML. I have used it in a configuration front-end to things that require their configuration files in XML and expect someone to hand-tweak that. In fact, I once had an ActionScript web app I had to deploy for a client that depended on a specific ordering of XML nodes. It was simple enough to change the ActionScript not to care about ordering, so I did that. Then I used XML::Simple in a pretty straightforward way to generate from an interactive program what my client would never have been expected to maintain by hand.

    As for YAML, the more quirks I see with that the less I despise XML for configuration. I tend to use JSON, and if the target audience is nontechnical enough not to grok JSON then they probably want a configuration front-end rather than tweaking the configuration file by hand anyway. In my current work, the JSON configuration file is often generated from other data by a configuration management system.

    If someone's trying to handle arbitrary XML with XML::Simple, then that's going to be painful for them. There are warnings specifically about that in the module's documentation.
