http://qs321.pair.com?node_id=737110

jfroebe has asked for the wisdom of the Perl Monks concerning the following question:

I love Perl, I really do. The problem is that I also need to work with XML. CPAN has numerous modules that work with XML to one degree or another.

Producing XML is simple and easy. Reading or manipulating it involves loading the XML through a parser. The parsers create a myriad of hashes, arrays and the like, but navigating them even with tools like XPath makes you want to perform oral surgery on yourself using rusty tiddly winks.

So my question is this: Are there any XML parsers for Perl that are easy to use without having to deal with arbitrary hashes of hashes of hashes and .... (don't forget the arrays)?

UPDATE: My point in the original question was that many of the XML parsing modules create hashes of XXXX whose depth tends to become rather unwieldy. XML::Simple is good for simple XML but not so much for complex XML. On the flip side, libXML is a bit much for simple XML.

The thing that started all this? Flickr::API using the abandoned XML::Parser::Lite::Tree module. Granted Flickr's REST interface is simple ... it just happened to be the last straw.

Let me formally apologize to anyone who took offense.

Jason L. Froebe

Blog, Tech Blog

Replies are listed 'Best First'.
Re: Why oh why is working with XML so bloomin' difficult in Perl?
by ikegami (Patriarch) on Jan 18, 2009 at 04:12 UTC

    What would you expect arbitrary one-to-many hierarchical data to look like?

    But if you simply want to manipulate XML, XML::Twig provided a simple solution for me.

Re: Why oh why is working with XML so bloomin' difficult in Perl?
by Your Mother (Archbishop) on Jan 18, 2009 at 07:46 UTC

    Just say no. It's XML. Treat it so. Use XML::LibXML. It's got a learning curve but it repays in other areas like JS and the DOM. It's worth it to go native.
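For readers weighing the XML::LibXML suggestion above, here is a minimal sketch of what XPath navigation looks like with it. The sample document is made up for illustration (loosely shaped like a photo listing) and is not taken from any real API response.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document, invented for this sketch.
my $xml = <<'XML';
<photos page="1">
  <photo id="101" title="Sunset"/>
  <photo id="102" title="Harbour"/>
</photos>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# findnodes returns a list of element nodes in list context;
# no intermediate hashes or arrays need to be walked by hand.
my @titles = map { $_->getAttribute('title') } $doc->findnodes('/photos/photo');

# findvalue returns the string value of the XPath expression.
my $page = $doc->findvalue('/photos/@page');

print "page $page: @titles\n";   # page 1: Sunset Harbour
```

The same `findnodes` and `getAttribute` calls follow the DOM naming used in other languages, which is the transferability argument made above.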

      Standardized ne Native. Using a language-agnostic standard can help if you work with several languages, but it tends to come at a price. The standard can only use the common parts of the languages, which means sometimes it feels unnatural and sometimes it's more complicated than it could be. "Go native" is just hype. Go with the standard if it makes sense to you; use a (language-)specific solution if it makes better sense.

        I think we have different ideas of what "hype" means. :) I resisted going to libxml for a long time and I wish someone had told me several years ago how much more productive it would make me if I'd just break down and learn its interface. My enthusiasm for the kit is not hype. It's enthusiasm for the kit.

        I think Twig is nice too, so nothing against it, but XML::Simple and similar packages floating about are a dead end hidden by several blind corners and only the most trivial, static or non-production, cases should go there. The knowledge and skill gained from wrangling it is largely wasted as it transfers nowhere other than drafting for spaghetti factories.

        Update: Oh, I neglected to consider XML::Rules, etc, which is where I think we have a disconnect since I believe you wrote it. I haven't had a chance to use it and have no opinion. I certainly wasn't comparing it to XML::Simple above. It might be great and certainly worth looking at for the OP if starting from scratch on a particular problem.

      I agree here, but I'll add that the time spent learning XML::LibXML will easily be made up at run time. XML::Twig is great if you need a pure Perl module, but XML::LibXML will produce results much quicker at run time and isn't as memory heavy.

        XML::Twig is great if you need a pure Perl module, but XML::LibXML will produce results much quicker

        XML::Twig is no more Pure Perl than XML::LibXML. Both use C libraries to perform the parsing. It's possible that XML::LibXML is faster, but it's not related to the choice of programming language.

        XML::LibXML [...] isn't as memory heavy.

        That's an odd thing to say. The whole idea of XML::Twig is to keep nothing in memory that doesn't need to be, whereas XML::LibXML keeps the whole document in memory.
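A minimal sketch of the XML::Twig behaviour described here, assuming a made-up log document: each handler fires once its element has been fully parsed, and `purge` then discards what has been parsed so far, so the whole document never needs to sit in memory.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input, invented for this sketch.
my $xml = '<log>'
        . '<entry level="info">started</entry>'
        . '<entry level="error">disk full</entry>'
        . '</log>';

my @errors;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per <entry>, after the element is complete.
        entry => sub {
            my ($t, $entry) = @_;
            push @errors, $entry->text if $entry->att('level') eq 'error';
            $t->purge;    # release the parts of the tree parsed so far
        },
    },
);
$twig->parse($xml);

print "@errors\n";   # disk full
```

For a real file, `parsefile` would replace `parse`; the handler stays the same.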

Re: Why oh why is working with XML so bloomin' difficult in Perl?
by hossman (Prior) on Jan 18, 2009 at 07:37 UTC

    Your post suggests that you don't like dealing with XML via Perl data structures (hashes of hashes and arrays etc...) but you also say "navigating them even with tools like XPath makes you want to perform oral surgery on yourself".

    So what would your ideal API look like? If you describe how you want to go about inspecting/manipulating your data maybe people can suggest modules that accommodate.

    (Or perhaps an example of XML parsing in another language that you find less "bloomin' difficult".)

      LOL! Don't forget the rusty tiddly winks. ;-) It is a difficult thing to do to work with XML in any language. XML::Simple as mentioned earlier seems to be perfect for simpler XML documents/streams. I'm not certain how effective using Rules will be with large complex documents. There is only one way to find out though... thanks everyone for all the suggestions.

      Jason L. Froebe

        The problem with XML::Simple is that unless you fiddle with ForceArray and ForceContent the resulting data structure is not consistent. If some tag sometimes has text content and attributes and sometimes only the content, you get a hash once and a scalar later. If some tag is repeated within its parent in one place but occurs only once in another, you get an array of hashes/scalars the first time and a single hash/scalar the second.

        If you know your data you can set XML::Simple's options accordingly. Or you can ask XML::Rules to infer the rules from either the DTD or a (few) example(s) and obtain a consistent data structure almost identical to the one created by a well-configured XML::Simple.

        How effective Rules are with large documents depends on the rules. They specify whether you keep all the data from the document, filter out the bits you do not need as you go, or process parts of the XML and forget the data you no longer need.
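The XML::Simple inconsistency described above can be seen in a small sketch. The two input strings are invented for illustration; `KeyAttr => []` is passed only to switch off XML::Simple's default folding on `id`/`name`/`key` attributes so that the array-versus-hash difference is the only thing that varies.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical inputs: the same element appearing once, then twice.
my $one = '<list><item id="a">x</item></list>';
my $two = '<list><item id="a">x</item><item id="b">y</item></list>';

# Default behaviour: the shape of {item} depends on how many there are.
my $default_one = XMLin($one, KeyAttr => []);
my $default_two = XMLin($two, KeyAttr => []);
print ref($default_one->{item}), "\n";   # HASH
print ref($default_two->{item}), "\n";   # ARRAY

# ForceArray on 'item' makes both cases an array reference.
my $forced_one = XMLin($one, KeyAttr => [], ForceArray => ['item']);
print ref($forced_one->{item}), "\n";    # ARRAY

# Text-only elements parse to a plain scalar unless ForceContent is set.
my $plain  = XMLin('<list><item>x</item></list>');
my $forced = XMLin('<list><item>x</item></list>', ForceContent => 1);
print ref($plain->{item})  || 'SCALAR', "\n";   # SCALAR
print ref($forced->{item}) || 'SCALAR', "\n";   # HASH
```

Code that consumes the default output has to check `ref` at every level; with the options set, it can assume one shape.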

Re: Why oh why is working with XML so bloomin' difficult in Perl? (Diver)
by tye (Sage) on Jan 18, 2009 at 05:50 UTC

    I'm not sure, but Data::Diver might be somewhat helpful for you in ad-hoc navigation of the nested structures that typically result from parsing XML.

    - tye        
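A minimal sketch of the Data::Diver suggestion, assuming a nested structure of the kind a parser might return (the data below is made up). `Dive` walks a list of keys through mixed hashes and arrays and returns nothing if any step is missing, without autovivifying intermediate levels.

```perl
use strict;
use warnings;
use Data::Diver qw(Dive);

# Hypothetical parsed structure, invented for this sketch.
my $data = {
    rsp => {
        photos => {
            photo => [
                { id => 101, title => 'Sunset'  },
                { id => 102, title => 'Harbour' },
            ],
        },
    },
};

# One call instead of $data->{rsp}{photos}{photo}[1]{title}.
my $title = Dive($data, qw(rsp photos photo 1 title));

# A missing key anywhere along the path yields undef in scalar context.
my $missing = Dive($data, qw(rsp photos nosuch 0 title));

print "$title\n";                                   # Harbour
print defined $missing ? "found\n" : "missing\n";   # missing
```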

Re: Why oh why is working with XML so bloomin' difficult in Perl?
by Jenda (Abbot) on Jan 18, 2009 at 14:28 UTC
Re: Why oh why is working with XML so bloomin' difficult in Perl?
by gube (Parson) on Jan 18, 2009 at 14:41 UTC
    Hi,

    You can use XML::Simple as well. XML::Simple->new( KeepRoot => 1, KeyAttr => 1, ForceArray => 1 ); This will give you everything as arrays of hashes.

      If I didn't know my data input and wanted a simple way to get at it all without ref($var), I'd probably go this route. Then again, if you don't know your data input, you've got quite a bit of work cut out for you.


      If you want to do evil, science provides the most powerful weapons to do evil; but equally, if you want to do good, science puts into your hands the most powerful tools to do so.
      - Richard Dawkins
Re: Why oh why is working with XML so bloomin' difficult in Perl?
by jeffa (Bishop) on Jan 19, 2009 at 20:00 UTC

    It is? Really? Because for the past 5 years or so of munging XML with Perl i have found the task to be quite easy. Maybe you need to study Perl more or flat out pick another language to program in, because Perl is fundamentally built upon hashes and arrays. It sounds like you only want us to think you love Perl.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      I think that after munging XML in Perl for so long, you might have forgotten that the learning curve of using XML in Perl can be quite steep. Not impossible, but it is there.

      My frustration with the situation is apparent but it wasn't meant to be a derogatory statement about Perl or about XML. I'm sorry if it was taken that way.

      As I said earlier, thanks to everyone that pointed out modules that might make handling XML a bit easier... I think the underlying fault of the problem may be the design of XML itself. It may be too flexible to satisfactorily parse and manipulate in an intuitive manner in all cases without knowing in advance the data format of the stream.

      Jason L. Froebe

        If XML is difficult for you to parse, maybe your XML isn't structured properly. I don't know your application or where your XML comes from, so that makes it difficult to judge. XML is just a way to structure data. It can be as flexible or strict as you need it to be (within your app) and shouldn't be more flexible than it needs to be. Parsing XML using DOM methods will be much more difficult than just using something like XML::Simple or the like, but that's the way many languages do it. There are a lot of modules on CPAN that try to make it easy for you. I feel that they do a very good job of that.

        Again, if your parsing seems difficult, maybe the XML that you're parsing is just a bit too flexible for your needs. And I've yet to hear what exactly is difficult about it in Perl. I hear you repeating the same thing about it being difficult, but haven't heard any examples of why you believe it to be difficult.

        On a side note, I'd recommend modifying your title to something a little less irritated, otherwise you'll get even more of these defensive replies.


        While I ask a lot of Win32 questions, I hate Windows with a passion. That's the problem with writing a cross-platform program. I'm a Linux user myself. I wish more people were.
        As your difficulty seems to be the depth of the nested tree structure produced when parsing your nested tree document, perhaps your difficulty lies in the depth of nesting of the tree in your document.

        No. I think that you assume that Perl is hard when it is not. The learning curve for Perl is no more steep than those for Java, C#, Python ... etc. One must first learn how to use data structures in a general sense, and that is the steep learning curve right there. Don't blame the language when the concept itself is what is hard to learn. Likewise, don't fault all XML because you have not bothered to study how the specific XML that you are trying to parse was put together. Again, i really doubt you love Perl as much as you want us to think you do.

        jeffa

      jeffa, I find it interesting that you munge XML without touching it (I assume), yet prefer an inline approach to munging HTML.

      Logically, HTML is an instance of XML and can be treated the same... at least that's how us users of HTML::Seamstress see it.

        The only thing i prefer when munging HTML is meeting my clients' needs while working in the framework that they have. The ideals proposed by your module are noble indeed, but then the real world rears its ugly head and shatters those notions. Build it, and they will change it. My issues, however, are not with what Perl solutions to use, but rather how easy it is for someone to post what the OP did and get away with it. Clever trolls be among us.

        jeffa
