http://qs321.pair.com?node_id=737386


in reply to Modified title: The structures created by many of the XML parsers in Perl appear unnecessarily deep in levels...

It is? Really? Because over the past five years or so of munging XML with Perl, I have found the task to be quite easy. Maybe you need to study Perl more, or flat out pick another language to program in, because Perl is fundamentally built upon hashes and arrays. It sounds like you only want us to think you love Perl.

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

Re^2: Why oh why is working with XML so bloomin' difficult in Perl?
by jfroebe (Parson) on Jan 19, 2009 at 20:52 UTC

    I think that after munging XML in Perl for years, you may have forgotten that the learning curve for XML in Perl can be quite steep. Not impossible, but it is there.

    My frustration with the situation is apparent, but it wasn't meant to be a derogatory statement about Perl or about XML. I'm sorry if it was taken that way.

    As I said earlier, thanks to everyone who pointed out modules that might make handling XML a bit easier. I think the underlying problem may be the design of XML itself: it may be too flexible to parse and manipulate satisfactorily and intuitively in all cases without knowing the data format of the stream in advance.

    Jason L. Froebe

    Blog, Tech Blog

      If XML is difficult for you to parse, maybe your XML isn't structured properly. I don't know your application or where your XML comes from, so it's difficult to judge. XML is just a way to structure data. It can be as flexible or as strict as you need it to be (within your app), and it shouldn't be more flexible than it needs to be. Parsing XML with DOM methods is much more laborious than using something like XML::Simple, but that's how many languages do it. There are a lot of modules on CPAN that try to make it easy for you, and I feel that they do a very good job of it.
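To make the DOM-versus-XML::Simple contrast concrete without depending on any particular CPAN module, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for a Perl parser. The `simplify` function is a hypothetical, much-reduced imitation of the XML::Simple idea (attributes and child elements folded into a hash, repeated children into a list); it is not XML::Simple's actual algorithm, and the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<catalog>
  <book id="b1"><title>First Title</title><year>2001</year></book>
  <book id="b2"><title>Second Title</title><year>2005</year></book>
</catalog>"""


def simplify(elem):
    """Fold an element into plain dicts/lists/strings (hypothetical rules):
    - attributes and child elements become dict keys
    - a child tag seen more than once becomes a list
    - an element with no attributes and no children becomes its text
    """
    result = dict(elem.attrib)
    for child in elem:
        value = simplify(child)
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    if not result:
        return (elem.text or "").strip()
    return result


root = ET.fromstring(SAMPLE)

# DOM-style: navigate the element tree explicitly.
dom_titles = [book.find("title").text for book in root.findall("book")]

# Simplified style: index into nested dicts and lists.
data = simplify(root)
simple_titles = [book["title"] for book in data["book"]]

print(dom_titles)     # ['First Title', 'Second Title']
print(simple_titles)  # ['First Title', 'Second Title']
```

One wrinkle this sketch shares with XML::Simple's default behaviour: a tag that appears only once becomes a single value rather than a one-element list, so code that treats `data["book"]` as a list breaks when the document contains just one book. XML::Simple's `ForceArray` option exists to address exactly that.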

      Again, if your parsing seems difficult, maybe the XML that you're parsing is just a bit too flexible for your needs. And I've yet to hear what exactly is difficult about it in Perl. You keep repeating that it's difficult, but you haven't given any examples of why you believe it to be.

      On a side note, I'd recommend changing your title to something a little less irritated; otherwise you'll get even more of these defensive replies.


      While I ask a lot of Win32 questions, I hate Windows with a passion. That's the problem with writing a cross-platform program. I'm a Linux user myself. I wish more people were.
      If you want to do evil, science provides the most powerful weapons to do evil; but equally, if you want to do good, science puts into your hands the most powerful tools to do so.
      - Richard Dawkins

        Good points. As Jeffa pointed out, it isn't just Perl where the XML parsers can produce complex structures (hashes of hashes, etc.). The frustrating part is that, in several of the XML parser modules, the structures can become unnecessarily complex. Because the incoming data is unknown, most of the parsers build the in-memory structures generically. That's not a bad thing; it just tends to be cumbersome.

        One problem that can arise is that the information you are looking for may not be at a consistent location within the structure, which would rule out XPath (XML Path Language) tools. If the data is properly tagged, then you can use rule-based or similar tools instead. These were mentioned in earlier replies, so I'm not going to rehash them.
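The rule-based idea can be sketched in a few lines: instead of addressing data by its position, you register a handler per tag name and let it fire wherever that tag turns up. The sketch below is loosely in the spirit of CPAN modules such as XML::Rules or XML::Twig's handlers, but it is not either module's API; it uses Python's xml.etree.ElementTree as a stand-in, and `apply_rules`, `rules`, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <price> appears at three different depths.
DOC = """<order>
  <price>10</price>
  <line><price>3</price></line>
  <bundle><line><price>2</price></line></bundle>
</order>"""


def apply_rules(root, rules):
    """Run rules[tag](elem) on every element whose tag has a rule,
    in document order, regardless of where it sits in the tree."""
    return [rules[elem.tag](elem) for elem in root.iter() if elem.tag in rules]


# One rule per tag of interest; each handler turns an element into a value.
rules = {"price": lambda elem: int(elem.text)}

prices = apply_rules(ET.fromstring(DOC), rules)
print(prices)       # [10, 3, 2]
print(sum(prices))  # 15
```

The design choice here is that the handler depends only on the tag being meaningful, not on a fixed path, which is why properly tagged data matters for this approach.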

        How is this the fault of Perl? It's not. Perl itself knows nothing about XML. The wide variety of XML parsers on CPAN gives us a clue that the problem really lies in how flexible XML can be, and in how some XML data sources can make parsing very difficult. The XML parsers do their best, but there is no single best XML parser module for the majority of XML data sources. In some simple cases, such as Flickr's REST web service, XML::Simple works just fine and produces data structures well suited to the simple XML data. In others...

        Is this a bit more clear?

        The title is now "Modified title: The structures created by many of the XML parsers in Perl appear unnecessarily deep in levels...", which I hope is a bit less inflammatory. Otherwise, we'll have to get a bucketful of Valium for some perlmonks ;-) (Just teasing)

        Jason L. Froebe


      As your difficulty seems to be the depth of the nested tree structure produced when parsing your nested tree document, perhaps your difficulty lies in the depth of nesting of the tree in your document.
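The point about depth can be checked directly: a generic parser's tree is exactly as deep as the document's element nesting, so deep documents yield deep structures. A minimal sketch, again using Python's xml.etree.ElementTree as a stand-in for a Perl parser; `max_depth` is a hypothetical helper and both sample documents are invented.

```python
import xml.etree.ElementTree as ET


def max_depth(elem):
    """Nesting depth of an element: 1 for an element with no children."""
    return 1 + max((max_depth(child) for child in elem), default=0)


# Hypothetical documents: one shallow, one deeply nested.
FLAT = "<rsp><item id='1'/><item id='2'/></rsp>"
NESTED = "<a><b><c><d><e>value</e></d></c></b></a>"

print(max_depth(ET.fromstring(FLAT)))    # 2
print(max_depth(ET.fromstring(NESTED)))  # 5

# Reaching the innermost value takes one step per level of nesting.
print(ET.fromstring(NESTED).find("b/c/d/e").text)  # value
```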

        It might be fair to say that this is part of the problem, but not the whole problem. Another part is selecting the correct tool (XML parser module) for the job...

        Jason L. Froebe


      No. I think that you assume that Perl is hard when it is not. The learning curve for Perl is no steeper than those for Java, C#, Python, etc. One must first learn how to use data structures in a general sense, and that is where the steep learning curve lies. Don't blame the language when it's the concept itself that is hard to learn. Likewise, don't fault all XML because you have not bothered to study how the specific XML that you are trying to parse was put together. Again, I really doubt you love Perl as much as you want us to think you do.

      jeffa

Re^2: Why oh why is working with XML so bloomin' difficult in Perl?
by metaperl (Curate) on Jan 20, 2009 at 16:45 UTC
    jeffa, I find it interesting that you munge XML without touching it (I assume), yet prefer an inline approach to munging HTML.

    Logically, HTML is an instance of XML and can be treated the same way... at least that's how we users of HTML::Seamstress see it.

      The only thing I prefer when munging HTML is meeting my clients' needs while working within the framework that they have. The ideals proposed by your module are noble indeed, but then the real world rears its ugly head and shatters those notions. Build it, and they will change it. My issues, however, are not with which Perl solutions to use, but rather with how easy it is for someone to post what the OP did and get away with it. Clever trolls be among us.

      jeffa
