http://qs321.pair.com?node_id=737123


in reply to Modified title: The structures created by many of the XML parsers in Perl appear unnecessarily deep in levels...

Your post suggests that you don't like dealing with XML via Perl data structures (hashes of hashes, arrays, etc.), but you also say "navigating them even with tools like XPath make you want to perform oral surgery on yourself".

So what would your ideal API look like? If you describe how you want to go about inspecting and manipulating your data, maybe people can suggest modules that accommodate it.

(Or perhaps an example of XML parsing in another language that you find less "bloomin' difficult".)

Re^2: Why oh why is working with XML so bloomin' difficult in Perl?
by jfroebe (Parson) on Jan 19, 2009 at 04:25 UTC

    LOL! Don't forget the rusty tiddly winks. ;-) Working with XML is difficult in any language. XML::Simple, as mentioned earlier, seems to be perfect for simpler XML documents/streams. I'm not certain how effective XML::Rules will be with large, complex documents. There is only one way to find out, though... Thanks, everyone, for all the suggestions.

    Jason L. Froebe


      The problem with XML::Simple is that unless you fiddle with ForceArray and ForceContent, the resulting data structure is not consistent. If some tag sometimes has both text content and attributes and sometimes only the content, you get a hash in one place and a scalar in another. If some tag is repeated within one parent element but occurs only once within another, you get an array of hashes/scalars the first time and a single hash/scalar the second.
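      To make the inconsistency concrete, here is a minimal sketch. The two sample documents and the qty attribute are made up for illustration; the options used (ForceArray, ForceContent, KeyAttr) are regular XML::Simple options, but treat the exact settings as one possible choice rather than a recommendation.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical sample data: the same <order> format, once with a single
# attribute-less <item>, once with two <item>s that carry a qty attribute.
my $one = q{<order><item>apple</item></order>};
my $two = q{<order><item qty="2">apple</item><item qty="1">pear</item></order>};

# Default options: the shape of {item} depends on the input.
my $d1 = XMLin($one);    # {item} is a plain scalar
my $d2 = XMLin($two);    # {item} is an array of hashes

# ForceArray makes <item> an array even when it occurs once,
# ForceContent turns every element into a hash with a 'content' key,
# and KeyAttr => [] switches off folding of arrays into hashes.
my %opts = (ForceArray => ['item'], ForceContent => 1, KeyAttr => []);
my $c1 = XMLin($one, %opts);
my $c2 = XMLin($two, %opts);

# With the options set, one loop handles both documents.
for my $doc ($c1, $c2) {
    for my $item (@{ $doc->{item} }) {
        my $qty = defined $item->{qty} ? $item->{qty} : 1;
        print "$qty x $item->{content}\n";
    }
}
```

      Without the options, the loop would have to check ref() on every level before it could touch the data.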

      If you know your data, you can set XML::Simple's options accordingly. Or you can ask XML::Rules to infer the rules from either the DTD or a few examples and obtain a consistent data structure almost identical to the one created by a well-configured XML::Simple.

      How effective Rules is with large documents depends on the rules. They specify whether you keep all the data from the document, filter out the bits you do not need as you go, or process parts of the XML and forget the data you no longer need.
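      As a rough sketch of the "process parts and forget them" style: the rules below handle each <order> as soon as its closing tag is seen and return nothing, so the parser does not accumulate the orders. The sample XML and the tag names (orders, order, customer, total) are invented for the example; only the XML::Rules constructor, the built-in 'content' rule and parse() come from the module itself.

```perl
use strict;
use warnings;
use XML::Rules;

# Hypothetical sample data.
my $xml = q{<orders>
  <order id="1"><customer>Ann</customer><total>10.50</total></order>
  <order id="2"><customer>Bob</customer><total>4.25</total></order>
</orders>};

my ($count, $sum) = (0, 0);

my $parser = XML::Rules->new(
    stripspaces => 7,    # drop whitespace-only text between tags
    rules       => {
        # Leaf tags such as <customer> and <total> become
        # tagname => text entries in the parent's hash.
        _default => 'content',

        # Called when </order> is seen; the second argument holds the
        # attributes plus whatever the child rules returned.
        order => sub {
            my ($tag, $attrs) = @_;
            $count++;
            $sum += $attrs->{total};
            return;      # return nothing, so the order is not kept
        },

        # Nothing to keep from the root element either.
        orders => sub { return },
    },
);

$parser->parse($xml);
printf "%d orders, total %.2f\n", $count, $sum;
```

      If the order rule returned its data instead, the parser would build the whole structure in memory, which is the other end of the trade-off.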