It depends whether you're giving or taking.
If you are taking the output from a single program emitting its own brew of XML, you will usually find that it is always emitted in exactly the same way: often pretty-printed with indented nested elements, or hard-wrapped against column zero all the way down.
It is extremely rare (in my experience) to encounter XML emitted by a program that is neatly word-wrapped at or before column 72. After all, that takes a lot more work, and most sane programmers have better things to do with their time. Once you figure out empirically how a given program emits its XML, you can count on it being invariant.
So, as much as it may shock the purists, you can quite easily get away with picking out what you want from a big XML file with a regexp or two, especially if you don't have to worry about context. By context I mean, for example, extracting the contents of element <HG> if the parent is <BAR>, except when the grandparent is <ZONK>.
You just need a good test-suite to cover your a.. code, to ensure that things don't break when the source program is upgraded.
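To make that concrete, here is a minimal sketch in Python of the regexp approach together with the kind of check a test-suite could run. It assumes the emitting program always writes each <HG> element on a single line; the sample document and the names extract_hg and check_layout are invented for illustration.

```python
import re

# Hypothetical sample: one <HG> element per line, as a pretty-printing
# program might emit it. The content is invented for illustration.
SAMPLE = """<DOC>
  <BAR>
    <HG>first value</HG>
    <HG>second value</HG>
  </BAR>
</DOC>
"""

# Matches an <HG> element that opens and closes on one line,
# allowing only spaces or tabs around it.
HG_LINE = re.compile(r"^[ \t]*<HG>(.*?)</HG>[ \t]*$", re.MULTILINE)


def extract_hg(text):
    """Return the contents of every single-line <HG> element."""
    return HG_LINE.findall(text)


def check_layout(text):
    """Raise if any <HG> open tag is not picked up by the regexp.

    If the emitting program changes its layout (for instance, by
    wrapping the element over several lines), the counts differ.
    """
    opened = len(re.findall(r"<HG[\s>]", text))
    matched = len(extract_hg(text))
    if opened != matched:
        raise ValueError(
            f"layout changed: {opened} <HG> tags, {matched} matched"
        )
    return matched
```

Running check_layout against a freshly generated file after each upgrade of the source program is one cheap way to find out that the layout assumption no longer holds before bad data gets through.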
You cannot adopt this approach when it is you who has written the XML specification and you're dealing with how people give you their information according to your spec. Everyone will do it differently and you will indeed have to parse it. Update: the same applies when you're taking the information from a web service and thus have no control over, or forewarning of, when the originating program may be upgraded.
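For the parsing case, a real parser also handles the context problem mentioned earlier without much fuss. The following is a rough sketch in Python using the standard library's xml.etree.ElementTree; the function name hg_in_context and the sample document are made up for illustration, and other parsers would serve equally well.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with inconsistent layout, as different people
# might submit it. Only the first <HG> satisfies the context rule.
SAMPLE = """<DOC><BAR>
<HG>
  keep me
</HG></BAR><ZONK><BAR><HG>skip me</HG></BAR></ZONK>
<FOO><HG>wrong parent</HG></FOO></DOC>"""


def hg_in_context(xml_text):
    """Contents of <HG> whose parent is <BAR>, unless the grandparent is <ZONK>."""
    root = ET.fromstring(xml_text)
    found = []

    # ElementTree nodes do not know their parents, so the walk
    # carries the parent and grandparent along explicitly.
    def walk(node, parent, grandparent):
        parent_is_bar = parent is not None and parent.tag == "BAR"
        under_zonk = grandparent is not None and grandparent.tag == "ZONK"
        if node.tag == "HG" and parent_is_bar and not under_zonk:
            found.append((node.text or "").strip())
        for child in node:
            walk(child, node, parent)

    walk(root, None, None)
    return found
```

Because the parser works on the element tree rather than on the text layout, it does not matter how each sender wraps or indents the file.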
That's been my rule in dealing with SGML and XML for over 15 years, and it has served me well so far.
• another intruder with the mooring in the heart of the Perl