PerlMonks  

Re: XML Module Recommendations

by mirod (Canon)
on Jan 26, 2003 at 05:19 UTC ( [id://229937] )


in reply to XML Module Recommendations

In terms of general resources, you can have a look at the Perl and XML FAQ and Kip Hampton's XML.com column. The Module Reviews on this site also include quite a few nodes about XML modules.

More specifically, I don't think there is any module that will satisfy all your criteria, but let's list the main candidates amongst the tree-based modules:

  • XML::Parser: available everywhere (comes standard with Activestate Perl, as it is used by PPM, but you need to install expat separately on *nix), low-level (usually used to build more convenient modules), has a Tree Style that gives access to the whole document at once but no one seems to like it (or even use it),
  • XML::Simple: available through PPM, based on XML::Parser, can be used only for data-oriented XML (no mixed-content), loads the XML into a Perl structure,
  • XML::Twig: based on XML::Parser, no PPM available, see the FAQ for instructions about installing it on Windows, mixed event-tree mode, I like it (but I also wrote it ;--),
  • XML::DOM: based on XML::Parser, my only take on it is that the DOM is NOT appropriate for general purpose XML transformation, it gives you plenty of rope... avoid it,
  • XML::LibXML: based on libxml2, which needs to be installed, but a really nice module, which gives you SAX, DOM and XPath (the addition of XPath makes the DOM usable).

Those are the main tree-based modules; all of the SAX modules are event-based. BTW XML::SAX::PurePerl would probably be too slow for a 10M file, so it is likely that you will not be able to use a pure Perl solution.

In the end I would think that XML::Twig (surprise ;--) or XML::LibXML are the best choices, unless you can use XML::Simple. It also depends on the kind of XML you are dealing with (data or document).
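
To illustrate what "loads the XML into a Perl structure" looks like for data-oriented XML, here is a minimal sketch with XML::Simple. The sample document and the variable names are made up for illustration, and KeyAttr => [] is just one reasonable option choice, not a recommendation from the original post.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical data-oriented sample (no mixed content).
my $xml = '<website><name>Perlmonks</name><rating>10</rating>'
        . '<people><name>Anonymous Monk</name></people></website>';

# KeyAttr => [] turns off the default array folding on name, key and id,
# so the resulting hash mirrors the XML nesting directly.
my $site = XMLin( $xml, KeyAttr => [] );

print "site name:   $site->{name}\n";
print "rating:      $site->{rating}\n";
print "people name: $site->{people}{name}\n";
```

The root element is dropped by default, so the child elements become the top-level hash keys.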

Re: Re: XML Module Recommendations
by Anonymous Monk on Jan 26, 2003 at 06:22 UTC

    Hi, thanks for the excellent reply :)

    A few questions...

    You said DOM isn't appropriate for general purpose XML transformation - what if I'm just extracting data into a different structure, not necessarily translating it to XHTML or whatever? Also - the LibXML documentation says "This module is an interface to the gnome libxml2 DOM parser (no SAX parser support yet), and the DOM tree." So is it still acceptable in your opinion?

    One of the problems I've had in the past is extracting data from a doc with tags that have identical names, for example...

    <website> <name>Perlmonks</name> <rating>10/10</rating> <people> <name>Anonymous Monk</name> </people> </website>

    How would I differentiate between the name inside the people tag and the website name? More of an XML question, but I'm also looking for a module that makes this really easy.

    Another thing I'd like to do easily: go through the XML file and pick out certain fields and compare them between multiple entries. For example, get the name and rating of each website so I can pick out everyone with a 10. This seems like it should be trivial (as it is with SQL) but the examples I've seen so far don't always seem so simple.

    Also - are there XML::Twig-like interfaces for other languages? Thanks :)

      The DOM is still dangerous when extracting information, unless you are very cautious. The main problem is with navigation methods, like getFirstChild: you just cannot use it without wrapping it into your own method. The first child of an element can be a lot of unexpected things: the line return after the element start tag, a comment, a processing instruction... and maybe even the next element. The addition of XPath in XML::LibXML makes it much safer by letting you do $elt->findnodes( 'people'), which gives you the list of people elements that are children of $elt.
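
      A minimal sketch of that pitfall, assuming XML::LibXML with its default settings (blank text nodes are kept). The sample document and variable names are made up for illustration; firstChild is the XML::LibXML spelling of the DOM navigation method.

      ```perl
      use strict;
      use warnings;
      use XML::LibXML;

      # Hypothetical sample: note the line break between <website> and <name>.
      my $xml = "<website>\n  <name>Perlmonks</name>\n"
              . "  <people><name>Anonymous Monk</name></people>\n</website>";

      my $doc = XML::LibXML->load_xml( string => $xml );
      my $elt = $doc->documentElement;

      # The first child is the whitespace text node, not the <name> element.
      my $first_name = $elt->firstChild->nodeName;

      # XPath returns only the element children that were asked for.
      my @people       = $elt->findnodes('people');
      my $people_count = scalar @people;

      print "first child: $first_name\n";
      print "people elements: $people_count\n";
      ```

      Here firstChild lands on a text node (its nodeName is '#text'), while findnodes returns the single people element.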

      As for differentiating between tags with the same name but in different contexts, XML modules will give you access to the context stack, so it will not be a problem. For example in Twig you can have handlers on website/name or on people/name, in XML::LibXML you would similarly use XPath to get the elements you want.
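
      A sketch of the Twig approach, using a well-formed version of the sample from the question. The array names are made up for illustration; the handler keys are the website/name and people/name paths mentioned above.

      ```perl
      use strict;
      use warnings;
      use XML::Twig;

      my $xml = '<website><name>Perlmonks</name><rating>10/10</rating>'
              . '<people><name>Anonymous Monk</name></people></website>';

      my ( @site_names, @people_names );

      my $twig = XML::Twig->new(
          twig_handlers => {
              # Each handler fires only for a <name> whose parent matches the path.
              'website/name' => sub { push @site_names,   $_->text },
              'people/name'  => sub { push @people_names, $_->text },
          },
      );
      $twig->parse($xml);

      print "site: @site_names\n";
      print "people: @people_names\n";
      ```

      Each name element ends up in exactly one of the two arrays, depending on its parent.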

      In fact the XML equivalent of SQL is XPath (at least within a single document, XML Query deals with collections of documents). A nice resource for XML-related tutorials is Zvon.org, they have a good XPath tutorial.
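
      To make the SQL comparison concrete, here is a sketch of the "everyone with a 10" query using XPath in XML::LibXML. The sites wrapper element, the second website and the plain numeric ratings are assumptions made for the example; a rating written as 10/10 would need a string comparison instead.

      ```perl
      use strict;
      use warnings;
      use XML::LibXML;

      # Hypothetical data: several <website> entries with numeric ratings.
      my $xml = '<sites>'
              . '<website><name>Perlmonks</name><rating>10</rating></website>'
              . '<website><name>Example</name><rating>7</rating></website>'
              . '</sites>';

      my $doc = XML::LibXML->load_xml( string => $xml );

      # Roughly: SELECT name FROM website WHERE rating = 10
      my @top = map { $_->textContent }
                $doc->findnodes('//website[rating = 10]/name');

      print "$_\n" for @top;
      ```

      The predicate in square brackets plays the role of the WHERE clause, and the trailing /name picks the column.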

      XML::Twig exists only for Perl. Note that if you don't want to use Perl you can always use XSLT, there are plenty of XSLT processors around, some of which can even be called from Perl.

      One last question, especially in light of a recent thread: it seems to me that you are dealing with data, and doing the kind of processing that a database does very well. So why are you using XML at all? Couldn't you just model your data into tables and use a DB? There are several portable alternatives that support the kind of processing you seem to be looking for.

        it seems to me that you are dealing with data, and doing the kind of processing that a database does very well. So why are you using XML at all? Couldn't you just model your data into tables and use a DB? There are several portable alternatives that support the kind of processing you seem to be looking for.

        I would, but the code needs to be run on many different systems and while I can ensure Perl is installed, I would have a lot of trouble ensuring my database of choice would be.

        Thanks for all the suggestions though, I should be able to find something appropriate now :)

      The LibXML documentation says "This module is an interface to the gnome libxml2 DOM parser (no SAX parser support yet), and the DOM tree." So is it still acceptable in your opinion?

      It doesn't matter whether anyone else finds this restriction acceptable or not. You need to determine whether this may cause problems for you or not and compare those problems to the benefits you gain from using this code. It depends on your situation.

      Any modern Unix should run libxml2 and some come with it installed or as part of their package system. If you run Windows, PPMs exist for ActivePerl. Windows binaries of libxml2 exist, if you use other versions of Perl on Windows.

      So, code that uses XML::LibXML should run on Unix, including Mac OS X, and Windows. If you need to port your code to other platforms, investigate each platform and see if XML::LibXML runs on it.
