Perl & XML

Item Description: (see title)

Review Synopsis: A Road Map to Processing XML with Perl

Perl & XML is a hard book to categorize - it is not a beginner's book and it is not a cookbook. I instead found it to be a nice road map to the many XML processing CPAN modules available to Perl programmers. (And i also found it to be a nice departure from the many XML books available that are only for Java.) This is YAGOB (Yet Another Good O'Reilly Book): nice typesetting, thorough explanations, and a quirky animal cover. The index is decent, but i was a bit disheartened to find that XML::Twig was not included in the index. It is, however, covered in chapter 8.

The first chapter is the obligatory introduction; it introduces XML::Simple and discusses 'XML Gotchas'. Chapter two provides a very nice overview of XML in general. It provides the necessary base for XML newbies while serving as a decent reference to return to while working through the rest of the book. It also gives an example of an XSLT transformation - converting an XML document to an XHTML document without the help of Perl.

The fun starts with chapter three, where actual XML processing is discussed and demonstrated. The CPAN modules XML::Parser, XML::LibXML, XML::XPath, and XML::Writer are given brief introductions with sample code. Also included is a demonstration of the wrong way to write an XML parser (a well-formedness checker by hand) and the right way (by using XML::Parser). Document validation and DTDs are introduced with XML::LibXML code as a demonstration, and finally, Unicode encodings are compared and contrasted.

Chapters four and six cover event-based and tree-based parsing respectively. Chapter four goes into more detail with XML::Parser and discusses 'repackaging' XML as PYX via XML::PYX. Chapter six discusses XML::Parser yet again (along with XML::Simple) and introduces XML::SimpleObject, XML::TreeBuilder, and XML::Grove. Each module covered is given a good overview and sample code to help demonstrate.
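
For readers new to the distinction, here is a minimal sketch of the two parsing styles using XML::Parser. It is not taken from the book: the sample document and the variable names are made up for illustration, and it assumes XML::Parser is installed from CPAN.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document, invented for this sketch.
my $xml = '<book><title>Perl &amp; XML</title><chapter/><chapter/></book>';

# Event-based: a Start handler fires once for every opening tag
# as the parser moves through the document.
my %count;
my $stream = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $name, %attr) = @_;
            $count{$name}++;
        },
    },
);
$stream->parse($xml);

# Tree-based: the whole document is read first and handed back
# as a nested array structure of the form [ tag, content ].
my $tree = XML::Parser->new(Style => 'Tree')->parse($xml);
my ($root_name, $content) = @$tree;

print "chapter elements seen: $count{chapter}\n";
print "root element: $root_name\n";
```

The event-based version never holds the document in memory, while the tree-based version trades memory for the convenience of walking the structure afterwards.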

Chapters five and seven cover the SAX and DOM modules respectively. (I recommend reading chapters four and six before covering five and seven.) An example of converting Excel spreadsheets to XML via XML::SAXDriver::Excel is covered in chapter five as well as SAX2 and installing your own XML::SAX parsers via the h2xs utility. The majority of chapter seven is a DOM class interface reference. There are two examples in this chapter, one that processes an XHTML document with XML::DOM and one that works with DOM2 and namespaces via XML::LibXML.

Chapter eight discusses how to make tree-based parsing faster and more efficient via a hand-rolled DOM iterator module (named XML::DOMIterator) that is used in conjunction with XML::DOM, and also revisits the 'node hunter' module XML::XPath. Also included is mirod's XML::Twig, which is used in three examples, one of which shows how tree-based parsing can be optimized by only parsing the smallest part of the tree that needs to be parsed. XSLT is also given a more thorough discussion than the overview given in chapter two, including how it can be used in conjunction with Perl via XML::LibXSLT.

Chapters nine and ten round the book off with application examples. Chapter nine covers RSS with XML::RSS and briefly discusses XML::Generator::DBI (~~but makes no mention of DBIx::XML_RDB~~ - see mirod's comment below). It also briefly discusses the controversial SOAP::Lite. Chapter ten provides an application that subclasses an XML parser to provide an API via CGI for manipulating an XML document. Also included is a mod_perl application for converting DocBook files into HTML on the fly, as well as a discussion, solution, and work-around involving the pitfalls of using the Expat library in mod_perl.

The only cons i found were a few typos dealing with the ampersand character. Sometimes you will find `&amp;` when the authors meant `&` and vice versa. Any seasoned Perl programmer will immediately spot these typos, but some beginners might not. Another con is that the authors discuss XML::Writer, but fail to use it in many examples that write XML. They instead do so by hand, which contradicts using CPAN modules in the first place. Again, a seasoned Perl programmer will know better. The last con is a minor nit-pick: a lot of the code seemed somewhat Java-like to me. However, these cons weigh considerably less than the pros. Again, i recommend this book to any seasoned Perl programmer who has not yet entered the realm of XML processing.
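
To illustrate the XML::Writer point, here is a small sketch (not from the book) that writes the same element by hand and with XML::Writer. The element names and the `$title` value are invented for the example, and it assumes a version of XML::Writer that accepts a string reference for OUTPUT.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical data containing a character that XML treats specially.
my $title = 'Perl & XML';

# By hand: the raw ampersand is interpolated straight into the markup,
# so the result is not well-formed XML.
my $by_hand = "<book><title>$title</title></book>";

# With XML::Writer: character data is escaped as it is written.
my $with_writer = '';
my $writer = XML::Writer->new(OUTPUT => \$with_writer);
$writer->startTag('book');
$writer->dataElement('title', $title);
$writer->endTag('book');
$writer->end();

print "$by_hand\n";
print $with_writer;
```

The hand-written string keeps the bare `&`, while the XML::Writer output carries `&amp;` in its place, which is exactly the kind of slip that is easy to make when building XML with string interpolation.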

Overall i feel this is an excellent book for intermediate to advanced Perl programmers with little or no knowledge of XML processing. Tired of only knowing how to use XML::Simple? This book will show you the alternatives!

Other Reviews:

mirod's use Perl Review
O'Reilly Reader Reviews
as well as reviews from your favorite online book vendor

Re: Perl & XML by mirod (Canon) on Jun 17, 2002 at 14:45 UTC
"Chapter nine covers RSS with XML::RSS and briefly discusses XML::Generator::DBI (but makes no mention of DBIx::XML_RDB)"

Just a clarification here: XML::Generator::DBI is meant as a replacement for DBIx::XML_RDB (see the README), so DBIx::XML_RDB should not be mentioned any more.
Re: Perl & XML by bronto (Priest) on Feb 10, 2003 at 17:52 UTC
I was gifted this book and I am reading it. I am about halfway through, and I'd like to note down my first impressions here.

It's surely a good book, even with the typos and contradictions that jeffa pointed out. But I found one more problem that I would like to stress: there are points where the authors get a bit pedantic, trying to clarify things that are already clear.

Let's see an example. In chapter 5, "SAX", section "External Entity Resolution", the authors give an example of a book written in XML that was split into four files; these files were pulled into one XML file via four external entities. Since a filter that was shown earlier resolved this kind of entity, filtering this file would result in the whole book as output. They say:

"The previous filter example would resolve the external entity references for you diligently and output the entire book in one piece. Your file separation scheme would be lost and you'd have to edit the resulting file to break it back into multiple files. Fortunately, we can override the resolution of external entity references using a handler called `resolve_entity`"

I feel that the phrase "Your file separation...multiple files" adds nothing to the concept expressed; worse, it makes the book more boring and harder to read. And I found a lot of these (I am at page 99 now...).

Again, it's a good book, but I feel that it would benefit from a revision of the language to make it more direct. That could make the book slimmer, but personally I care about the content more than the thickness.

Ciao!
`--bronto`

The very nature of Perl to be like natural language--inconsistent and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
--John M. Dlugosz


	PerlMonks