http://qs321.pair.com?node_id=293387

IOrdy has asked for the wisdom of the Perl Monks concerning the following question:

For a project I'm working on I need a higher-level API that can access and change subtrees in an XML document. I was thinking of providing the plugin API as a SAX filter so that it could be part of a larger SAX machine (i.e. a pipeline that uses XML::STX or whatever to process the stream later).

The Taglib Plugin API: in brief, I was thinking of registering a sub with the taglib for each tag and passing the tag's subtree to that sub.
The structure returned by the tag sub is placed back into the stream as SAX events so that they can in turn be processed by the taglib.

For example:
    package My::Taglib;
    use strict;
    use warnings;
    use base qw(XML::Filter::Foo); # name of filter yet to be decided.

    sub new {
        my $class = shift;
        my $self  = bless({}, $class);
        $self->register_ns(qw(http://www.iordy.com/ns/foo));
        $self->register_tags(qw(node hello));
        return $self;
    }

    sub tag_node {
        # gets access to all children, may do some processing based on them,
        # and then adds <foo:hello /> below <title>foo</title>
    }

    sub tag_hello {
        # replaces <foo:hello /> with <hello>world</hello>
    }

    1;

    __DATA__
    <data>
      <foo:node xmlns:foo="http://www.iordy.com/ns/foo">
        <title>foo</title>
      </foo:node>
    </data>
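To make the dispatch idea concrete, here is a minimal sketch in plain Perl with no SAX involved. It assumes a hypothetical in-memory node layout (hash refs with `ns`, `name`, `attrs` and `children`; text as plain strings) and hypothetical helpers `register_tag` and `process`. It also assumes a tag sub returns a list of replacement nodes, so `tag_node` here drops the `<foo:node>` wrapper; that is a choice made for the sketch, not something the API above specifies. Whatever a tag sub returns is run through the registry again, mirroring the idea of feeding results back into the taglib.

```perl
use strict;
use warnings;

# Hypothetical node layout: { ns, name, attrs, children }; text is a plain string.
my $NS = 'http://www.iordy.com/ns/foo';
my %handlers;    # "{namespace}local-name" => code ref

sub register_tag {
    my ($ns, $local, $code) = @_;
    $handlers{"{$ns}$local"} = $code;
}

# Walk the tree; a registered tag is replaced by whatever its sub returns,
# and that result is processed again, like events fed back into the taglib.
sub process {
    my ($node) = @_;
    return $node unless ref $node;    # text passes through untouched
    if (my $code = $handlers{"{$node->{ns}}$node->{name}"}) {
        return map { process($_) } $code->($node);
    }
    $node->{children} = [ map { process($_) } @{ $node->{children} } ];
    return $node;
}

# tag_node: copy the children, adding <foo:hello/> after <title>.
# Returning only the children drops the <foo:node> wrapper (sketch choice).
register_tag($NS, 'node', sub {
    my ($node) = @_;
    my @out;
    for my $child (@{ $node->{children} }) {
        push @out, $child;
        push @out, { ns => $NS, name => 'hello', attrs => {}, children => [] }
            if ref $child && $child->{name} eq 'title';
    }
    return @out;
});

# tag_hello: replace <foo:hello/> with <hello>world</hello> (no namespace).
register_tag($NS, 'hello', sub {
    return { ns => '', name => 'hello', attrs => {}, children => ['world'] };
});

# Minimal serializer for inspection only: no attributes, no escaping.
sub to_xml {
    my ($n) = @_;
    return $n unless ref $n;
    my $kids = join '', map { to_xml($_) } @{ $n->{children} };
    return "<$n->{name}>$kids</$n->{name}>";
}

# The document from the example's __DATA__ section, as a tree.
my $doc = { ns => '', name => 'data', attrs => {}, children => [
    { ns => $NS, name => 'node', attrs => {}, children => [
        { ns => '', name => 'title', attrs => {}, children => ['foo'] },
    ] },
] };

my ($result) = process($doc);
print to_xml($result), "\n";    # <data><title>foo</title><hello>world</hello></data>
```

A tag sub that returns its own tag unchanged would recurse forever here, so a real filter would need some guard against that.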

I think I can cope with the buffering for sub-documents etc., but I can't find a CPAN module that implements a simple enough structure for the tag sub to modify. To clarify: if I build a subtree with XML::Simple, I lose the distinction between what was an attribute and what was a child (don't I?) when I create the SAX events from the result the tag sub returns, and this will affect downstream handlers. Also, it would be much faster if, instead of buffering the events, I built the subtree as I go and delivered parts (smaller subtrees as I hit end tags) until all open tags have been closed and the buffer can be removed.
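The "build the subtree as I go" idea can be sketched as a SAX-style handler that keeps a stack of open elements: each `start_element` attaches a new node to the current parent, and when `end_element` closes the outermost buffered element, the finished subtree is passed on and the buffer is empty again. Attributes are stored apart from children, which keeps the distinction that XML::Simple's folding loses. The class name `TreeBuilder` and its `on_subtree` callback are hypothetical; the method names and hash keys (`Name`, `NamespaceURI`, `Attributes`, `Data`) follow Perl SAX2 conventions, but the events below are fed by hand rather than by a real parser.

```perl
use strict;
use warnings;

# Hypothetical SAX-style handler that builds a subtree from events.
package TreeBuilder;

sub new {
    my ($class, %args) = @_;
    return bless({ stack => [], on_subtree => $args{on_subtree} }, $class);
}

sub start_element {
    my ($self, $el) = @_;
    my $node = {
        name     => $el->{Name},
        ns       => $el->{NamespaceURI} // '',
        attrs    => { %{ $el->{Attributes} || {} } },   # kept apart from children
        children => [],
    };
    # Attach to the currently open parent, if there is one.
    push @{ $self->{stack}[-1]{children} }, $node if @{ $self->{stack} };
    push @{ $self->{stack} }, $node;
}

sub characters {
    my ($self, $chars) = @_;
    return unless @{ $self->{stack} };    # ignore text outside any element
    push @{ $self->{stack}[-1]{children} }, $chars->{Data};
}

sub end_element {
    my ($self) = @_;
    my $done = pop @{ $self->{stack} };
    # Stack empty: every open tag has been closed, so hand the subtree on.
    $self->{on_subtree}->($done) unless @{ $self->{stack} };
}

package main;

my @delivered;
my $builder = TreeBuilder->new(on_subtree => sub { push @delivered, $_[0] });
my $NS = 'http://www.iordy.com/ns/foo';

$builder->start_element({ Name => 'foo:node', NamespaceURI => $NS, Attributes => {} });
$builder->start_element({ Name => 'title', NamespaceURI => '',
    Attributes => { '{}lang' => { Name => 'lang', Value => 'en' } } });
$builder->characters({ Data => 'foo' });
$builder->end_element({ Name => 'title' });
$builder->end_element({ Name => 'foo:node' });

print scalar(@delivered), " subtree(s); title text: $delivered[0]{children}[0]{children}[0]\n";
```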

I also thought about converting the subtree back to a string so that the taglib could use a parser of its choice, or simply modify the text, but this seems like a bad idea: if taglibs are implemented by different developers, you'll end up installing half of CPAN just to meet dependencies.

What I'd like to use (or create) is a simple Perl structure like XML::Simple's for the subtree, but one that maintains the original structure (namespaces, CDATA, comments, children vs. attributes) and that could be modified or just returned by the tag sub.
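One possible shape for such a structure, as a sketch only: each node carries a `type` (element, text, cdata or comment), elements keep attributes, children and their namespace declarations separately, and a `replay` function turns the tree back into Perl SAX2 events (`start_prefix_mapping`, `start_element`, `characters`, `start_cdata`/`end_cdata`, `comment`, and the matching end events) for a downstream handler. The node layout, `replay`, and the `Recorder` handler used to inspect the output are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical handler that just records which SAX methods were called.
package Recorder;

sub new { return bless({ events => [] }, $_[0]) }

for my $method (qw(start_prefix_mapping end_prefix_mapping start_element
                   end_element characters comment start_cdata end_cdata)) {
    no strict 'refs';
    *{"Recorder::$method"} = sub { push @{ $_[0]{events} }, $method };
}

package main;

# Turn a node (and its descendants) back into SAX events on handler $h.
sub replay {
    my ($node, $h) = @_;
    my $type = $node->{type};
    if ($type eq 'element') {
        my %prefixes = %{ $node->{prefixes} || {} };   # xmlns declarations
        $h->start_prefix_mapping({ Prefix => $_, NamespaceURI => $prefixes{$_} })
            for sort keys %prefixes;
        my %el = (Name => $node->{name}, LocalName => $node->{local},
                  Prefix => $node->{prefix}, NamespaceURI => $node->{ns},
                  Attributes => $node->{attrs});
        $h->start_element({ %el });
        replay($_, $h) for @{ $node->{children} };
        delete $el{Attributes};                        # end_element has no attributes
        $h->end_element({ %el });
        $h->end_prefix_mapping({ Prefix => $_ }) for sort keys %prefixes;
    }
    elsif ($type eq 'text')    { $h->characters({ Data => $node->{data} }) }
    elsif ($type eq 'cdata')   {
        $h->start_cdata({});
        $h->characters({ Data => $node->{data} });
        $h->end_cdata({});
    }
    elsif ($type eq 'comment') { $h->comment({ Data => $node->{data} }) }
    else { die "unknown node type '$type'" }
}

my $NS = 'http://www.iordy.com/ns/foo';
my $tree = {
    type => 'element', name => 'foo:node', local => 'node', prefix => 'foo',
    ns => $NS, prefixes => { foo => $NS }, attrs => {},
    children => [
        { type => 'comment', data => ' generated ' },
        { type => 'element', name => 'title', local => 'title', prefix => '',
          ns => '', attrs => {}, children => [ { type => 'cdata', data => 'foo' } ] },
    ],
};

my $rec = Recorder->new;
replay($tree, $rec);
print join(',', @{ $rec->{events} }), "\n";
```

Because comments and CDATA sections are node types rather than being flattened into text, a downstream handler that understands lexical events should see the same kinds of events it would have seen without the taglib in between.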

XML::Twig seems to have the sort of interface I want, but it can't be used as a SAX filter, only as a generator (at least in the version I saw). I looked at using XML::Twig for the whole project instead of SAX, but I'd rather use SAX so that my taglibs can be used in pipelines with other SAX tools like XML::STX.

This is where I am now, and I'd like to ask for the wisdom of the monks, as I'm not the world's best programmer and I don't know anybody else to run this past. Is there a project I've missed that implements something like this? Is it realistic, or am I just asking too much?