PerlMonks  

Re: (RFC) XML::Rules - yet another XML parser

by trwww (Priest)
on Nov 06, 2006 at 00:08 UTC


in reply to (RFC) XML::Rules - yet another XML parser

Hi Jenda

I've done stuff like this very often.

When I see this, I see a generic state maintenance mechanism for SAX.

SAX is great. It is sometimes the only option when you have documents that are too big to fit into RAM. But because it provides nothing more than a dispatch mechanism for document particles, maintaining state between the different callbacks can get tricky, boring, and error-prone.
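To make that bookkeeping concrete, here is a minimal sketch in plain Perl. It uses no XML library at all: the event list is fed in by hand, and the handler names only mirror SAX conventions, so every name in it is an assumption for illustration rather than code from any real parser.

```perl
use strict;
use warnings;

# Hypothetical hand-written handler; all names are illustrative.
# State that must survive between callbacks lives in these lexicals.
my @records;    # completed records
my $current;    # record being built, undef outside <message>
my $field;      # name of the child element we are inside, if any

my %handler = (
    start_element => sub {
        my ($name) = @_;
        if ($name eq 'message') { $current = {} }
        elsif ($current)        { $field = $name; $current->{$field} = '' }
    },
    characters => sub {
        my ($text) = @_;
        # Text may arrive in several chunks, so append rather than assign.
        $current->{$field} .= $text if $current && defined $field;
    },
    end_element => sub {
        my ($name) = @_;
        if ($name eq 'message') { push @records, $current; undef $current }
        else                    { undef $field }
    },
);

# Simulated event stream for a single <message> with <to> and <subject>.
my @events = (
    [ start_element => 'messages' ],
    [ start_element => 'message' ],
    [ start_element => 'to' ],
    [ characters    => 'a@example.com' ],
    [ end_element   => 'to' ],
    [ start_element => 'subject' ],
    [ characters    => 'Hi ' ],
    [ characters    => 'there' ],
    [ end_element   => 'subject' ],
    [ end_element   => 'message' ],
    [ end_element   => 'messages' ],
);

for my $event (@events) {
    my ($type, $arg) = @$event;
    $handler{$type}->($arg);
}

printf "%s: %s\n", $_->{to}, $_->{subject} for @records;
```

Even for a flat record, three variables have to be kept in sync across three callbacks, and that is the part a rules-based layer could take over.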

As an example, refer to Kip Hampton's excellent xml.com article High-Performance XML Parsing With SAX.

In it, his sample data is an XML document representing a series of emails that need to be sent. He uses SAX to build up an argument list for Mail::Sendmail, and when the end_element callback fires for a record, he has Mail::Sendmail send the email.

The relevant point here is how much code he has to write to extract the data from each individual record.

According to the POD in your module:

"Or you could view it as yet another event based XML parser that differs from all the others only in two things. First that it only let's you hook your callbacks to the closing tags. And that stores the data for you so that you do not have to use globals or closures and wonder where to attach the snippet of data you just received onto the structure you are building"

what you have would help quite a bit in maintaining the state of an XML record.

If you don't mind, what would the rules for your module look like to perform the same task Hampton did with SAX?

Your module would be very useful if implemented as a SAX handler, so users could take advantage of the many features of SAX (swappable parsers, chained filters/handlers, document writers, a standardized interface). Imagine the process you describe above as a web service: you could use a SAX writer in the same pipeline to build up the response to the client request.

Regardless of how you proceed, I'll definitely keep an eye on it. I know I'll use it.

trwww

Re^2: (RFC) XML::Rules - yet another XML parser
by Jenda (Abbot) on Nov 06, 2006 at 01:07 UTC

    The code would look like this:

    my ($message_count, $sent_count);
    my $parser = new XML::Rules (
        rules => [
            _default => 'content',
            message  => sub {
                $message_count++;
                Mail::Sendmail::sendmail(
                    from    => $_[1]->{from},
                    to      => $_[1]->{to},
                    subject => $_[1]->{subject},
                    body    => $_[1]->{body},
                ) or warn "Mail Error: $Mail::Sendmail::error";
                $sent_count++ unless $Mail::Sendmail::error;
                return;
            },
            messages => sub {
                print "SAX Mailer Finished\n$sent_count of $message_count message(s) sent\n";
                return;
            }
        ]
    );
    $parser->parsefile($file);

    or, if you did not want to use any external variables:

    my $parser = new XML::Rules (
        rules => [
            _default => 'content',
            message  => sub {
                $_[3][-1]->{message_count}++;
                Mail::Sendmail::sendmail(
                    from    => $_[1]->{from},
                    to      => $_[1]->{to},
                    subject => $_[1]->{subject},
                    body    => $_[1]->{body},
                ) or warn "Mail Error: $Mail::Sendmail::error";
                $_[3][-1]->{sent_count}++ unless $Mail::Sendmail::error;
                return;
            },
            messages => sub {
                print "SAX Mailer Finished\n$_[1]->{sent_count} of $_[1]->{message_count} message(s) sent\n";
                return;
            }
        ]
    );
    $parser->parsefile($file);

    or

    my $parser = new XML::Rules (
        rules => [
            _default => 'content',
            message  => sub {
                my ($tag_name, $tag_hash, $context, $parent_data) = @_;
                $parent_data->[-1]->{message_count}++;
                Mail::Sendmail::sendmail(
                    from    => $tag_hash->{from},
                    to      => $tag_hash->{to},
                    subject => $tag_hash->{subject},
                    body    => $tag_hash->{body},
                ) or warn "Mail Error: $Mail::Sendmail::error";
                $parent_data->[-1]->{sent_count}++ unless $Mail::Sendmail::error;
                return;
            },
            messages => sub {
                print "SAX Mailer Finished\n$_[1]->{sent_count} of $_[1]->{message_count} message(s) sent\n";
                return;
            }
        ]
    );
    $parser->parsefile($file);

    I guess I should add a few more ways to add data to the parent node. Apart from 'attributename', which sets (and if needed overwrites) the attribute, and '@attributename', which appends the value to the array, I should also allow '+attributename' and '.attributename'. '-attributename' is not needed, but I wonder whether to add '*attributename'.

    With the '+attributename' the code would look like this:

    my $parser = new XML::Rules (
        rules => [
            _default => 'content',
            message  => sub {
                Mail::Sendmail::sendmail(
                    from    => $_[1]->{from},
                    to      => $_[1]->{to},
                    subject => $_[1]->{subject},
                    body    => $_[1]->{body},
                ) or warn "Mail Error: $Mail::Sendmail::error";
                return '+message_count' => 1,
                       '+sent_count'    => ($Mail::Sendmail::error ? 0 : 1);
            },
            messages => sub {
                print "SAX Mailer Finished\n$_[1]->{sent_count} of $_[1]->{message_count} message(s) sent\n";
                return;
            }
        ]
    );
    $parser->parsefile($file);

    All code except the last snippet is tested :-)

    I'll have a look at SAX. Currently the module sits on top of XML::Parser::Expat, but I think it should not be a big deal to change that, or to change the code so that you can choose what to use.
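For readers trying to picture the attribute-name prefixes discussed in this reply, the following is a rough sketch of how returned key/value pairs might be folded into the parent's hash. It is not taken from XML::Rules: `merge_into_parent` is a hypothetical helper, and reading '+' as numeric addition and '.' as string concatenation is an assumption based only on the prefix characters.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Rules: merge the key/value list a
# rule returned into the hash of the parent tag, honouring the prefixes.
sub merge_into_parent {
    my ($parent, @pairs) = @_;
    while (my ($key, $value) = splice @pairs, 0, 2) {
        if    ($key =~ /^\@(.+)/s) { push @{ $parent->{$1} }, $value }  # append to array
        elsif ($key =~ /^\+(.+)/s) { $parent->{$1} += $value }          # assumed: numeric add
        elsif ($key =~ /^\.(.+)/s) { $parent->{$1} .= $value }          # assumed: concatenate
        else                       { $parent->{$key} = $value }         # set or overwrite
    }
    return $parent;
}

# Two simulated <message> rules reporting to the same <messages> parent.
my %messages;
merge_into_parent(\%messages, '+message_count' => 1, '+sent_count' => 1);
merge_into_parent(\%messages, '+message_count' => 1, '+sent_count' => 0);
merge_into_parent(\%messages, '@recipients' => 'a@example.com', '.log' => 'ok;');
merge_into_parent(\%messages, '@recipients' => 'b@example.com', '.log' => 'failed;');

print "$messages{sent_count} of $messages{message_count} message(s) sent\n";
```

With something like this in place, a rule can return its counters and never has to reach into the parent data itself.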
