PerlMonks  

Re: XML::Parser Tutorial

by Jenda (Abbot)
on Aug 21, 2008 at 19:14 UTC


in reply to XML::Parser Tutorial

The first rule of using XML::Parser: Don't. Or rather, don't use it directly, unless you really must, which is much less often than you might think.

#!/usr/bin/perl -w
use strict;
use XML::Rules;
use LWP::Simple;    # used to fetch the chatterbox ticker

my $cb_ticker = get("http://perlmonks.org/index.pl?node=chatterbox+xml+ticker");

my $parser = XML::Rules->new(
    stripspaces => 7,
    rules => {
        message => sub {
            my ($tag, $atts) = @_;
            $atts->{'_content'} =~ s/\n//g;
            my ($y,$m,$d,$h,$n,$s) = $atts->{'time'} =~ m/^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/;
            # Handles the /me
            $atts->{'_content'} = $atts->{'_content'} =~ s/^\/me//
                ? "$atts->{'author'} $atts->{'_content'}"
                : "<$atts->{'author'}>: $atts->{'_content'}";
            $atts->{'_content'} = "$h:$n " . $atts->{'_content'};
            print "$atts->{'_content'}\n";
            return;
        },
        'INFO,CHATTER' => 'pass',
    }
);
$parser->parse($cb_ticker);

Isn't this easier? Now imagine the <message> tag was not so simple; imagine it contained a structure of subtags and subsubtags. Your handlers would have to keep track of where in the structure the parser is and would have to build the data structure containing that data, so that finally they can access it in the end-tag handler if and only if the tag is <message>. Not what I would call convenient.
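To make the bookkeeping concrete, here is a rough sketch of what raw XML::Parser handlers might look like for a nested <message>. The sample XML (with <author> and <text> subtags) is made up for illustration and is not the real chatterbox ticker format; the variable names are mine.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample: a <message> whose data lives in subtags.
my $xml = '<chat><message><author>Jenda</author><text>Hello</text></message></chat>';

my @stack;      # path of currently open tags, maintained by hand
my %current;    # data collected for the <message> being read
my @lines;      # finished output lines

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $tag, %atts) = @_;
            push @stack, $tag;
            %current = () if $tag eq 'message';    # start a fresh record
        },
        Char => sub {
            my ($expat, $text) = @_;
            # Char may fire several times per text node, so append.
            # Only keep text of direct children of <message>.
            $current{ $stack[-1] } .= $text
                if @stack >= 2 && $stack[-2] eq 'message';
        },
        End => sub {
            my ($expat, $tag) = @_;
            pop @stack;
            push @lines, "<$current{author}>: $current{text}"
                if $tag eq 'message';
        },
    },
);
$parser->parse($xml);
print "$_\n" for @lines;
```

Three handlers and a hand-rolled stack just to read two subtags, and it only gets worse with deeper nesting.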

With XML::Rules you'd just specify which tags you want to include (and whether they are supposed to be repeated, contain text content, etc.; the rules may be inferred from a DTD or an example) and assign a handler specifically to the <message> tag. The handler will then have access to the data structure built from the subtags.
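A minimal sketch of that, assuming a made-up <message> with <author> and <text> subtags (again, not the real ticker format). The 'content' rule is one of the built-in XML::Rules rules and hands just the text of a subtag to its parent:

```perl
use strict;
use warnings;
use XML::Rules;

# Hypothetical sample: a <message> whose data lives in subtags.
my $xml = '<chat><message><author>Jenda</author><text>Hello</text></message></chat>';

my @lines;
my $parser = XML::Rules->new(
    stripspaces => 7,
    rules => {
        # Subtags: pass only their text content up to the parent.
        'author,text' => 'content',
        # By the time this runs, the subtags are already in $atts.
        message => sub {
            my ($tag, $atts) = @_;
            push @lines, "<$atts->{author}>: $atts->{text}";
            return;    # nothing kept in memory for the parent tag
        },
        chat => 'pass',
    },
);
$parser->parse($xml);
print "$_\n" for @lines;
```

The <message> handler never needs to know where the parser is; it just receives the finished hash.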

With XML::Twig you'll specify twig_roots (or something like that, I don't remember the details), assign a handler to the specific tag, and receive all the data from the part of the XML enclosed in it.
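For comparison, a rough XML::Twig sketch along those lines. The sample XML is hypothetical, and this is only one way to set it up: here the handler is attached through the twig_roots option, so only the <message> subtrees get built.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: a <message> whose data lives in subtags.
my $xml = '<chat><message><author>Jenda</author><text>Hello</text></message></chat>';

my @lines;
my $twig = XML::Twig->new(
    # Build a tree only for <message> elements and call the handler
    # as soon as each one has been completely parsed.
    twig_roots => {
        message => sub {
            my ($t, $elt) = @_;
            my $author = $elt->first_child_text('author');
            my $text   = $elt->first_child_text('text');
            push @lines, "<$author>: $text";
            $t->purge;    # free what has been parsed so far
        },
    },
);
$twig->parse($xml);
print "$_\n" for @lines;
```

The purge call is what keeps memory use flat on big files.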

And in neither case does the parser have to parse the whole file before your handlers are started, and at no time is the whole parsed XML in memory. (Well, if you use the modules correctly.)

Re^2: XML::Parser Tutorial
by Mike Blume (Initiate) on Aug 22, 2008 at 18:49 UTC
    Jenda, was that in response to my message?
    I can't tell if it was or not because it uses the example from the main post, not the one I had.


    Mike

      No, it was a response to the root node. For your problem ... show us your code. It's true that the Char handler will never be called, but both the Start and End handlers should be. In either case you can of course use XML::Rules for that XML as well: specify a handler for the student tag and it will receive all the attributes. Or, if you do not need to handle the individual <student> tags as you read them, specify student => 'as array', or possibly student => 'by name', in the rules. Then, in the handler for <class>, handle the array of students in $attr->{student} or access the individual students as $attr->{$name} (depending on the rule you specify for <student>).
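For the student case in the reply above, a sketch of both rules side by side. The XML here is a guess at the attribute-only format (the attribute names title, name and grade are assumptions, not taken from the original post):

```perl
use strict;
use warnings;
use XML::Rules;

# Hypothetical attribute-only XML with self-closing <student> tags.
my $xml = '<class title="Perl 101"><student name="Alice" grade="A"/>'
        . '<student name="Bob" grade="B"/></class>';

# Variant 1: 'as array' collects the students in $attr->{student}.
my @roster;
my $as_array = XML::Rules->new(
    rules => {
        student => 'as array',
        class   => sub {
            my ($tag, $attr) = @_;
            push @roster, "$_->{name}: $_->{grade}" for @{ $attr->{student} };
            return;
        },
    },
);
$as_array->parse($xml);

# Variant 2: 'by name' keys each student by its name attribute,
# so they show up as $attr->{Alice}, $attr->{Bob}, ...
my %grade_of;
my $by_name = XML::Rules->new(
    rules => {
        student => 'by name',
        class   => sub {
            my ($tag, $attr) = @_;
            # Skip the plain attributes of <class> itself (e.g. title).
            for my $name (grep { ref($attr->{$_}) eq 'HASH' } keys %$attr) {
                $grade_of{$name} = $attr->{$name}{grade};
            }
            return;
        },
    },
);
$by_name->parse($xml);

print "$_\n" for @roster;
print "$_ => $grade_of{$_}\n" for sort keys %grade_of;
```

Which variant fits depends on whether you want the students in document order or looked up by name.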

        I think I'm going to have to look at the XML::Rules documentation. I'm going to be getting several XML files, some using the attribute-only format I described above and some structured the conventional way XML was originally designed (with inner text).

        I want a program that is robust enough to import the XML file and either create MySQL or PostgreSQL tables or add columns to existing tables (should it need to).

        I was hoping that XML::Parser was already robust enough to get the information, no matter the format, with little programmer intervention (no need to set up rules), and would simply return all the attributes and values for any element, self-closing or not. :)

        Thanks for the help though,
        Mike
