Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

This meditation is just an example to show why you shouldn't reinvent the wheel.

I wrote a script that gets its input as XML. First, I wrote a prototype in XML::Parser. Later, I reimplemented the code using XML::Twig. The output structure is not the same in the two code, as the rest of the code has changed a lot. The second code is slightly shorter and much simpler, even though also collects a bit more data too: the "value" elements.

Update. This meditation doesn't make sense this way, so let me clarify. Also, I've changed the readmore tags a bit, and added some comments as noted below. (Update: also changing the original misleading title)

XML::Parser is a low-level XML module, it's not very usable for actually working with XML, but it's good for writing other XML modules on it. All XML::Parser does is give you the tokens of an XML, and verify that it is well-formed. When I wrote the program with XML::Parser, I had to implement some higher-level XML functionality in complicated ways. This reimplementation can be seen in the first (un-readmored) part of the first code, which I have marked with comments now.

XML::Twig is a higher-level XML parser, which allows us to manipulate parts of an XML tree as a whole, not just tokens. (In fact, XML::Twig calls XML::Parser underneath, but this is not important here.) This code is short, so the second variant of my code is not that much shorter than the first one, but it is clearly simpler. It would be a hassle to use XML::Parser for a more complicated application, (Update:) or rather, for a more complicated XML file.

End of update

Here's the first code. Ignore the parts concerning %notroot, that part was moved down in the real code, as it's not really part of inputting the data.

use warnings; use strict; + use XML::Parser; my($filename, # ... and some more options ... ); # ... get options ... my(%child, %member, %goodness, %notroot); + {
# REIMPLEMENTING THE WHEEL STARTS HERE my(%handler, %starthandler, @endhandler, @string, $string); $handler{"Default"} = sub { }; $handler{"Start"} = sub { push @string, $string; $string = ""; my $h = $starthandler{$_[1]}; push @endhandler, ($h ? &$h(@_[2 .. @_ - 1]) : sub { } +); }; $handler{"End"} = sub { &{pop @endhandler}($_[1]); $string = pop @string; }; $handler{"Char"} = sub { $string .= $_[1]; }; + # REIMPLEMENTING THE WHEEL ENDS HERE (more or less)
my($child, $member, $goodness, $id); $starthandler{"cluster"} = sub { $child = []; $member = []; $goodness = (); $id = ""; sub { "" ne $id or die "cluster without id"; $child{$id} and die "dupe cluster id: $id"; $child{$id} = $child; $member{$id} = $member; $goodness{$id} = $goodness; }; }; $starthandler{"id"} = sub { sub { $id = $string }; }; $starthandler{"goodness"} = sub { sub { $goodness = $string }; }; $starthandler{"child"} = sub { sub { $notroot{$string}++; push @$child, $string; }; }; $starthandler{"member"} = sub { sub { push @$member, $string; }; };
XML::Parser->new("Handlers", \%handler)->parsefile($ARGV[0]);
warn "done parsing xml"; + } + # ... and here we actually do something with what we'we read, but I wo +n't show that ...

And here's the second code:

use warnings; use strict; + use XML::Twig; my($filename, @attrib, $verbose, # ... and some more options ... ); # ... get options ... my(%cluster, %element, %attrib); { my $EMPTY = []; my %handler; for my $a (@attrib) { $attrib{$a} = 1; } $handler{"element"} = sub { my($t, $e) = @_; my $id = $e->first_child_trimmed_text("id"); length($id) or die "invalid input: element with no id"; exists($element{$id}) and die qq[invalid input: duplicate element id "$i +d"]; my(%new, $a, $ae); for $ae ($e->children("value")) { $attrib{$a = $ae->att("id")} and $new{$a} = $ae->text; } $element{$id} = \%new; $t->purge; 1; }; $handler{"cluster"} = sub { my($t, $e) = @_; my $id = $e->first_child_trimmed_text("id"); my %c; length($id) or die "invalid input: cluster without id"; exists($cluster{$id}) and die qq[invalid input: duplicate cluster id "$i +d"]; my $g = $e->first_child_trimmed_text("goodness"); length($g) and $c{"goodness"} = 0 + $g; my @m = map { $_->trimmed_text } $e->children("member" +); $c{"members"} = @m ? \@m : $EMPTY; my @c = map { $_->trimmed_text } $e->children("child") +; $c{"children"} = @c ? \@c : $EMPTY; $cluster{$id} = \%c; $t->purge; 1; }; my $twig = XML::Twig->new("twig_handlers", \%handler); $verbose and warn "starting to parse xml file"; $twig->parsefile($filename); $verbose and warn "finished parsing xml file"; my $root = $twig->root; + } + # ... and I omit the rest of the code again ...

In reply to Do not reinvent the wheel: real-world example using XML::Twig by ambrus

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (1)
As of 2024-04-24 16:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found