This meditation is just an example to show why you shouldn't reinvent the wheel.
I wrote a script that gets its input as XML. First, I wrote a prototype in XML::Parser. Later, I reimplemented the code using XML::Twig. The output structure is not the same in the two code, as the rest of the code has changed a lot. The second code is slightly shorter and much simpler, even though also collects a bit more data too: the "value" elements.
Update.
This meditation doesn't make sense this way, so let me clarify. Also, I've changed the readmore tags a bit, and added some comments as noted below. (Update: also changing the original misleading title)
XML::Parser is a low-level XML module, it's not very usable for actually working with XML, but it's good for writing other XML modules on it. All XML::Parser does is give you the tokens of an XML, and verify that it is well-formed. When I wrote the program with XML::Parser, I had to implement some higher-level XML functionality in complicated ways. This reimplementation can be seen in the first (un-readmored) part of the first code, which I have marked with comments now.
XML::Twig is a higher-level XML parser, which allows us to manipulate parts of an XML tree as a whole, not just tokens. (In fact, XML::Twig calls XML::Parser underneath, but this is not important here.) This code is short, so the second variant of my code is not that much shorter than the first one, but it is clearly simpler. It would be a hassle to use XML::Parser for a more complicated application, (Update:) or rather, for a more complicated XML file.
End of update
Here's the first code. Ignore the parts concerning %notroot, that part was moved down in the real code, as it's not really part of inputting the data.
use warnings;
use strict;
+
use XML::Parser;
my($filename, # ... and some more options ...
);
# ... get options ...
my(%child, %member, %goodness, %notroot);
+
{
# REIMPLEMENTING THE WHEEL STARTS HERE
my(%handler, %starthandler, @endhandler, @string, $string);
$handler{"Default"} = sub { };
$handler{"Start"} = sub {
push @string, $string; $string = "";
my $h = $starthandler{$_[1]};
push @endhandler, ($h ? &$h(@_[2 .. @_ - 1]) : sub { }
+);
};
$handler{"End"} = sub {
&{pop @endhandler}($_[1]);
$string = pop @string;
};
$handler{"Char"} = sub { $string .= $_[1]; };
+
# REIMPLEMENTING THE WHEEL ENDS HERE (more or less)
my($child, $member, $goodness, $id);
$starthandler{"cluster"} = sub {
$child = []; $member = []; $goodness = ();
$id = "";
sub {
"" ne $id or die "cluster without id";
$child{$id} and die "dupe cluster id: $id";
$child{$id} = $child;
$member{$id} = $member;
$goodness{$id} = $goodness;
};
};
$starthandler{"id"} = sub {
sub { $id = $string };
};
$starthandler{"goodness"} = sub {
sub { $goodness = $string };
};
$starthandler{"child"} = sub {
sub {
$notroot{$string}++;
push @$child, $string;
};
};
$starthandler{"member"} = sub {
sub { push @$member, $string; };
};
XML::Parser->new("Handlers", \%handler)->parsefile($ARGV[0]);
warn "done parsing xml";
+
}
+
# ... and here we actually do something with what we'we read, but I wo
+n't show that ...
And here's the second code:
use warnings;
use strict;
+
use XML::Twig;
my($filename, @attrib, $verbose, # ... and some more options ...
);
# ... get options ...
my(%cluster, %element, %attrib);
{
my $EMPTY = [];
my %handler;
for my $a (@attrib) { $attrib{$a} = 1; }
$handler{"element"} = sub {
my($t, $e) = @_;
my $id = $e->first_child_trimmed_text("id");
length($id) or
die "invalid input: element with no id";
exists($element{$id}) and
die qq[invalid input: duplicate element id "$i
+d"];
my(%new, $a, $ae);
for $ae ($e->children("value")) {
$attrib{$a = $ae->att("id")} and
$new{$a} = $ae->text;
}
$element{$id} = \%new;
$t->purge;
1;
};
$handler{"cluster"} = sub {
my($t, $e) = @_;
my $id = $e->first_child_trimmed_text("id");
my %c;
length($id) or
die "invalid input: cluster without id";
exists($cluster{$id}) and
die qq[invalid input: duplicate cluster id "$i
+d"];
my $g = $e->first_child_trimmed_text("goodness");
length($g) and $c{"goodness"} = 0 + $g;
my @m = map { $_->trimmed_text } $e->children("member"
+);
$c{"members"} = @m ? \@m : $EMPTY;
my @c = map { $_->trimmed_text } $e->children("child")
+;
$c{"children"} = @c ? \@c : $EMPTY;
$cluster{$id} = \%c;
$t->purge;
1;
};
my $twig = XML::Twig->new("twig_handlers", \%handler);
$verbose and warn "starting to parse xml file";
$twig->parsefile($filename);
$verbose and warn "finished parsing xml file";
my $root = $twig->root;
+
}
+
# ... and I omit the rest of the code again ...
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.