I have never had the need, requirement or want to deal with XML before, at least not in any major way. Now I do, and I have a few questions.
Firstly, let me give the basic scenario:
The XML in question could be anything from semi-well-formed to well-formed. Secondly, let's assume that the XML elements from the root to a maximum depth of 4 are known, and that we are erring on the side of caution, because the resulting XML::Simple structure may be a large mix of hashes and arrays at different depths.
Thirdly, some of the element values will vary in size (though the total size of the XML tree itself will usually not exceed 2MB), and there will be multiple sub-element containers of the same name. I am using XMLin() without any major modifiers that would change the resulting structure.
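One consequence of leaving XMLin() on its defaults is worth sketching: a child element that occurs once comes back as a plain scalar, while a repeated one comes back as an array reference (the ForceArray option exists to make this uniform). The snippet below is only a minimal illustration using hand-built data in the shape XML::Simple would return. The "trout" entry and the as_list() helper are hypothetical and are not part of XML::Simple or of the survey data.

```perl
use strict;
use warnings;

# Hand-built stand-ins for what XMLin() returns with default options:
# several <river> children fold to an array ref, a single one to a scalar.
my %fish = (
    barramundi => { river => [ 'Todd', 'Katherine' ] },
    trout      => { river => 'Eucumbene' },    # hypothetical single-river entry
);

# Hypothetical helper: always hand back a list, whatever shape arrived.
sub as_list {
    my ($value) = @_;
    return ()        unless defined $value;
    return @{$value} if ref($value) eq 'ARRAY';
    return ($value);
}

for my $species (sort keys %fish) {
    my @rivers = as_list($fish{$species}{river});
    print "$species: ", join(', ', @rivers), "\n";
}
```

Wrapping every access to a possibly repeated element in a helper like this keeps strict refs happy when a fish happens to have only one river.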
The script I have written below works well enough with the test XML in the XML_RAW heredoc. But, before I start going too far, what suggestions does anyone have? Show me some other methods for getting 'concise' data from an XML tree :-)
use strict;
use warnings;
use XML::Simple;
my $xml_raw = <<XML_RAW;
<survey>
<animals srcurl="blah.whatever.blah" method="ftp">
<fish name="barramundi" freshwater="yes" saltwater="yes">
<river>Todd</river>
<river>Katherine</river>
</fish>
<fish name="carp" freshwater="yes" saltwater="no">
<river>Tilbuster Ponds</river>
<river>Maribyrnong</river>
<river>Patterson</river>
<river>Paterson</river>
<river>Glenelg</river>
<river>Murray</river>
<river>Bunyip</river>
<river>Campaspe</river>
</fish>
<fish name="yellowfin" freshwater="yes" saltwater="no">
<river>Eucumbene</river>
<river>Mulla Mulla Creek</river>
<river>Burrungubugge</river>
<river>Goobarragandra</river>
<river>Bombala</river>
<river>Murray</river>
<river>Emu Swamp Creek</river>
</fish>
</animals>
</survey>
XML_RAW
my $xml_hash_ref = XMLin($xml_raw, KeepRoot=>1);
my %xml_hash = %{$xml_hash_ref};
my ($tl_hk, $tl_hv) = each %xml_hash;
my $last_key = '';
my @key_stash = ();
my $ref_type = '';
my $fish_species = '';
my $fish_survey_dump ="";
# Just to show you how XML::Simple has structured the XML into a hash
#use Data::Dumper;
#print Dumper(\%xml_hash);
traverse_hash($xml_hash{$tl_hk}, $tl_hk);
# Print out the fish survey information that we wanted.
# I concatenated it into a scalar just for quick display purposes
print "\n\n$fish_survey_dump\n";
sub traverse_hash {
    my ($hash_val, $last_key) = @_;
    push(@key_stash, "$last_key ->");
    for my $key (keys %{$hash_val}) {
        $ref_type = ref($hash_val->{$key}) || "VALUE";
        print "$ref_type: @key_stash $key -> ", $hash_val->{$key}, "\n";
        if ($ref_type eq 'HASH') {
            if ($key =~ /barramundi|carp|yellowfin/) {
                $fish_species = $key;
                concat("\n\n[ Survey information for: $fish_species ]:\n\n");
                concat("Saltwater:" . $hash_val->{$fish_species}{'saltwater'} . "\n");
                concat("Freshwater:" . $hash_val->{$fish_species}{'freshwater'} . "\n");
                concat("Rivers covered in survey:\n\n");
                # Assumes 'river' is an array ref, i.e. more than one <river> per fish.
                for my $river (@{$hash_val->{$fish_species}->{'river'}}) {
                    concat("$river\n");
                }
            }
            $last_key = $key;
            # Loop through any sub-hashes by calling traverse_hash() again.
            traverse_hash($hash_val->{$key}, $last_key);
            pop(@key_stash);
        } elsif ($ref_type eq 'ARRAY') {
            # Array reference
            traverse_array($key, @{$hash_val->{$key}});
        } else {
            # Hash value;
            # ...
        }
    }
}
sub traverse_array {
    my ($key, @array) = @_;
    for my $array_val (@array) {
        print "ARRAY-VAL: @key_stash $key -> ", $array_val, "\n";
        if (ref($array_val) eq 'HASH') {
            # Pass the array's key (not undef) so the path stays readable,
            # and pop the entry that traverse_hash() pushes.
            traverse_hash($array_val, $key);
            pop(@key_stash);
        }
    }
}
sub concat {
    my $string = $_[0];
    $fish_survey_dump .= $string;
}
The script above gives the following:
HASH: survey -> animals -> HASH(0x1ad678c)
VALUE: survey -> animals -> srcurl -> blah.whatever.blah
VALUE: survey -> animals -> method -> ftp
HASH: survey -> animals -> fish -> HASH(0x1b4262c)
HASH: survey -> animals -> fish -> carp -> HASH(0x1b425f0)
ARRAY: survey -> animals -> fish -> carp -> river -> ARRAY(0x1b42728)
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Tilbuster Ponds
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Maribyrnong
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Patterson
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Paterson
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Glenelg
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Murray
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Bunyip
ARRAY-VAL: survey -> animals -> fish -> carp -> river -> Campaspe
VALUE: survey -> animals -> fish -> carp -> saltwater -> no
VALUE: survey -> animals -> fish -> carp -> freshwater -> yes
HASH: survey -> animals -> fish -> barramundi -> HASH(0x1b425e4)
ARRAY: survey -> animals -> fish -> barramundi -> river -> ARRAY(0x1b4277c)
ARRAY-VAL: survey -> animals -> fish -> barramundi -> river -> Todd
ARRAY-VAL: survey -> animals -> fish -> barramundi -> river -> Katherine
VALUE: survey -> animals -> fish -> barramundi -> saltwater -> yes
VALUE: survey -> animals -> fish -> barramundi -> freshwater -> yes
HASH: survey -> animals -> fish -> yellowfin -> HASH(0x1b425fc)
ARRAY: survey -> animals -> fish -> yellowfin -> river -> ARRAY(0x1b4268c)
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Eucumbene
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Mulla Mulla Creek
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Burrungubugge
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Goobarragandra
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Bombala
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Murray
ARRAY-VAL: survey -> animals -> fish -> yellowfin -> river -> Emu Swamp Creek
VALUE: survey -> animals -> fish -> yellowfin -> saltwater -> no
VALUE: survey -> animals -> fish -> yellowfin -> freshwater -> yes
[ Survey information for: carp ]:
Saltwater:no
Freshwater:yes
Rivers covered in survey:
Tilbuster Ponds
Maribyrnong
Patterson
Paterson
Glenelg
Murray
Bunyip
Campaspe
[ Survey information for: barramundi ]:
Saltwater:yes
Freshwater:yes
Rivers covered in survey:
Todd
Katherine
[ Survey information for: yellowfin ]:
Saltwater:no
Freshwater:yes
Rivers covered in survey:
Eucumbene
Mulla Mulla Creek
Burrungubugge
Goobarragandra
Bombala
Murray
Emu Swamp Creek
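For comparison, since the element names down to the fish level are known in advance, the same report could probably be built with direct hash access rather than a generic traversal. The following is only a sketch under that assumption: the data is a hand-built, shortened copy of the structure shown in the trace above, and report_for() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Shortened, hand-built copy of the XMLin($xml_raw, KeepRoot => 1) result.
my $data = {
    survey => {
        animals => {
            srcurl => 'blah.whatever.blah',
            method => 'ftp',
            fish   => {
                barramundi => {
                    freshwater => 'yes',
                    saltwater  => 'yes',
                    river      => [ 'Todd', 'Katherine' ],
                },
                carp => {
                    freshwater => 'yes',
                    saltwater  => 'no',
                    river      => [ 'Tilbuster Ponds', 'Maribyrnong' ],
                },
            },
        },
    },
};

# Hypothetical helper: build the report text for one species.
sub report_for {
    my ($species, $info) = @_;
    my $rivers = $info->{river};
    # Tolerate a lone <river>, which would arrive as a plain scalar.
    my @rivers = ref($rivers) eq 'ARRAY' ? @{$rivers}
               : defined $rivers         ? ($rivers)
               :                           ();
    my $report = "[ Survey information for: $species ]:\n";
    $report .= "Saltwater:$info->{saltwater}\n";
    $report .= "Freshwater:$info->{freshwater}\n";
    $report .= "Rivers covered in survey:\n";
    $report .= "$_\n" for @rivers;
    return $report;
}

# Walk straight to the known location of the fish hash.
my $fish = $data->{survey}{animals}{fish};
print report_for($_, $fish->{$_}), "\n" for sort keys %{$fish};
```

This drops the global state (@key_stash, $fish_survey_dump) and the hard-coded species regex, at the cost of assuming the path survey/animals/fish never changes.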