Parsing generic XML

aknipp has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Parsing generic XML by wind (Priest) on Jun 24, 2011 at 16:31 UTC
If you're asking how to access the data structure returned by XML::Simple, that depends entirely on the structure of your XML. There is nothing wrong with the code you've listed above, but if you're wondering why $ref->{$_} isn't listing deeper contents, it's probably because the value is a reference to a more complex data structure rather than a plain scalar. If so, use Data::Dumper to output it, or use ref to determine what type it is and recurse:

```perl
use Data::Dumper;

foreach (keys %{$ref}) {
    print $_ . " " . Dumper($ref->{$_}) . "\n";
}
```

Note that there are parameters you can pass to XMLin to adjust how it builds the data structure from the source XML. See the XML::Simple documentation on CPAN for details.
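A minimal sketch of the ref-plus-recursion idea mentioned above. The hash here is a hypothetical stand-in for whatever XMLin returns (the real shape depends on your XML and on the options you pass), and walk is an illustrative helper, not part of XML::Simple:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an XMLin() result; not taken from real XML.
my $ref = {
    title => 'example',
    item  => [
        { name => 'a', value => 1 },
        { name => 'b', value => 2 },
    ],
};

# walk() is an illustrative helper (an assumption, not a library call).
# It returns one "path = value" string for every leaf scalar.
sub walk {
    my ($node, $path) = @_;
    my $type = ref $node;
    if ($type eq 'HASH') {
        # A "structure": descend into each key, in sorted order.
        return map { walk($node->{$_}, "$path/$_") } sort keys %$node;
    }
    if ($type eq 'ARRAY') {
        # Repeated elements: descend into each index.
        return map { walk($node->[$_], $path . "[$_]") } 0 .. $#$node;
    }
    # Anything else is treated as a leaf value.
    return "$path = " . (defined $node ? $node : 'undef');
}

print "$_\n" for walk($ref, '');
```

The only test that matters is ref: 'HASH' and 'ARRAY' mean "keep descending", and an empty string means you have reached a plain value. That is enough to tell a structure from a node without knowing the XML in advance.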
Re^2: Parsing generic XML by aknipp (Initiate) on Jun 24, 2011 at 16:50 UTC
Thanks. I will try your code. The XML I plan to use is variable, so I was trying to write something rather generic. If I can identify a structure vs a node I should be OK.
Re: Parsing generic XML by Khen1950fx (Canon) on Jun 24, 2011 at 18:13 UTC
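XML::Simple has a strict mode, and it's worth getting into the habit of using it. With :strict, XMLin insists that you spell out options such as KeyAttr and ForceArray instead of relying on the defaults:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(:strict);
use Data::Dumper::Concise;

my $file = shift @ARGV or die "Usage: $0 file.xml\n";

# Slurp the whole file into $xml.
open my $in, '<', $file or die "Cannot open $file: $!";
my $xml = do { local $/; <$in> };
close $in;

my $ref = XMLin(
    $xml,
    KeyAttr    => { item => 'name' },
    ForceArray => [ 'item' ],
    ContentKey => '-content',
);

# Dump each value itself; concatenating it into a string first
# would print only something like "HASH(0x...)".
foreach (keys %{$ref}) {
    print "$_ = ", Dumper($ref->{$_}), "\n";
}
```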
Re: Parsing generic XML by ikegami (Patriarch) on Jun 24, 2011 at 17:25 UTC
```perl
# Reference to array of foo elements.
my $foos = (ref($node->{foo}) // '') eq 'ARRAY'
    ? $node->{foo}
    : [ $node->{foo} ];
```

or

```perl
XMLin($xml, ForceArray => 1);
...
# Reference to array of foo elements.
my $foos = $node->{foo};
```
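A rough sketch of the same normalization wrapped in a helper, for the case where you don't control the XMLin options. The as_list name and the sample hashes are made up for illustration. Unlike the ternary above, this version treats a missing element as an empty list:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# as_list() is a hypothetical helper, not part of XML::Simple.
# It always returns a flat list, whatever shape the value came in.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;          # element absent
    return @$value if ref($value) eq 'ARRAY';      # repeated element
    return ($value);                               # single element
}

# Hypothetical nodes shaped like XMLin output without ForceArray.
my $one  = { foo => { id => 1 } };
my $many = { foo => [ { id => 1 }, { id => 2 } ] };
my $none = {};

for my $node ($one, $many, $none) {
    my @foos = as_list($node->{foo});
    print scalar(@foos), " foo element(s)\n";
}
```

Call it in list context (assign to an array, or loop over it); with that in place, the code that consumes foo never needs to care whether the element appeared once, many times, or not at all.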
Re: Parsing generic XML by graff (Chancellor) on Jun 25, 2011 at 01:34 UTC
Since (according to one of your replies above) the xml input is "variable", you might be interested in the following, which I wrote a while back just to be able to summarize xml tag structures in a generic way.

I prefer "low level" xml modules like XML::Parser and XML::LibXML, because for some reason I find that they are actually easier for me to learn, compared to the "refined sugar" approaches like XML::Simple and XML::Twig; I don't mind writing a few extra lines of code, given that I'm able to understand more quickly what the code is really doing. As for going beyond simple summarization to updating content, I think LibXML would be the tool I'd prefer.

```perl
#!/usr/bin/perl

use strict;
use XML::Parser;

my $Usage = "$0 [-r] [-a] [-b] file.xml\n";

my ( $add_root, $count_attribs, $discrete_count );
while ( @ARGV > 1 and $ARGV[0] =~ /^-([abr])$/ ) {
    if ( $1 eq 'r' ) {
        $add_root = shift;
    }
    elsif ( $1 eq 'a' ) {
        $count_attribs = shift;
    }
    else {
        $discrete_count = shift;
    }
}
die $Usage unless ( @ARGV == 1 and -f $ARGV[0] );

my %embedding;   # paths already seen to contain a child element
my $key = '';    # path of the element currently being parsed
my %ehist;       # element counts, keyed by path
my %ahist;       # attribute counts, keyed by path and attribute name

my $p = XML::Parser->new( Handlers => {
    Start => sub {
        my $newkey = "$key/$_[1]";
        if ( $key and $discrete_count and !exists( $embedding{$key} )) {
            $embedding{$key}++;
            $ehist{$key}--;
        }
        $key = $newkey;
        $ehist{$key}++;
        if ( $count_attribs ) {
            for ( my $i=2; $i<$#_; $i+=2 ) {
                $ahist{$key}{$_[$i]}++;
            }
        }
    },
    End => sub {
        delete $embedding{$key} if ( $discrete_count );
        $key =~ s{/$_[1]$}{}
    },
} );

if ( ! $add_root ) {
    $p->parsefile( $ARGV[0] );
}
else {
    # Wrap the file contents in a synthetic root tag before parsing.
    my $xmlstr = "<STRUCT_HIST_ROOT_$$>\n";
    open( X, '<:utf8', $ARGV[0] ) or die "Unable to read $ARGV[0]: $!\n";
    {
        local $/ = undef;
        $xmlstr .= <X>;
    }
    close X;
    $xmlstr .= "</STRUCT_HIST_ROOT_$$>";
    $p->parse( $xmlstr );
}

for my $k ( sort keys %ehist ) {
    $_ = $k;
    if ( $add_root ) {
        s{/STRUCT_HIST_ROOT_$$}{};
        next unless /\S/;
    }
    next if ( $discrete_count and $ehist{$k} <= 0 );
    print "$ehist{$k}\t$_\n";
    if ( $count_attribs ) {
        print "\t$ahist{$k}{$_}\t\@$_\n" for ( sort keys %{$ahist{$k}} );
    }
}

=head1 NAME

xml-structure-hist

=head1 SYNOPSIS

xml-structure-hist [-r] [-a] [-b] file.xml

  -r : have the program supply a root node tag
  -a : tabulate element attributes (only on raw element counts)
  -b : count only "bottom-level" paths (def: also count intermed.paths)

=head1 DESCRIPTION

For any given xml file, this tool will use a standard xml parser to
tabulate the structure of the tags and print (on STDOUT) a tally of
how many times each distinct structural element occurs in the file.

Use the "-r" option if the input file does not include its own "root"
xml tag (e.g. when multiple blocks of similar xml data are concatenated
without a wrapper tag being put around them).

For example, given an xml file with these contents:

  <root_node>
   <level1 id="x">
    <level2_a><level3 x="y">...</level3><level3>...</level3></level2_a>
    <level2_a><level3 x="z">...</level3><level3>...</level3></level2_a>
   </level1>
   <level1 id="y">
    <level2_a><level3 x="w"><level4>...</level4>...</level3></level2_a>
    <level2_b><level3 x="x">...</level3></level2_b>
   </level1>
   <level1 id="z">
    <level2_a>...</level2_a>
   </level1>
  </root_node>

the default output would be:

  1    /root_node
  3    /root_node/level1
  4    /root_node/level1/level2_a
  5    /root_node/level1/level2_a/level3
  1    /root_node/level1/level2_a/level3/level4
  1    /root_node/level1/level2_b
  1    /root_node/level1/level2_b/level3

With the "-a" option, the output would be:

  1    /root_node
  3    /root_node/level1
       3    @id
  4    /root_node/level1/level2_a
  5    /root_node/level1/level2_a/level3
       3    @x
  1    /root_node/level1/level2_a/level3/level4
  1    /root_node/level1/level2_b
  1    /root_node/level1/level2_b/level3
       1    @x

With the "-b" option, the output would be:

  1    /root_node/level1/level2_a
  4    /root_node/level1/level2_a/level3
  1    /root_node/level1/level2_a/level3/level4
  1    /root_node/level1/level2_b/level3

If the example lacked the "root_node" tags, you would use the "-r"
option, and the quantities reported for the "level*" tags would be the
same as above.

=head1 AUTHOR

David Graff <graff at ldc.upenn.edu>

=cut
```
Re: Parsing generic XML by grantm (Parson) on Jun 25, 2011 at 00:26 UTC
This post will give you more info on ways to use XML::Simple and how to achieve the same things with XML::LibXML.
Re: Parsing generic XML by sundialsvc4 (Abbot) on Jun 25, 2011 at 00:50 UTC
Casting a recommendation here for XML::Twig. I have consistently found that, where XML is concerned (and especially if the files are really big ... or might become so ...), the “big guns” are the best.

