PerlMonks  

problem in XML parsing

by Anonymous Monk
on Mar 22, 2008 at 01:36 UTC ( [id://675570]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks. As I am new to Perl, I am trying to parse an XML file by treating it as a plain text file (I am totally unfamiliar with modules, etc.). Up to now I have had no problems, but I have reached a point where I need help: parsing XML nodes that appear more than once in my XML file.
Consider the following example file:
<football_team>
  <Name>ONE</Name>
  <DateCreated>
    <Year>1990</Year>
    <Month>12</Month>
    <Day>04</Day>
  </DateCreated>
  <Players>
    <Player>
      <LastName>Wonguso</LastName>
      <ForeName>Jimm</ForeName>
      <Initials>JW</Initials>
    </Player>
    <Player>
      <LastName>Fearless</LastName>
      <ForeName>A</ForeName>
      <Initials>AF</Initials>
    </Player>
  </Players>
  <Hometown>New York</Hometown>
</football_team>
<football_team>
  <Name>TWO</Name>
  <DateCreated>
    <Year>1978</Year>
    <Month>9</Month>
    <Day>23</Day>
  </DateCreated>
  <Players>
    <Player>
      <LastName>Pedro</LastName>
      <ForeName>Therry</ForeName>
      <Initials>TP</Initials>
    </Player>
    <Player>
      <LastName>Haywardis</LastName>
      <ForeName>Richie</ForeName>
      <Initials>RH</Initials>
    </Player>
    <Player>
      <LastName>Eswa</LastName>
      <ForeName>Jeyan</ForeName>
      <Initials>JE</Initials>
    </Player>
    <Player>
      <LastName>Leongus</LastName>
      <ForeName>John</ForeName>
      <Initials>JL</Initials>
    </Player>
    <Player>
      <LastName>Bentson</LastName>
      <ForeName>Billie</ForeName>
      <Initials>BB</Initials>
    </Player>
  </Players>
  <Hometown>California</Hometown>
</football_team>
Can you guide me on how to print, separately, the names of the players of each team? For the other data (i.e. team name, year created, etc.) that appear only once in each XML entry, I have no problem printing them out.
Thank you in advance!

Replies are listed 'Best First'.
Re: problem in XML parsing
by hipowls (Curate) on Mar 22, 2008 at 03:53 UTC

    There are lots of modules on CPAN that parse XML. You really don't want to do it yourself except as a learning exercise. And then you will use a module from CPAN ;)

    When dealing with data structures, it is very helpful to print the data you are working with. I use YAML; others use Data::Dumper. Either is fine.
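For anyone who prefers Data::Dumper, a minimal sketch of the same idea follows. The `$team` structure here is hand-written for illustration and only assumed to resemble one team from the sample data; it is not the output of any parser. Setting `Sortkeys` is optional but keeps the dump in a stable order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical structure, shaped like one team from the sample data.
my $team = {
    Name    => 'ONE',
    Players => {
        Player => [
            { ForeName => 'Jimm', LastName => 'Wonguso' },
            { ForeName => 'A',    LastName => 'Fearless' },
        ],
    },
};

# Sort hash keys and use compact indentation so the dump is stable and readable.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 1;

# Dumper returns the structure as a string of Perl code.
my $dump = Dumper($team);
print $dump;
```

Looking at the dump before writing any loops shows at a glance which levels are hashes and which are arrays.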

    Update: I've changed the code below in light of ikegami's comments.

    This is one approach using XML::Simple

    #!/net/perl/5.10.0/bin/perl
    use strict;
    use warnings;

    use YAML;
    use XML::Simple;

    my $teams = XMLin(
        do { local $/; <DATA> },
        ForceArray => [qw( football_team Player )],
    );

    print Dump $teams;

    foreach my $team ( @{ $teams->{football_team} } ) {
        print "$team->{Name}\n";
        foreach my $player ( @{ $team->{Players}{Player} } ) {
            print " $player->{ForeName} $player->{LastName}\n";
        }
    }
    __DATA__
    which produces
    ---
    football_team:
      - DateCreated:
          Day: 04
          Month: 12
          Year: 1990
        Hometown: New York
        Name: ONE
        Players:
          Player:
            - ForeName: Jimm
              Initials: JW
              LastName: Wonguso
            - ForeName: A
              Initials: AF
              LastName: Fearless
      - DateCreated:
          Day: 23
          Month: 9
          Year: 1978
        Hometown: California
        Name: TWO
        Players:
          Player:
            - ForeName: Therry
              Initials: TP
              LastName: Pedro
            - ForeName: Richie
              Initials: RH
              LastName: Haywardis
            - ForeName: Jeyan
              Initials: JE
              LastName: Eswa
            - ForeName: John
              Initials: JL
              LastName: Leongus
            - ForeName: Billie
              Initials: BB
              LastName: Bentson
    ONE
     Jimm Wonguso
     A Fearless
    TWO
     Therry Pedro
     Richie Haywardis
     Jeyan Eswa
     John Leongus
     Billie Bentson

      I don't like XML::Simple. Remove a team from the sample data and you get

      Not an ARRAY reference at 675580.pl line 16.

      Remove a player from ONE and you'll get

      ONE
      Not an ARRAY reference at 675580.pl line 18.

      The call to XMLin for this schema should be *at least*

      my $teams = XMLin(
          $xml_text,
          ForceArray => [qw( football_team Player )],
      );

      And then there's ForceContent.

      XML::Simple is deceptively complex to use because it returns data in an unpredictable format, forcing you to use ref all over the place or to specify arguments to tell it what it already knows but chooses to destroy. All that because XML::Simple assumes attributes and elements are interchangeable. (Are they ever?)
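To make the "ref all over the place" point concrete, here is a rough sketch of the kind of defensive code you end up writing when ForceArray is not set. It does not call XML::Simple at all: the two structures are hand-written and only assumed to mirror what comes back when a team has one Player (a hash ref) versus several (an array ref), which is what the "Not an ARRAY reference" errors above suggest. `as_list` is a hypothetical helper name, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list whether the value is an array ref, a single item, or missing.
sub as_list {
    my ($value) = @_;
    return ()        unless defined $value;
    return @{$value} if ref($value) eq 'ARRAY';
    return ($value);
}

# Hypothetical shapes: a lone Player is a hash ref, several are an array ref.
my $one_player  = { Players => { Player => { LastName => 'Wonguso' } } };
my $two_players = { Players => { Player => [
    { LastName => 'Pedro' },
    { LastName => 'Eswa' },
] } };

# The same loop now works for both shapes.
for my $team ( $one_player, $two_players ) {
    my @names = map { $_->{LastName} } as_list( $team->{Players}{Player} );
    print join( ', ', @names ), "\n";
}
```

Listing the repeatable elements in ForceArray avoids the need for a helper like this, which is why it belongs in the XMLin call.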

        Thanks for pointing out the potential problem with my code, I've updated the original post.

        I agree that XML::Simple isn't simple, but at some point you have to provide a mapping between XML and Perl data structures, and nothing is truly simple. For reading XML without attributes it is good enough, though you do need to be aware of the traps (but that's what the man page and PerlMonks are for ;).

        For "real" XML processing I'd look elsewhere, but that wasn't the question here, and on that topic I'd be far more likely to seek advice than to offer it.

Re: problem in XML parsing
by nefigah (Monk) on Mar 22, 2008 at 04:28 UTC
    I am totally unfamiliar with modules etc

    This is an excellent and painless chance to get familiar with them :) One of the beauties of Perl is that there's a good chance someone has already done something for you, and it's easy to find that something.

    Give the modules mentioned in the previous reply a shot. If you have a specific question or concern regarding modules in general (installation or what have you), feel free to ask.


    I'm a peripheral visionary... I can see into the future, but just way off to the side.
