PerlMonks  

Re: I want to find a group of pattern in a xml file

by grinder (Bishop)
on Sep 16, 2008 at 14:32 UTC (#711706)


in reply to I want to find a group of pattern in a xml file

First of all, I'll let you in on a dirty little secret. 99.99% of the time, you can quite easily parse XML files with regular expressions. This is because 99.99% of the time you deal with only one external party sending you XML files, and they don't code it by hand: they wrote a program to generate it.

And the thing is, they don't modify the program once it's in production, or at least only rarely, and seldom deeply enough for it to matter to you. This means that once you have figured out what the file looks like by empirical observation, you can write a few short patterns to pull out what you need.

You really do need to parse XML files properly when you have written the spec, and many people are sending you their data based on your spec. But I digress.

When you say you want the contents of NAME and AGE elements, you probably have more context lying around in the file, such as a PERSON element that encompasses them. Otherwise you might get confused by <tree><age>437</age><name>Sequoia</name></tree> elements. To disambiguate this, you want the NAME element within the PERSON element, along with the AGE element of the same PERSON element.
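To make the regular expression approach concrete, here is a minimal sketch that only looks for NAME and AGE inside a PERSON block. The helper name people_from_xml and the sample data are mine, not the original poster's, and I am assuming upper-case tags (XML is case-sensitive), no attributes on these elements, and no nested PERSON elements. If the generating program breaks any of those assumptions, the patterns need adjusting.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one hash ref per PERSON element in $xml.
# Assumes generator-produced XML: plain <PERSON>, <NAME> and <AGE> tags
# with no attributes, no CDATA and no nested PERSON elements.
sub people_from_xml {
    my ($xml) = @_;
    my @people;

    # Non-greedy match, so each PERSON block is picked up on its own.
    while ($xml =~ m{<PERSON>(.*?)</PERSON>}gs) {
        my $block = $1;

        # Two separate matches, so the order of NAME and AGE does not matter.
        my ($name) = $block =~ m{<NAME>(.*?)</NAME>}s;
        my ($age)  = $block =~ m{<AGE>(.*?)</AGE>}s;

        next unless defined $name && defined $age;
        push @people, { NAME => $name, AGE => $age };
    }
    return @people;
}

my $sample = '<DATA>'
    . '<TREE><AGE>437</AGE><NAME>Sequoia</NAME></TREE>'
    . '<PERSON><AGE>56</AGE><NAME>Alice</NAME></PERSON>'
    . '<PERSON><NAME>Bill</NAME><AGE>28</AGE></PERSON>'
    . '</DATA>';

print "$_->{NAME} is $_->{AGE} years old.\n" for people_from_xml($sample);
```

The TREE element is skipped because its NAME and AGE never appear between PERSON tags.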

Furthermore, you don't know if you'll see the NAME element first, or the AGE element first. That is, you might have <person><age>56</age><name>Alice</name></person> or <person><name>Bill</name><age>28</age></person>. So what you do is keep track of each one you find, in a hash, and each time you find another element, you check to see if you have both of them, and if so you do something with them.
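Stripped of any XML handling, that bookkeeping might be sketched as follows. The pair_up helper and the list of [element, text] pairs are purely illustrative stand-ins for whatever hands you the elements in document order.

```perl
use strict;
use warnings;

# Hypothetical helper: takes [element, text] pairs in document order and
# returns a finished record each time both NAME and AGE have been seen.
sub pair_up {
    my (@events) = @_;
    my (%seen, @records);

    for my $event (@events) {
        my ($element, $text) = @$event;
        $seen{$element} = $text;

        # Do nothing until both fields are present.
        next unless exists $seen{NAME} && exists $seen{AGE};

        push @records, { NAME => $seen{NAME}, AGE => $seen{AGE} };
        %seen = ();    # start afresh for the next person
    }
    return @records;
}

my @records = pair_up(
    [ AGE  => 56 ],     [ NAME => 'Alice' ],    # age first
    [ NAME => 'Bill' ], [ AGE  => 28 ],         # name first
);
print "$_->{NAME} is $_->{AGE} years old.\n" for @records;
```

Note that nothing here ties a NAME to the PERSON it came from: if one record is missing a field, its leftover value gets paired with the next record's. That is acceptable only when you trust the generator to always emit both.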

The following code uses XML::Twig to implement the above algorithm. I haven't tested to see whether it compiles, but such minor details will be cleaned up by the Chatterbox crew if you care to ask them :)

    use strict;
    use warnings;
    use XML::Twig;

    # %seen is private to the handlers; it holds the fields found so far.
    my $twig = do {
        my %seen;
        XML::Twig->new(
            twig_handlers => {
                'PERSON/NAME' => sub {
                    my ($t, $e) = @_;
                    $seen{NAME} = $e->text;
                    check(\%seen);
                },
                'PERSON/AGE' => sub {
                    my ($t, $e) = @_;
                    $seen{AGE} = $e->text;
                    check(\%seen);
                },
            }
        );
    };

    # Print and reset once both NAME and AGE have been seen.
    sub check {
        my $person = shift;
        return unless keys %$person == 2;
        print "$person->{NAME} is $person->{AGE} years old.\n";
        %$person = ();
    }

    for my $file (@ARGV) {
        $twig->parsefile($file);
    }

• another intruder with the mooring in the heart of the Perl
