
First of all, I'll let you in on a dirty little secret. 99.99% of the time, you can quite easily parse XML files with regular expressions. This is because 99.99% of the time you deal with only one external party sending you XML files, and they don't code them by hand; they wrote a program to generate them.

And the thing is, they don't modify the program once it's in production, or they do so rarely, or not deeply enough for it to matter to you. This means that once you have figured out what the file looks like by empirical observation, you can write a few short patterns to pull out what you need.
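As a rough sketch of what "a few short patterns" might look like, assume the generator always emits lowercase `<person>` records with plain `<name>` and `<age>` children, no attributes and no nesting. The sample data and the `@people` array are made up for illustration; the patterns will break as soon as the real file strays from that shape.

```perl
use strict;
use warnings;

# Hypothetical sample, shaped the way a generator program might emit it.
my $xml = <<'XML';
<people>
<person><age>56</age><name>Alice</name></person>
<person><name>Bill</name><age>28</age></person>
</people>
XML

my @people;

# Grab each person record first, so that name and age are only
# looked for inside a person and in either order.
while ($xml =~ m{<person>(.*?)</person>}gs) {
    my $body = $1;
    my ($name) = $body =~ m{<name>([^<]*)</name>};
    my ($age)  = $body =~ m{<age>([^<]*)</age>};
    next unless defined $name && defined $age;
    push @people, "$name is $age years old.";
}

print "$_\n" for @people;
```

This only holds up for as long as the other party's program keeps producing the same layout, which is exactly the bet described above.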

You really need a proper XML parser when you have written the spec, and many people are sending you their data based on your spec. But I digress.

When you say you want the contents of NAME and AGE elements, you probably have more context lying around in the file, such as a PERSON element that encompasses them. Otherwise you might get confused by <tree><age>437</age><name>Sequoia</name></tree> elements. To disambiguate this, you want the NAME element within the PERSON element, along with the AGE element of the same PERSON element.

Furthermore, you don't know if you'll see the NAME element first, or the AGE element first. That is, you might have <person><age>56</age><name>Alice</name></person> or <person><name>Bill</name><age>28</age></person>. So what you do is keep track of each one you find in a hash, and each time you find another element, you check to see if you have both of them, and if so you do something with them.

The following code uses XML::Twig to implement the above algorithm. I haven't tested to see whether it compiles, but such minor details will be cleaned up by the Chatterbox crew if you care to ask them :)

use strict;
use warnings;

use XML::Twig;

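# %seen is private to the handlers and holds the fields of the current PERSON.
# XML element names are case-sensitive, so these paths must match the case
# actually used in your file.
my $twig = do {
    my %seen;
    XML::Twig->new(
        twig_handlers => {
            'PERSON/NAME' => sub {
                my ($t, $e) = @_;
                $seen{NAME} = $e->text;
                check(\%seen);
            },
            'PERSON/AGE' => sub {
                my ($t, $e) = @_;
                $seen{AGE} = $e->text;
                check(\%seen);
            },
            # Fires when a PERSON closes: drop any half-filled record so it
            # cannot leak into the next PERSON, and free the parsed subtree.
            'PERSON' => sub {
                my ($t) = @_;
                %seen = ();
                $t->purge;
            },
        }
    );
};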

sub check {
    my $person = shift;
    return unless keys %$person == 2;
    print "$person->{NAME} is $person->{AGE} years old.\n";
    %$person = ();
}

for my $file (@ARGV) {
    $twig->parsefile($file);
}
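To try the idea without a file on disk, here is a small self-contained variant that parses an in-memory string. The sample document, the `collect` helper and the `@found` array are made up for illustration, and the handler paths are lowercase to match the lowercase examples given earlier. It is a sketch, not a drop-in replacement for the code above.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: a tree that must be ignored, and two people
# whose name and age arrive in different orders.
my $xml = <<'XML';
<records>
  <tree><age>437</age><name>Sequoia</name></tree>
  <person><age>56</age><name>Alice</name></person>
  <person><name>Bill</name><age>28</age></person>
</records>
XML

my (%seen, @found);

my $twig = XML::Twig->new(
    twig_handlers => {
        # Only name/age elements whose parent is a person trigger these.
        'person/name' => sub { $seen{name} = $_[1]->text; collect() },
        'person/age'  => sub { $seen{age}  = $_[1]->text; collect() },
    },
);

# Once both fields are present, record the pair and start afresh.
sub collect {
    return unless keys %seen == 2;
    push @found, "$seen{name} is $seen{age} years old.";
    %seen = ();
}

$twig->parse($xml);    # parse() takes a string, parsefile() a file name
print "$_\n" for @found;
```

The Sequoia never shows up in `@found`, because its name and age sit inside a tree rather than a person.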

• another intruder with the mooring in the heart of the Perl

In reply to Re: I want to find a group of pattern in a xml file by grinder
in thread I want to find a group of pattern in a xml file by cybär



Just another Perl shrine
	PerlMonks