http://qs321.pair.com?node_id=844371

Kanishka.black0 has asked for the wisdom of the Perl Monks concerning the following question:

I have tried to use the XML::LibXML modules to run XPath queries against a document, but it is getting complicated. I found this example on the forum here:

use XML::LibXML;
use strict;
use warnings;

{
    my $xml = <<'XML';
<?xml version="1.0" standalone="yes"?>
<sdnList>
  <sdnEntry>
    <lastName>Hello world!</lastName>
  </sdnEntry>
</sdnList>
XML
    my $parser = XML::LibXML->new;
    my $doc    = $parser->parse_string($xml);
    my $result = $doc->findvalue('//lastName');
    print $result;
}

{
    my $xml = <<'XML';
<?xml version="1.0" standalone="yes"?>
<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <lastName>Hello world!</lastName>
  </sdnEntry>
</sdnList>
XML
    my $parser = XML::LibXML->new;
    my $doc    = $parser->parse_string($xml);
    my $result = $doc->findvalue('//lastName');
    print $result;
}

One of the members posted this solution:

use XML::LibXML;
use XML::LibXML::XPathContext;
use Test::More tests => 1;    # provides is()

{
    my $xml = <<'XML';
<?xml version="1.0" standalone="yes"?>
<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <lastName>Hello world!</lastName>
  </sdnEntry>
</sdnList>
XML
    my $parser = XML::LibXML->new;
    my $doc    = $parser->parse_string($xml);

    # Bind our own prefix to the document's namespace URI
    my $xc = XML::LibXML::XPathContext->new($doc);
    $xc->registerNs('sdnList', 'http://tempuri.org/sdnList.xsd');

    my $result = $xc->findvalue('//sdnList:lastName');
    is( $result, "Hello world!", "Namespace" );
}

But we can't count on getting the same namespace every time. If the XML input comes from some other API, it becomes extremely difficult to handle.

Is there an easy way to get nodes, attributes, and the like with XML::LibXML? Or with any other module except XML::XPath?

UPDATE: I think I did not convey the last part of my question properly.

Suppose the XML varies from one request to the next, like this.

Let's take one example, the Flickr API: http://www.flickr.com/services/api/explore/?method=flickr.photos.getSizes. If I give a correct photo ID, I get this result:

<rsp stat="ok">
  <sizes canblog="1" canprint="0" candownload="0">
    <size label="Square" width="75" height="75" source="http://farm5.static.flickr.com/4007/4673769513_02826f0775_s.jpg" url="http://www.flickr.com/photos/dhushor-jen/4673769513/sizes/sq/" media="photo"/>
    <size label="Thumbnail" width="100" height="67" source="http://farm5.static.flickr.com/4007/4673769513_02826f0775_t.jpg" url="http://www.flickr.com/photos/dhushor-jen/4673769513/sizes/t/" media="photo"/>
    <size label="Small" width="240" height="160" source="http://farm5.static.flickr.com/4007/4673769513_02826f0775_m.jpg" url="http://www.flickr.com/photos/dhushor-jen/4673769513/sizes/s/" media="photo"/>
    <size label="Medium" width="500" height="333" source="http://farm5.static.flickr.com/4007/4673769513_02826f0775.jpg" url="http://www.flickr.com/photos/dhushor-jen/4673769513/sizes/m/" media="photo"/>
    <size label="Large" width="1024" height="683" source="http://farm5.static.flickr.com/4007/4673769513_02826f0775_b.jpg" url="http://www.flickr.com/photos/dhushor-jen/4673769513/sizes/l/" media="photo"/>
  </sizes>
</rsp>

What if I give an incorrect photo ID in the request above? Then I get this:

<rsp stat="fail">
  <err code="1" msg="Photo not found"/>
</rsp>

Can anyone tell me how to write XPath for this kind of thing?

Replies are listed 'Best First'.
Re: how to get XML::LibXML perfect xpath query ?
by ikegami (Patriarch) on Jun 12, 2010 at 23:30 UTC

    But every time we can't get same namespace

    It doesn't make sense for the namespace to change. The prefix can change, but it doesn't matter what prefix the XML uses. The one you use doesn't have to be the same. Just the namespace has to be the same.

    But let's say that's what you have. If you need to match nodes by name no matter which namespace the element belongs to, you can use

    .../*[local-name()="name"]/...

    instead of

    .../ns:name/...
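
A minimal sketch of both ideas above, offered as an illustration rather than as part of the original reply. It assumes a document that binds the thread's namespace URI to the prefix `x` (a made-up prefix), and the helper names `by_namespace` and `by_local_name` are hypothetical. The first helper registers its own prefix for the same URI; the second matches on the local name only, whatever namespace the element is in.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Same namespace URI as in the thread, but bound to the prefix "x"
# in the document (an assumption made for this sketch).
my $xml = <<'XML';
<?xml version="1.0" standalone="yes"?>
<x:sdnList xmlns:x="http://tempuri.org/sdnList.xsd">
  <x:sdnEntry>
    <x:lastName>Hello world!</x:lastName>
  </x:sdnEntry>
</x:sdnList>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Approach 1: register our own prefix ("s") for the namespace URI.
# The prefix used inside the document is irrelevant.
sub by_namespace {
    my ($doc) = @_;
    my $xc = XML::LibXML::XPathContext->new($doc);
    $xc->registerNs('s', 'http://tempuri.org/sdnList.xsd');
    return $xc->findvalue('//s:lastName');
}

# Approach 2: match on the local name only, ignoring the namespace.
sub by_local_name {
    my ($doc) = @_;
    return $doc->findvalue('//*[local-name()="lastName"]');
}

print by_namespace($doc),  "\n";
print by_local_name($doc), "\n";
```

The first approach is stricter, since it will not match a `lastName` element from an unrelated namespace; the second trades that precision for convenience.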
Re: how to get XML::LibXML perfect xpath query ?
by bluescreen (Friar) on Jun 12, 2010 at 23:28 UTC

    I don't understand why you don't get same namespace every time

    Anyway, if the XML documents are small and you need access to just a few nodes, you could use XML::Simple, for example:

    use XML::Simple;
    use strict;
    use warnings;

    my $xml = <<XML;
    <?xml version="1.0" standalone="yes"?>
    <sdnList>
      <sdnEntry>
        <lastName>Hello world !</lastName>
      </sdnEntry>
    </sdnList>
    XML

    my $xml1   = XMLin($xml);
    my $result = $xml1->{sdnEntry}->{lastName};
    print "result:$result\n";

    # Same document, now with a default namespace declared
    $xml = <<XML;
    <?xml version="1.0" standalone="yes"?>
    <sdnList xmlns="http://tempuri.org/sdnList.xsd">
      <sdnEntry>
        <lastName>Hello world !</lastName>
      </sdnEntry>
    </sdnList>
    XML

    my $xml2 = XMLin($xml);
    $result = $xml2->{sdnEntry}->{lastName};    # read from $xml2, not $xml1
    print "result:$result\n";

    I've had performance problems using XPath queries. Even though XML::LibXML was faster at instantiating the document and consumed less memory than XML::Simple, overall it was slower at accessing nodes (which makes total sense, as XML::Simple returns a hash).

    My point is that if you need a simple parser (e.g. for config files or for consuming small services once in a while), I'd stick to XML::Simple. In environments of high volume or frequent parsing of huge XML documents I would use Expat instead; it is really fast.

      In environments of high volume or frequent parsing of huge XML documents I would use Expat instead; it is really fast.

      In my experience, XML::LibXML is 12x faster than XML::Parser (Expat) at creating the same Perl-land data structure.

        My DOM parser tests compared XML::Simple vs. XML::LibXML on two platforms, Solaris 10 and Linux 2.6. All my tests on Solaris (different documents of different sizes) showed XML::LibXML was slower than XML::Simple (it was much slower at accessing nodes). On Linux it was the opposite: XML::LibXML was 8-10x faster than XML::Simple.

        On the other hand, comparing a SAX parser using XML::Parser vs. a DOM parser using XML::LibXML or XML::Simple, Expat won on both Solaris and Linux.

        Honestly, I haven't tried XML::LibXML::SAX and maybe it is way faster than XML::Parser.

        This link is a bit outdated, but it has some benchmarks. Things have probably changed notably since then.

      Definitely, LibXML can handle complex data structures, and it performs well. Moreover, LibXML is well maintained, unlike the others.
Re: how to get XML::LibXML perfect xpath query ?
by ikegami (Patriarch) on Jun 13, 2010 at 04:19 UTC

    Can any one tell me how can write xpath this kind of things ?

    What node are you trying to match?

      I'd like to get the values into arrays. In the second example, for /rsp/sizes/size, I want the source and url attributes in two separate arrays.

        my @images;
        for my $node ($doc->findnodes('/rsp/sizes/size')) {
            my $source = $node->getAttribute('source');
            my $url    = $node->getAttribute('url');
            push @images, [ $source, $url ];
        }
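
The thread does not show how to handle the `stat="fail"` response or how to fill two separate arrays, so here is a hedged sketch that extends the loop above. It assumes the response has exactly the shape shown in the question. The helper name `extract_sizes` and the wording of the error message are made up for illustration, and the sample input is trimmed to a single `<size>` element.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns (\@sources, \@urls) on success and dies
# with the API's error code and message when the response has stat="fail".
sub extract_sizes {
    my ($xml) = @_;
    my $doc = XML::LibXML->new->parse_string($xml);

    # Check the status attribute before looking for any sizes.
    my $stat = $doc->findvalue('/rsp/@stat');
    if ($stat ne 'ok') {
        my $code = $doc->findvalue('/rsp/err/@code');
        my $msg  = $doc->findvalue('/rsp/err/@msg');
        die "Flickr error $code: $msg\n";
    }

    # Collect the two attributes into two parallel arrays.
    my (@sources, @urls);
    for my $node ($doc->findnodes('/rsp/sizes/size')) {
        push @sources, $node->getAttribute('source');
        push @urls,    $node->getAttribute('url');
    }
    return (\@sources, \@urls);
}

my $ok = <<'XML';
<rsp stat="ok">
  <sizes canblog="1" canprint="0" candownload="0">
    <size label="Square" width="75" height="75" source="http://farm5.static.flickr.com/4007/4673769513_02826f0775_s.jpg" url="http://www.flickr.com/photos/dhushor-jen/4673769513/sizes/sq/" media="photo"/>
  </sizes>
</rsp>
XML

my $fail = <<'XML';
<rsp stat="fail">
  <err code="1" msg="Photo not found"/>
</rsp>
XML

my ($sources, $urls) = extract_sizes($ok);
print "$_\n" for @$sources, @$urls;

# The failing response raises an exception that the caller can trap.
my $succeeded = eval { extract_sizes($fail); 1 };
print "failed: $@" unless $succeeded;
```

Dying on failure keeps the success path simple. Returning the error as a value would work equally well if exceptions do not suit the calling code.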