PerlMonks  

xpath problem using XML::LibXML

by anadem (Scribe)
on Apr 12, 2014 at 05:25 UTC

anadem has asked for the wisdom of the Perl Monks concerning the following question:

I need to change some xml data, and I'm stuck trying to identify the node to modify, so hoping someone can dispel my ignorance.

This is a sample of the xml data file ('n4000-small.xml'); the real data has many matrix elements, not just two, and has siblings of the vm elements:
<formation name="stoneridge" version="1.6">
  <block>
    <matrix>
      <vm type="sns">
        <release version="8.5.e">
          <supported>
            <fixed>123.1106</fixed>
          </supported>
        </release>
      </vm>
      <vm type="br">
        <release version="7.2.2">
        </release>
      </vm>
    </matrix>
    <matrix>
      <vm type="sns">
        <release version="4.1.e">
          <supported>
            <min>124.1306</min>
            <max>124.1500</max>
          </supported>
        </release>
      </vm>
      <vm type="br">
        <release version="7.2.1">
        </release>
      </vm>
    </matrix>
  </block>
</formation>
I need to locate the 'matrix' node which has a 'vm' node of 'type="br"' with a specific 'release version' attribute value, then change that matrix's 'vm type="sns"' values (more on that below). I can't figure out what's wrong with my code:
use strict;
use warnings;
use XML::LibXML;

my $file = 'n4000-small.xml';
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);

# THIS WORKS - but it's finding the vm node not the matrix node
foreach my $vm ($doc->findnodes('//vm[@type=\'br\']')) {
    print "br ", $vm->findvalue("./release/\@version"), "\n";
    if ( '7.2.1' eq $vm->findvalue("./release/\@version")) {
        print "found 7.2.1\n"
    }
}

# this also works, finding the "br" vm with the right version, but I think I need the matrix, to identify the "sns" vm
my @wanted_vm = ($doc->findnodes('//vm[@type=\'br\' and ./release/@version=\'7.2.1\']'));
if( @wanted_vm ) {
    print "found br ", $wanted_vm[0]->findvalue('./release/@version'), "\n";
}

# this gets the 'br' matrices, but why doesn't it get the vm attributes?
foreach my $matrix ($doc->findnodes( '//matrix/vm[@type=\'br\']' )) {
    if( $matrix ) {
        #print "matrix:", $matrix->toString(), "\n";
        print "vm release version:", $matrix->findvalue('./vm/release/@version'), "\n";
        my $vm_type = $matrix->findvalue('./vm/@type');
        print "vm type:'" . $vm_type . "'" . "\n";
    }
}

Given that 'matrix' is parent of 'vm', can some kind, wiser monk please say why xpath './release/@version' of a vm node works, but './vm/release/@version' xpath from a matrix node does not? (I'm not certain this is the right terminology, but hope it's understandable.)

I haven't got as far as changing the "sns" vm's data yet. There will be two things to do: (i) change the value of the sns vm's 'release version' attribute; and (ii) either change the '//supported/fixed' element's text, or if there are '//supported/max' and '//supported/min' elements they must be removed and replaced by a '//supported/fixed' element with given text. Any gestures in those directions (especially on how to add the 'fixed' element) will also be very helpful and much appreciated.

Replies are listed 'Best First'.
Re: xpath problem using XML::LibXML
by kcott (Archbishop) on Apr 12, 2014 at 07:32 UTC

    G'day anadem,

    You appear to have gotten close. You give the impression that you were just trying all the variations you could think of: one more and you probably would have been there. However, that's not a very efficient way to code: it's better to understand what you're doing than to keep throwing pieces of code at a problem until one of them turns out to be right.

    Here's the latest "W3C XPath" documentation. The "Path Expressions" section has full details but I usually find that just the "Abbreviated Syntax" subsection suffices in most cases.

    In your first code fragment you needed this path for findnodes():

    '//matrix/vm[@type="br"]'

    Had you done that, I believe you would have got what you wanted.

    All of your attempts had either the findnodes() or the findvalue() path right but the other one wrong.

    The XML::LibXML documentation should help you with your outstanding tasks.
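
For the outstanding tasks, here is a minimal sketch of one possible approach; it is not taken from the thread. It assumes the sns vm always has a 'supported' element, and the replacement values in $new_version and $new_fixed are hypothetical. The predicate in the first findnodes() selects the matrix itself rather than its br vm, so the sibling sns vm can be reached with a relative path.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the sample data from the question.
my $xml = <<'XML';
<formation name="stoneridge" version="1.6">
  <block>
    <matrix>
      <vm type="sns"><release version="8.5.e"><supported><fixed>123.1106</fixed></supported></release></vm>
      <vm type="br"><release version="7.2.2"/></vm>
    </matrix>
    <matrix>
      <vm type="sns"><release version="4.1.e"><supported><min>124.1306</min><max>124.1500</max></supported></release></vm>
      <vm type="br"><release version="7.2.1"/></vm>
    </matrix>
  </block>
</formation>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# Hypothetical replacement values, chosen only for illustration.
my ( $new_version, $new_fixed ) = ( '9.0.e', '125.0001' );

# Select the matrix node; the predicate tests its "br" vm child.
my ($matrix) = $doc->findnodes('//matrix[vm[@type="br"]/release/@version="7.2.1"]');
die "no matching matrix\n" unless $matrix;

# From the matrix, step down to the release element of its "sns" vm.
my ($release) = $matrix->findnodes('./vm[@type="sns"]/release');
die "no sns release in that matrix\n" unless $release;

# Task (i): change the release version attribute.
$release->setAttribute( version => $new_version );

# Task (ii): drop any existing min, max or fixed children, then add a
# fresh fixed element. Assumes a supported element is present.
my ($supported) = $release->findnodes('./supported');
die "no supported element\n" unless $supported;
$_->unbindNode for $supported->findnodes('./min | ./max | ./fixed');
$supported->appendTextChild( fixed => $new_fixed );

print $release->toString, "\n";
```

Removing an existing 'fixed' and re-adding it handles both cases from the question with one code path; changing the text in place would work too, but needs a separate branch.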

    -- Ken

      Thank you Ken for the pointers to the docs. Your assessment of my trying variations is close; I'm a slow learner, new to xml (and not strong on perl), and was getting desperate. That documentation will be my friend for the remaining tasks!

      Fwiw the change of quoting you suggested doesn't actually fix my mistake. My error was misunderstanding what the xpath '//matrix/vm[@type="br"]' matches (the vm node, not the matrix node), so as pointed out in the comment above I was wrongly adding an extra 'vm' child level in findvalue("./vm/release/\@version").

        "Fwiw the change of quoting you suggested doesn't actually fix my mistake."

        My apologies if the change in quoting was misinterpreted as part of the fix.

        The change from //vm to //matrix/vm was the fix.

        You had multiple instances of \' and \@ (within double-quoted strings) throughout your post: the quoting change was a cure for backslashitis. :-)
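
As a small illustration of the quoting point (a sketch with made-up inline data, not code from the thread): the three spellings below produce equivalent XPath expressions, and only the first needs backslashes. The last line shows why \@ is needed inside double quotes but not single quotes.

```perl
use strict;
use warnings;
use XML::LibXML;

# Tiny inline document, invented for this example.
my $doc = XML::LibXML->load_xml(
    string => '<matrix><vm type="sns"/><vm type="br"/></matrix>' );

my @paths = (
    '//vm[@type=\'br\']',    # escaped single quotes, as in the question
    '//vm[@type="br"]',      # double quotes inside a single-quoted string
    q{//vm[@type='br']},     # q{} leaves both quote characters free
);

# Each spelling should match the same single vm element.
my @counts = map { $doc->findnodes($_)->size } @paths;
print "$paths[$_] matched $counts[$_] node(s)\n" for 0 .. $#paths;

# In double quotes @version would be interpolated as an array, hence the
# backslash; in single quotes nothing is interpolated.
my $same = "./release/\@version" eq './release/@version';
print "same path string\n" if $same;
```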

        Here's the test I ran prior to posting:

        #!/usr/bin/env perl -l

        use strict;
        use warnings;

        use XML::LibXML;

        my $file = 'pm_1082058_n4000-small.xml';
        my $parser = XML::LibXML->new();
        my $doc = $parser->parse_file($file);

        for my $vm ($doc->findnodes('//matrix/vm[@type="br"]')) {
            my $version = $vm->findvalue('./release/@version');
            print 'Version: ', $version;
            print 'Wanted: ', $vm->findvalue('./release/@wanted');

            if ($version eq '7.2.1') {
                print 'Found!';
                last;
            }
        }

        Output:

        Version: 7.2.2
        Wanted:
        Version: 7.2.1
        Wanted: YES
        Found!

        As you can see, there are no backslashes here at all (which I find makes the code more readable). The '//matrix/vm[@type="br"]' solution which I posted in my original reply was copied directly from that. If you want to run that test (or modify it for other tests), the data I used (pm_1082058_n4000-small.xml) is given below.

        Contents of pm_1082058_n4000-small.xml:

        <formation name="stoneridge" version="1.6">
          <block>
            <matrix>
              <vm type="sns">
                <release version="8.5.e">
                  <supported>
                    <fixed>123.1106</fixed>
                  </supported>
                </release>
              </vm>
              <vm type="br">
                <release version="7.2.2">
                </release>
              </vm>
            </matrix>
            <notmatrix>
              <vm type="sns">
                <release version="4.1.e">
                  <supported>
                    <min>124.1306</min>
                    <max>124.1500</max>
                  </supported>
                </release>
              </vm>
              <vm type="br">
                <release version="7.2.1" wanted="NO">
                </release>
              </vm>
            </notmatrix>
            <matrix>
              <vm type="sns">
                <release version="4.1.e">
                  <supported>
                    <min>124.1306</min>
                    <max>124.1500</max>
                  </supported>
                </release>
              </vm>
              <vm type="br">
                <release version="7.2.1" wanted="YES">
                </release>
              </vm>
            </matrix>
          </block>
        </formation>

        -- Ken

Re: xpath problem using XML::LibXML
by Anonymous Monk on Apr 12, 2014 at 06:24 UTC

    advice 1) perlquote + xpather.pl

    advice 2) line up your xpaths by / and study the xpather full paths :) like this

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use XML::LibXML;

    my $dom = XML::LibXML->new( qw/ recover 2 / )->load_xml(
        location => 'pm1082058.xml',
    );

    for my $matrix ( $dom->findnodes( q{ //matrix/vm[@type='br'] } )) {
        print $matrix->nodePath, "\n";
        print "\n$matrix\n\n";
        #~ for my $version ( $matrix->findnodes('./vm/release/@version') ){
        for my $version ( $matrix->findnodes('./release/@version') ){
            print $version->nodePath, "\n";
            print "$version\n";
        }
        print "\n\n";
    }
    __END__
    /formation/block/matrix[1]/vm[2]

    <vm type="br">
      <release version="7.2.2">
      </release>
    </vm>

    /formation/block/matrix[1]/vm[2]/release/@version
    version="7.2.2"


    /formation/block/matrix[2]/vm[2]

    <vm type="br">
      <release version="7.2.1">
      </release>
    </vm>

    /formation/block/matrix[2]/vm[2]/release/@version
    version="7.2.1"

    As you can see vm doesn't have vm children

    //matrix/vm[@type='br']
             ./vm/release
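
The same point can be checked directly. This is a minimal sketch using a cut-down inline copy of the sample data (the trimming is an assumption made for brevity): a relative path is evaluated from the node that findnodes() returned, which here is the vm, so './vm/...' looks for a vm inside a vm and finds nothing.

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down version of the question's data, inlined for the example.
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<formation>
  <block>
    <matrix>
      <vm type="sns"><release version="4.1.e"/></vm>
      <vm type="br"><release version="7.2.1"/></vm>
    </matrix>
  </block>
</formation>
XML

# The last step of the path is vm, so a vm node is what comes back.
my ($vm) = $doc->findnodes('//matrix/vm[@type="br"]');

# release is a child of vm, so this finds the attribute.
my $from_vm = $vm->findvalue('./release/@version');

# vm has no vm child, so this yields an empty string.
my $too_deep = $vm->findvalue('./vm/release/@version');

# The matrix is the parent; from there ./vm/... is the right shape.
my $matrix     = $vm->parentNode;
my $via_matrix = $matrix->findvalue('./vm[@type="br"]/release/@version');

# The sibling sns vm can also be reached from the vm with '..'.
my $sns = $vm->findvalue('../vm[@type="sns"]/release/@version');

print "context node:  ", $vm->nodePath, "\n";
print "from vm:       '$from_vm'\n";
print "one level off: '$too_deep'\n";
print "via matrix:    '$via_matrix'\n";
print "sns sibling:   '$sns'\n";
```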

      Thank you kind Anonymous Monk for the excellent pieces of advice, which I shall take to heart. And thank you for pointing out my error and its solution.

      Quoting is one of my many weak points in perl, which I'll correct. The nodePath method will be very useful if I have to work on xml again.

Node Type: perlquestion [id://1082058]
Approved by kcott