PerlMonks  

Namespaced XML::LibXML XPath query

by diotalevi (Canon)
on Feb 15, 2006 at 20:45 UTC ( [id://530519]=perlquestion )

diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to run an ordinary XPath query against an XML document that declares a default namespace. If the xmlns="..." part of the data is removed, this test succeeds. What needs to change in my XPath query //lastName or in my XML::LibXML parser object so that the query succeeds?

use Test::More tests => 2;
use XML::LibXML ();

{
    my $xml = <<'XML';
<?xml version="1.0" standalone="yes"?>
<sdnList>
  <sdnEntry>
    <lastName>Hello world!</lastName>
  </sdnEntry>
</sdnList>
XML
    my $parser = XML::LibXML->new;
    my $doc    = $parser->parse_string($xml);
    my $result = $doc->findvalue('//lastName');
    is( $result, "Hello world!", "No namespace" );
}

{
    my $xml = <<'XML';
<?xml version="1.0" standalone="yes"?>
<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <lastName>Hello world!</lastName>
  </sdnEntry>
</sdnList>
XML
    my $parser = XML::LibXML->new;
    my $doc    = $parser->parse_string($xml);
    my $result = $doc->findvalue('//lastName');
    is( $result, "Hello world!", "Namespace" );
}

Results

1..2
ok 1 - No namespace
not ok 2 - Namespace
#   Failed test 'Namespace'
#   in sdl2.pl at line 30.
#          got: ''
#     expected: 'Hello world!'
# Looks like you failed 1 test of 2.

Replies are listed 'Best First'.
Re: Namespaced XML::LibXML XPath query
by diotalevi (Canon) on Feb 15, 2006 at 22:09 UTC

    I believe I have solved this. It is a bug in either the xml parser that XML::LibXML uses or XML::LibXML. When a namespace declaration doesn't specify a prefix, the prefix used is the containing element name. For my example code, the prefix should be sdnList. XML::LibXML is of the incorrect opinion that sdnList isn't a valid namespace. My query should have been written as //sdnList:lastName.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      The behaviour you saw is absolutely correct and not a bug at all. To quote the author of libxml2 from a message aptly titled Re: [xml] XPath and default namespaces (bet you're sick of this by now :) ):

      You cannot define a default namespace for XPath, period, don't try you can't, the XPath spec does not allow it. This can't work and trying to add it to libxml2 would simply make it non conformant to the spec.

      In a nutshell forget about using default namespace within XPath expressions, this will *never* work, you *can't* !

      Google [daniel veillard default namespace xpath] if you want more.

      As he says, XPath has no notion of a default namespace. //lastName in an XPath expression always matches that element in the null namespace, not the default namespace. According to the spec:

      A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded).
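The rule quoted above can be checked directly. The following is a minimal sketch, assuming a reasonably recent XML::LibXML; the variable names are made up for illustration. It shows that an unprefixed name test finds nothing, while a test on the local name and the namespace URI (which involves no prefix at all) does match.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = <<'XML';
<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <lastName>Hello world!</lastName>
  </sdnEntry>
</sdnList>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Unprefixed name test: XPath looks for lastName in the null namespace,
# so nothing matches and findvalue returns an empty string.
my $unprefixed = $doc->findvalue('//lastName');

# Compare the local name and the namespace URI explicitly; no prefix needed.
my $by_uri = $doc->findvalue(
      q{//*[local-name() = 'lastName'}
    . q{ and namespace-uri() = 'http://tempuri.org/sdnList.xsd']}
);

# The parsed element reports the namespace URI it belongs to.
my ($node) = $doc->findnodes(q{//*[local-name() = 'lastName']});
my $ns_uri = $node->namespaceURI;

print "unprefixed: '$unprefixed'\n";
print "by URI:     '$by_uri'\n";
print "namespace:  $ns_uri\n";
```

The local-name() form is verbose and is shown here only to illustrate the rule; registering a prefix is usually the more readable route.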

      In //sdnList:lastName, sdnList is not a namespace. Only URIs can be namespaces. The stuff in front of the colon is the prefix, and is merely a stand-in for the URI. <sdnList xmlns="http://tempuri.org/sdnList.xsd"> puts the sdnList element (and all its prefix-less descendants) in the http://tempuri.org/sdnList.xsd namespace. You have to associate this URI with a prefix, then use the prefix in your expression. This is exactly the approach lestrrat posted:

      my $xc = XML::LibXML::XPathContext->new( $doc->documentElement() );
      $xc->registerNs( foobar => 'http://tempuri.org/sdnList.xsd' );
      my $result = $xc->findvalue( '//foobar:lastName' );

      I wrote about this a while ago.

      Note that the prefix is arbitrary and has nothing to do with what appears in your document. This is as it should be, because the following document means exactly the same as the one you have:

      <camel:sdnList xmlns:camel="http://tempuri.org/sdnList.xsd">
        <camel:sdnEntry>
          <camel:lastName>Hello world!</camel:lastName>
        </camel:sdnEntry>
      </camel:sdnList>

      For that matter, even this means the same:

      <camel:sdnList xmlns:camel="http://tempuri.org/sdnList.xsd">
        <penguin:sdnEntry xmlns:penguin="http://tempuri.org/sdnList.xsd">
          <camel:lastName>Hello world!</camel:lastName>
        </penguin:sdnEntry>
      </camel:sdnList>

      Or this:

      <sdnList xmlns="http://tempuri.org/sdnList.xsd">
        <penguin:sdnEntry xmlns:penguin="http://tempuri.org/sdnList.xsd">
          <lastName>Hello world!</lastName>
        </penguin:sdnEntry>
      </sdnList>

      You get the idea.
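To make the equivalence concrete, here is a small sketch that runs one and the same query against three differently prefixed spellings of the document. The prefix ns and the hash keys are arbitrary names chosen for this illustration; only the URI has to agree with the documents.

```perl
use strict;
use warnings;
use XML::LibXML;

my $uri = 'http://tempuri.org/sdnList.xsd';

# Three spellings of the same document: default namespace only,
# a single prefix throughout, and a mix of two prefixes.
my %variant = (
    default_ns => qq{<sdnList xmlns="$uri"><sdnEntry>}
                . qq{<lastName>Hello world!</lastName>}
                . qq{</sdnEntry></sdnList>},
    one_prefix => qq{<camel:sdnList xmlns:camel="$uri"><camel:sdnEntry>}
                . qq{<camel:lastName>Hello world!</camel:lastName>}
                . qq{</camel:sdnEntry></camel:sdnList>},
    mixed      => qq{<camel:sdnList xmlns:camel="$uri">}
                . qq{<penguin:sdnEntry xmlns:penguin="$uri">}
                . qq{<camel:lastName>Hello world!</camel:lastName>}
                . qq{</penguin:sdnEntry></camel:sdnList>},
);

my %result;
for my $name ( sort keys %variant ) {
    my $doc = XML::LibXML->new->parse_string( $variant{$name} );
    my $xc  = XML::LibXML::XPathContext->new($doc);

    # The prefix used in the query is bound to the URI here,
    # independently of whatever prefixes the document uses.
    $xc->registerNs( 'ns', $uri );

    $result{$name} = $xc->findvalue('/ns:sdnList/ns:sdnEntry/ns:lastName');
    print "$name: $result{$name}\n";
}
```

All three variants should give the same value, since the query only ever sees the namespace URI and never the document's prefixes.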

      Makeshifts last the longest.

        Hi Monks, I must be missing something simple. Could you please help me grasp this concept...
        Take the following example xml:
        <aaa xmlns="xmlapi_1.0">
          <bbb>
            <ccc>
              <d1>blah</d1>
              <d2>blah</d2>
              <d3>blah</d3>
            </ccc>
            <ccc>
              <d1>blah</d1>
              <d2>blah</d2>
              <d3>blah</d3>
            </ccc>
          </bbb>
        </aaa>
        I need to iterate through each <ccc>. I worked out how to get the list of <ccc> nodes and this thread confirms what I did as correct. But now that I have the <ccc> node, how do I get the <dx> properties? I've tried with and without the namespace already defined but still no love. It gets worse, the xml I receive could have <e> nested in <d>.
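One possible way to approach this, offered as a sketch rather than a definitive answer: register a prefix once on an XML::LibXML::XPathContext, then pass each <ccc> node as the second argument to findnodes so that a relative, prefixed expression is evaluated from that node. The prefix api and the element values below are invented for the example (the values differ only so the elements can be told apart).

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = <<'XML';
<aaa xmlns="xmlapi_1.0">
  <bbb>
    <ccc><d1>one</d1><d2>two</d2><d3>three</d3></ccc>
    <ccc><d1>four</d1><d2>five</d2><d3>six</d3></ccc>
  </bbb>
</aaa>
XML

my $doc = XML::LibXML->new->parse_string($xml);
my $xc  = XML::LibXML::XPathContext->new($doc);
$xc->registerNs( 'api', 'xmlapi_1.0' );

my @rows;
for my $ccc ( $xc->findnodes('//api:ccc') ) {
    my %row;

    # Relative expression, evaluated with $ccc as the context node.
    # api:* selects every child element in the registered namespace.
    for my $child ( $xc->findnodes( 'api:*', $ccc ) ) {
        $row{ $child->localname } = $child->textContent;
    }
    push @rows, \%row;
}

for my $row (@rows) {
    print join( ', ', map { "$_=$row->{$_}" } sort keys %$row ), "\n";
}
```

The relative queries still need the prefix, and they go through the same context object that holds the registration. For elements nested deeper, such as an <e> inside a <d>, an expression like .//api:e evaluated against the same context node would reach descendants at any depth.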
      I've just checked the specifications of how it should be and it is, indeed, a bug. Although I don't know if it's a libxml2 bug or a bug in the Perl bindings to it (i.e. XML::LibXML).

      Either way, you should report it to the authors. But I don't know if it's still maintained, since the last update happened in 2004.


      acid06
      perl -e "print pack('h*', 16369646), scalar reverse $="

        I reported this to rt.cpan.org as soon as I found that it was a bug.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Namespaced XML::LibXML XPath query
by lestrrat (Deacon) on Feb 16, 2006 at 12:19 UTC

    This is probably what you want:

    use Test::More tests => 1;    # provides is(), which the snippet calls
    use XML::LibXML;
    use XML::LibXML::XPathContext;

    {
        my $xml = <<'XML';
    <?xml version="1.0" standalone="yes"?>
    <sdnList xmlns="http://tempuri.org/sdnList.xsd">
      <sdnEntry>
        <lastName>Hello world!</lastName>
      </sdnEntry>
    </sdnList>
    XML
        my $parser = XML::LibXML->new;
        my $doc    = $parser->parse_string($xml);

        # Bind a prefix to the namespace URI, then use it in the query.
        my $xc = XML::LibXML::XPathContext->new($doc);
        $xc->registerNs('sdnList', 'http://tempuri.org/sdnList.xsd');
        my $result = $xc->findvalue('//sdnList:lastName');
        is( $result, "Hello world!", "Namespace" );
    }
      So what you're saying is my xpath needs to be written like this:
      $doc->findnodes('//myns:root/myns:stuff/myns:items/myns:book');
      Given the following:
      <root xmlns="http://goobar/xml" xmlns:myns="http://goobar/xml">
        <stuff>
          <items>
            <book>XML Namespaces</book>
          </items>
        </stuff>
      </root>
      That is really an ugly xpath expression.
        blame the XML guys ;) default namespaces are just plain stupid.
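The expression can be made somewhat less noisy. As a sketch, assuming a one-letter prefix g registered on an XPathContext (the prefix is an arbitrary choice for this example), the same book can be reached with the full path, with a descendant search, or step by step from a context node:

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = <<'XML';
<root xmlns="http://goobar/xml" xmlns:myns="http://goobar/xml">
  <stuff>
    <items>
      <book>XML Namespaces</book>
    </items>
  </stuff>
</root>
XML

my $doc = XML::LibXML->new->parse_string($xml);
my $xc  = XML::LibXML::XPathContext->new($doc);

# Any prefix works as long as it is bound to the document's namespace URI.
$xc->registerNs( 'g', 'http://goobar/xml' );

# Full path with a short prefix.
my $full = $xc->findvalue('/g:root/g:stuff/g:items/g:book');

# Descendant search: only the final step needs spelling out.
my $short = $xc->findvalue('//g:book');

# Step by step: locate a container once, then query relative to it.
my ($items) = $xc->findnodes('/g:root/g:stuff/g:items');
my $relative = $xc->findvalue( 'g:book', $items );

print "$_\n" for $full, $short, $relative;
```

Every step still carries a prefix, so this trims the expression rather than removing the requirement.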

Node Type: perlquestion [id://530519]
Approved by astaines