PerlMonks  

XML::LibXML findnodes and namespaces

by shamu (Acolyte)
on Mar 28, 2007 at 05:16 UTC (#606909=perlquestion)

shamu has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks,

I'm trying to migrate some code from XML::Twig to XML::LibXML for its performance advantages. I've been trying, with great difficulty, to use XML::LibXML to parse dynamically created XML in which the element namespaces are not known prior to parsing. I don't care what the default namespaces are, as long as they don't get in the way.

I'm attempting to use findnodes with the recommended (from perlmonks) XML::LibXML::XPathContext module for dealing with the namespaces. Using registerNs doesn't appear to have any visible effect when printing the XML as a string, so I use $node->setNamespace to accomplish that.

my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my $t = $doc->getDocumentElement;
foreach my $node ($t->findnodes('TransactionList/Transaction/XML')) {
    my $xml = $node->firstChild;
    my $xc = XML::LibXML::XPathContext->new($xml);
    foreach (qw( summary histFile fooScore )) {
        $xc->registerNs($_,"urn:$_");
        $node->setNamespace("urn:$_",$_,0);
    }
    my $path = '//fooResponse/info/fooTransaction/transactionDetail/histFile:transactionSummary/summary:PacManScore';
    my @nodes = $xc->findnodes($path);
I've read the two related posts to this topic: 555011 242028

...but I'm still stuck. It appears that when I encounter an element with a default namespace, findnodes refuses to match it without registering some prefix to the URI. I just want to be able to retrieve the nodes using an XPath like:

//fooResponse/info/fooTransaction/transactionDetail/histFile:transactionSummary/summary:PacManScore

And if I want to create new elements with arbitrary namespaces, I don't want to have to manage the namespaces at every turn.

Why does this work?
//*/info/fooTransaction/transactionDetail/*
But not this?
//*/info/fooTransaction/transactionDetail/histFile:transactionSummary/*

Please help! :)

I have an XML file like:
<Extract>
  <Service>
    <Name>FooService</Name>
    <Type>Response</Type>
  </Service>
  <TransactionList>
    <Transaction>
      <Tran_Id>90dc-2f633156cbf6</Tran_Id>
      <XML>
        <fooResponse xmlns="http://some.arbitrary.url/" xmlns:ns="http://another.arbitrary.url/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
          <info xmlns="">
            <servicePreferences>
              <ns:requestingCode xsi:nil="true"/>
            </servicePreferences>
            <fooTransaction xmlns:summary="foo">
              <transactionDetail>
                <histFile:transactionSummary xmlns:histFile="http://yet.another.arbitrary.url/">
                  <summary:PacManScore>95.133</summary:PacManScore>
                  <summary:QBertScore>95.133</summary:QBertScore>
                  <summary:FroggerScore>95.133</summary:FroggerScore>
                </histFile:transactionSummary>
              </transactionDetail>
            </fooTransaction>
          </info>
        </fooResponse>
      </XML>
    </Transaction>
  </TransactionList>
</Extract>

Replies are listed 'Best First'.
Re: XML::LibXML findnodes and namespaces
by roman (Monk) on Mar 29, 2007 at 10:32 UTC

    I am not completely clear about your intention and may repeat something you already know:

    XML::LibXML::XPathContext->registerNs doesn't modify the context node (document) in any way; it only maps prefixes to namespace URIs for use in XPath expressions. This mapping has nothing in common with the namespace declarations on any element.

    When looking for nodes via:

    $xc->findnodes('//*/info/fooTransaction/transactionDetail/histFile:transactionSummary/*')
    the namespace-qualified element names are compared. In your XPath context, histFile:transactionSummary is mapped to <urn:histFile>transactionSummary, while the fully qualified name of the histFile:transactionSummary element in your document is <http://yet.another.arbitrary.url/>transactionSummary, so they don't match.
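    To illustrate the point about matching URIs rather than prefixes, here is a minimal sketch that registers the prefixes against the URIs the sample document actually declares, instead of "urn:histFile" and "urn:summary". The inline XML is a trimmed copy of the posted document, and the "fr" prefix for the default namespace on fooResponse is a made-up name chosen only for this example.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed copy of the fooResponse fragment from the question.
my $xml = <<'XML';
<fooResponse xmlns="http://some.arbitrary.url/">
  <info xmlns="">
    <fooTransaction xmlns:summary="foo">
      <transactionDetail>
        <histFile:transactionSummary xmlns:histFile="http://yet.another.arbitrary.url/">
          <summary:PacManScore>95.133</summary:PacManScore>
        </histFile:transactionSummary>
      </transactionDetail>
    </fooTransaction>
  </info>
</fooResponse>
XML

my $doc = XML::LibXML->new->parse_string($xml);
my $xc  = XML::LibXML::XPathContext->new($doc);

# The prefix names are free to choose; the URIs must equal the document's.
# "fr" is a hypothetical prefix standing in for the default namespace.
$xc->registerNs(fr       => 'http://some.arbitrary.url/');
$xc->registerNs(histFile => 'http://yet.another.arbitrary.url/');
$xc->registerNs(summary  => 'foo');

# info, fooTransaction and transactionDetail are in no namespace
# (because of xmlns=""), so those steps need no prefix.
my @nodes = $xc->findnodes(
    '/fr:fooResponse/info/fooTransaction/transactionDetail'
  . '/histFile:transactionSummary/summary:PacManScore'
);
print $_->textContent, "\n" for @nodes;
```

    Nothing in the document is modified here; only the XPath side of the mapping changes.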

    In XPath you can also look by local or qualified name of the element.

    # looking by local-name
    $xc->findnodes('//*/info/fooTransaction/transactionDetail/*[local-name() = "transactionSummary"]/*');
    # looking by qualified name
    $xc->findnodes('//*/info/fooTransaction/transactionDetail/*[name() = "histFile:transactionSummary"]/*');
    In this case you can call the findnodes method on any node; you don't need XML::LibXML::XPathContext with its prefix => namespace mapping:
    $doc->findnodes('//*/info/fooTransaction/transactionDetail/*[name() = "histFile:transactionSummary"]/*');

      What I'm really trying to do is use an XPath in a findnodes call without any concern or regard for namespaces.

      Using XML::LibXML::XPathContext, I was able to create a virtual mapping to my fooResponse element...I was pointing to the wrong node.

      I still would like to be able to use an XPath like:

      fooResponse/info/fooTransaction/transactionDetail/histFile:transactionSummary/summary:PacManScore

      ...without having to traverse the DOM and create/update namespaces along the way.

      I have a big list of XPath elements that I want to use with findnodes, hopefully without having to modify them to support "local-name" and such.

      Is what I'm asking that difficult to accomplish?

        I think that without modifying your XPath, the task is not only difficult but impossible. I don't see any way that a plain fooResponse in your XPath expression could match the namespace-qualified fooResponse element in your document.
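        For the prefixed steps only, one possible workaround is to copy the document's own prefix declarations into the XPath context before querying. The sketch below does that with a hypothetical helper, context_with_document_prefixes (not part of XML::LibXML). It assumes each prefix is bound to a single URI throughout the document, and it does nothing for elements in a default namespace such as fooResponse, which is why the example path starts at //info.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical helper: build an XPathContext that knows every prefixed
# namespace declared anywhere in the document.
sub context_with_document_prefixes {
    my ($doc) = @_;
    my $xc = XML::LibXML::XPathContext->new($doc);
    my %seen;
    for my $ns ($doc->findnodes('//namespace::*')) {
        my $prefix = $ns->declaredPrefix;
        my $uri    = $ns->declaredURI;
        next unless defined $prefix && length $prefix;  # skip default namespaces
        next if $prefix eq 'xml';                       # built-in prefix
        next if exists $seen{$prefix};                  # assumption: first binding wins
        $seen{$prefix} = $uri;
        $xc->registerNs($prefix, $uri);
    }
    return $xc;
}

# Trimmed copy of the fooResponse fragment from the question.
my $doc = XML::LibXML->new->parse_string(<<'XML');
<fooResponse xmlns="http://some.arbitrary.url/">
  <info xmlns="">
    <fooTransaction xmlns:summary="foo">
      <transactionDetail>
        <histFile:transactionSummary xmlns:histFile="http://yet.another.arbitrary.url/">
          <summary:PacManScore>95.133</summary:PacManScore>
        </histFile:transactionSummary>
      </transactionDetail>
    </fooTransaction>
  </info>
</fooResponse>
XML

my $xc     = context_with_document_prefixes($doc);
my @scores = $xc->findnodes(
    '//info/fooTransaction/transactionDetail'
  . '/histFile:transactionSummary/summary:PacManScore'
);
print $_->textContent, "\n" for @scores;
```

        If the same prefix is bound to different URIs in different parts of a document, this approach silently keeps only the first binding, so it is only suitable when the prefixes are used consistently.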

        If you want to use XPath and match names regardless of namespaces, the name() test is the only solution I can find. The original XPath expressions can be turned into the name() variant programmatically (below is my first try).

        sub name_matching_xpath {
            my ($xpath) = @_;
            return join '/', map {
                /^([a-z0-9:-]+)(.*)$/i ? "*[name() = '$1']$2" : $_;
            } split '/', $xpath;
        }
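        As a quick check of what this rewrite produces, the sketch below repeats the sub from the reply unchanged and applies it to the path from the question. The variable names $original and $converted are only for illustration.

```perl
use strict;
use warnings;

# Copied unchanged from the reply: wrap each name step in a name() test.
sub name_matching_xpath {
    my ($xpath) = @_;
    return join '/', map {
        /^([a-z0-9:-]+)(.*)$/i ? "*[name() = '$1']$2" : $_;
    } split '/', $xpath;
}

my $original  = 'fooResponse/info/fooTransaction/transactionDetail'
              . '/histFile:transactionSummary/summary:PacManScore';
my $converted = name_matching_xpath($original);

print "$converted\n";
# prints (on one line):
# *[name() = 'fooResponse']/*[name() = 'info']/*[name() = 'fooTransaction']/*[name() = 'transactionDetail']/*[name() = 'histFile:transactionSummary']/*[name() = 'summary:PacManScore']
```

        Steps such as "*" and the empty steps produced by a leading "//" pass through untouched. The pattern is deliberately simple: names containing "_" or ".", explicit axes such as child::x, and node tests such as text() would be rewritten incorrectly, so it suits only plain element paths like the ones in this thread. The name() test also compares against the prefix as written in the document, so it depends on the document using the same prefixes as the XPath.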

Node Type: perlquestion [id://606909]
Approved by ikegami