XML::LibXML getElementsById problem

pmc has asked for the wisdom of the Perl Monks concerning the following question:

XML::LibXML looks to be an awesome combination of XPath & DOM APIs, but I am having trouble finding nodes by id. Here is an example that shows my problem. It should (I believe) print a reference to the 'aaa' node, but it does not. Since finding nodes by id is so important, I find it hard to believe it is a bug. What am I missing? By the way, I am using the latest version of XML::LibXML (1.58).

#!/usr/bin/perl
use strict;
use XML::LibXML;
my $xml_string = <<EOF;
<?xml version="1.0"?> 
<root> 
  <aaa id='test'> 
    <bbb/> 
  </aaa> 
</root> 
EOF
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml_string) || die;
my $elem = $doc->getElementsById('test');
print STDERR $elem."\n";


Replies are listed 'Best First'.
Re: XML::LibXML getElementsById problem by mirod (Canon) on Dec 15, 2005 at 16:11 UTC
There is nothing magical about an attribute named `id`. You have to tell the system that it is of type `ID`, either by using a DTD (you could probably also use a RelaxNG schema), or by using `xml:id`, which IS magical, instead of just `id`.
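For the DTD route, here is a minimal sketch, assuming an internal DTD subset is acceptable in the document. The `<!ATTLIST>` declaration is added for illustration and is not part of the original example. It uses `getElementById`, the DOM-standard name; the `getElementsById` spelling used above should behave the same wherever it is still provided.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Assumption: we are free to add an internal DTD subset to the document.
# The ATTLIST declares aaa/@id as type ID, so the plain 'id' attribute
# is treated as an ID by the parser.
my $xml_string = <<'EOF';
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ATTLIST aaa id ID #IMPLIED>
]>
<root>
  <aaa id="test">
    <bbb/>
  </aaa>
</root>
EOF

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_string($xml_string);

# Look the element up by its declared ID.
my $elem = $doc->getElementById('test');

print defined $elem ? $elem->nodeName : 'not found', "\n";
```

If the DTD lives in an external file rather than the internal subset, the parser probably needs to be told to load it before the IDs are known.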
Re^2: XML::LibXML getElementsById problem by pmc (Initiate) on Dec 15, 2005 at 16:25 UTC
Thanks for the tip. This snippet works! I'm using XML::LibXML to parse HTML docs. Unfortunately it does not treat HTML ids like xml:id. I'm pretty new to XML. Thanks again.

use strict;
use XML::LibXML;
my $xml_string = <<EOF;
<?xml version="1.0"?>
<root>
  <aaa xml:id='test'>
    <bbb/>
  </aaa>
</root>
EOF
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml_string) || die;
my $elem = $doc->getElementsById('test');
print STDERR $elem."\n";

Re^3: XML::LibXML getElementsById problem by mirod (Canon) on Dec 15, 2005 at 16:39 UTC
Why don't you use a regular XPath expression instead of `getElementsById`? `my $elem = ($doc->findnodes('//*[@id="test"]'))[0];` works fine. It is probably slower than using `getElementsById` but it might not matter. Or you could select all elements with the attribute `id` and replace it with `xml:id`, and hope (I would think it works) that `getElementsById` then works. Or you could pre-process your HTML using `tidy` for example to get XHTML, and then use XML::LibXML on the XHTML (you might need to set the option to process the DTD in order for `id` to be recognized as an ID). There might also be an XML::LibXML specific trick for this, but I don't know the module that well.
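The XPath suggestion can be wrapped up as a complete script. This is only a sketch: `find_by_id` is a hypothetical helper name, not part of XML::LibXML, and it assumes the id value contains no double quotes, since the value is interpolated directly into the XPath expression.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $xml_string = <<'EOF';
<?xml version="1.0"?>
<root>
  <aaa id="test">
    <bbb/>
  </aaa>
</root>
EOF

my $doc = XML::LibXML->new->parse_string($xml_string);

# Hypothetical helper: return the first element whose plain 'id'
# attribute equals $id, or undef if there is none.
# Assumption: $id contains no double quotes.
sub find_by_id {
    my ($doc, $id) = @_;
    # In list context findnodes returns the matching nodes;
    # keep only the first one.
    my ($node) = $doc->findnodes(qq{//*[\@id="$id"]});
    return $node;
}

my $elem = find_by_id($doc, 'test');
print defined $elem ? $elem->nodeName : 'not found', "\n";
```

This needs no DTD and no `xml:id`, at the cost of walking the whole tree on every lookup, which is the speed trade-off mentioned above.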
Re: XML::LibXML getElementsById problem by santonegro (Scribe) on Dec 15, 2005 at 17:42 UTC
Wow, thank god for XML::TreeBuilder

my $tree = XML::TreeBuilder->new_from_content(<<EOF);
<?xml version="1.0"?>
<root>
  <aaa id='test'>
    <bbb/>
  </aaa>
</root>
EOF
my $elem = $tree->look_down(id => 'test');
$elem->this;
$elem->that;

Re^2: XML::LibXML getElementsById problem by mirod (Canon) on Dec 15, 2005 at 19:19 UTC
Oh, if we are pimping alternate modules, then of course `id` IS magical in XML::Twig:

#!/usr/bin/perl -w
use strict;
use XML::Twig;
my $xml_string = <<EOF;
<?xml version="1.0"?>
<root>
  <aaa id='test'>
    <bbb/>
  </aaa>
</root>
EOF
my $t= XML::Twig->nparse( $xml_string);
my $elem= $t->getElementById( 'test');
$elem->print;

And of course you can use it to process HTML too (it sub-contracts the HTML to XHTML conversion to HTML::TreeBuilder):

#!/usr/bin/perl -w
use strict;
use XML::Twig;
my $html_string = <<EOF;
<html>
  <head><title>Just a quick example</title></head>
  <body><h1>Example</h1>
  an example<p>
  <div id="test">gotcha!</div>
  <hr>
</html>
EOF
my $t= XML::Twig->new->parse_html( $html_string);
my $elem= $t->getElementById( 'test');
$elem->print;

