PerlMonks  

XML::LibXML getElementsById problem

by pmc (Initiate)
on Dec 15, 2005 at 15:55 UTC ( [id://516988]=perlquestion )

pmc has asked for the wisdom of the Perl Monks concerning the following question:

XML::LibXML looks to be an awesome combination of XPath and DOM APIs, but I am having trouble finding nodes by id. Here is an example that shows my problem. It should (I believe) print a reference to the 'aaa' node, but it does not. Since finding nodes by id is so important, I find it hard to believe it is a bug. What am I missing? By the way, I am using the latest version of XML::LibXML (1.58).
```perl
#!/usr/bin/perl
use strict;
use XML::LibXML;

my $xml_string = <<EOF;
<?xml version="1.0"?>
<root>
  <aaa id='test'>
    <bbb/>
  </aaa>
</root>
EOF

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_string($xml_string) || die;
my $elem   = $doc->getElementsById('test');
print STDERR $elem."\n";
```

Re: XML::LibXML getElementsById problem
by mirod (Canon) on Dec 15, 2005 at 16:11 UTC

    There is nothing magical about an attribute named 'id'. You have to tell the system that it is of type... 'ID', either by using a DTD (you could probably also use a RelaxNG schema), or by using 'xml:id', which IS magical, instead of just 'id'.
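
    A minimal sketch of the DTD route, assuming a recent XML::LibXML in which the method is spelled getElementById (the module's documentation notes that 1.59 and earlier called it getElementsById, a name kept since as an alias). The internal subset only declares id on <aaa> as type ID, which is enough for a non-validating parse even though the document would not validate. find_by_id is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;
use XML::LibXML;

# The internal DTD subset declares the 'id' attribute of <aaa> as type ID.
# Only the ATTLIST is given; the document is parsed without validation.
my $xml_string = <<'EOF';
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ATTLIST aaa id ID #IMPLIED>
]>
<root>
  <aaa id="test">
    <bbb/>
  </aaa>
</root>
EOF

# Hypothetical helper: parse a string and look up an element by its ID.
# Returns undef when no attribute with that value is registered as an ID.
sub find_by_id {
    my ($xml, $id) = @_;
    my $doc = XML::LibXML->new->parse_string($xml);
    return $doc->getElementById($id);
}

my $elem = find_by_id($xml_string, 'test');
die "no element with ID 'test'\n" unless defined $elem;
print $elem->nodeName, "\n";
```

    Dropping the DOCTYPE from the same string brings back the original symptom: the lookup returns undef again.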

      Thanks for the tip. This snippet works! I'm using XML::LibXML to parse HTML docs. Unfortunately, it does not treat HTML ids like xml:id. I'm pretty new to XML. Thanks again.
```perl
use strict;
use XML::LibXML;

my $xml_string = <<EOF;
<?xml version="1.0"?>
<root>
  <aaa xml:id='test'>
    <bbb/>
  </aaa>
</root>
EOF

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_string($xml_string) || die;
my $elem   = $doc->getElementsById('test');
print STDERR $elem."\n";
```

        Why don't you use a regular XPath expression instead of getElementsById? my $elem = ($doc->findnodes('//*[@id="test"]'))[0]; works fine. It is probably slower than getElementsById, but that might not matter. Alternatively, you could select all elements that have an id attribute, replace it with xml:id, and hope (I would think it works) that getElementsById then finds them. Or you could pre-process your HTML with tidy, for example, to get XHTML, and then use XML::LibXML on the XHTML (you might need to set the option to process the DTD in order for id to be recognized as an ID).
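
        The first two suggestions can be sketched as follows, again assuming a recent XML::LibXML (getElementById rather than 1.58's getElementsById). find_by_id_xpath and promote_ids_to_xml_id are hypothetical helper names. The rename relies on libxml2 registering an xml:id attribute as an ID at the moment it is set, which is an assumption worth checking against your libxml2 version.

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace URI bound to the reserved 'xml' prefix.
my $XML_NS = 'http://www.w3.org/XML/1998/namespace';

# Approach 1 (hypothetical helper): plain XPath, no ID machinery involved.
# The ID is spliced into the expression, so values with a double quote are rejected.
sub find_by_id_xpath {
    my ($doc, $id) = @_;
    die "unsupported id: $id\n" if $id =~ /"/;
    return ($doc->findnodes(qq{//*[\@id="$id"]}))[0];
}

# Approach 2 (hypothetical helper): move every plain 'id' attribute to
# 'xml:id', which is always treated as an ID, so getElementById can see it.
sub promote_ids_to_xml_id {
    my ($doc) = @_;
    for my $elem ($doc->findnodes('//*[@id]')) {
        my $value = $elem->getAttribute('id');
        $elem->removeAttribute('id');
        $elem->setAttributeNS($XML_NS, 'xml:id', $value);
    }
    return $doc;
}

my $doc = XML::LibXML->new->parse_string(<<'EOF');
<?xml version="1.0"?>
<root>
  <aaa id="test">
    <bbb/>
  </aaa>
</root>
EOF

my $by_xpath = find_by_id_xpath($doc, 'test');
print 'XPath found: ', $by_xpath->nodeName, "\n";

promote_ids_to_xml_id($doc);
my $by_id = $doc->getElementById('test');
print 'getElementById found: ', (defined $by_id ? $by_id->nodeName : 'nothing'), "\n";
```

        The XPath helper leaves the document untouched, while the rename changes the markup, so it only suits documents you are free to rewrite.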

        There might also be an XML::LibXML specific trick for this, but I don't know the module that well.
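
        One XML::LibXML-specific answer, assuming a recent release: the module's documentation for getElementById states that for documents parsed with the HTML parser, ID detection is done automatically, so a plain id attribute should work without a DTD or xml:id. Whether 1.58 already behaved this way is not established here. A minimal sketch:

```perl
use strict;
use warnings;
use XML::LibXML;

my $html = <<'EOF';
<html>
  <head><title>Just a quick example</title></head>
  <body>
    <h1>Example</h1>
    <div id="test">gotcha!</div>
  </body>
</html>
EOF

# parse_html_string builds an HTML document; for HTML documents libxml2
# treats 'id' attributes as IDs, so getElementById can find them directly.
my $doc  = XML::LibXML->new->parse_html_string($html);
my $elem = $doc->getElementById('test');

my $text = defined $elem ? $elem->textContent : 'not found';
print "$text\n";
```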

Re: XML::LibXML getElementsById problem
by santonegro (Scribe) on Dec 15, 2005 at 17:42 UTC
    Wow, thank god for XML::TreeBuilder.
```perl
use strict;
use XML::TreeBuilder;

my $tree = XML::TreeBuilder->new_from_content(<<EOF);
<?xml version="1.0"?>
<root>
  <aaa id='test'>
    <bbb/>
  </aaa>
</root>
EOF

my $elem = $tree->look_down(id => 'test');
# ... then work with $elem, e.g. $elem->this; $elem->that;
```

      Oh, if we are pimping alternate modules, then of course id IS magical in XML::Twig:

```perl
#!/usr/bin/perl -w
use strict;
use XML::Twig;

my $xml_string = <<EOF;
<?xml version="1.0"?>
<root>
  <aaa id='test'>
    <bbb/>
  </aaa>
</root>
EOF

my $t    = XML::Twig->nparse($xml_string);
my $elem = $t->getElementById('test');
$elem->print;
```

      And of course you can use it to process HTML too (it sub-contracts the HTML to XHTML conversion to HTML::TreeBuilder):

```perl
#!/usr/bin/perl -w
use strict;
use XML::Twig;

my $html_string = <<EOF;
<html>
<head><title>Just a quick example</title></head>
<body><h1>Example</h1>
an example<p>
<div id="test">gotcha!</div>
<hr>
</html>
EOF

my $t    = XML::Twig->new->parse_html($html_string);
my $elem = $t->getElementById('test');
$elem->print;
```

Approved by Corion