
XML data reading/output

by Anonymous Monk
on Sep 04, 2009 at 18:34 UTC ( #793543=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to go through a lot of XML files and find the <mainterm> elements that belong to a <descriptors> element whose type attribute is "CTC".

See one record in my xml file:

<copyright type="Els">Copyright 2007 AAA, All rights reserved.</copyright>
<citation-type code="ar"/>
<descriptors controlled="y" type="CCV">
<mainterm weight="a">Corrosion protection</mainterm>
<descriptors controlled="y" type="CMH">
<mainterm weight="a">Corrosion inhibitors</mainterm>
<descriptors controlled="y" type="CTC">
<mainterm weight="a">G</mainterm>

The output should be:

'<mainterm weight="a">G</mainterm>'

That is, print the <mainterm> element only when it sits inside <descriptors type="CTC">.

Can you help me?

Replies are listed 'Best First'.
Re: XML data reading/output
by toolic (Bishop) on Sep 04, 2009 at 19:16 UTC
    XML::Twig can help you:
    use strict;
    use warnings;
    use XML::Twig;

    my $xfile = shift;
    my $t = XML::Twig->new( twig_handlers => { descriptors => \&desc } );
    $t->parsefile($xfile);

    # Called once for each complete <descriptors> element
    sub desc {
        my ($twig, $desc) = @_;
        if ($desc->att('type') eq 'CTC') {
            $desc->first_child('descriptor')->first_child('mainterm')->print();
        }
    }

    Please post valid XML.

      Thanks so much for the help.
      My XML file is too big for this, though: the script now dies with an out-of-memory error. The file is about 1 GB.
      Please help.
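
For a file that size, one option is to let XML::Twig throw away each chunk once it has been handled instead of keeping the whole tree in memory. The following is only a sketch: it assumes mainterm sits somewhere below descriptors (the sample as posted isn't well-formed, so the exact nesting is a guess), and each_ctc_mainterm is a hypothetical helper name, not part of XML::Twig.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: stream $file and pass each matching <mainterm>
# (serialized as a string) to the callback $cb.
sub each_ctc_mainterm {
    my ($file, $cb) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            descriptors => sub {
                my ($t, $desc) = @_;
                my $type = $desc->att('type');
                if (defined $type && $type eq 'CTC') {
                    # descendants() finds mainterm at any depth below this element
                    $cb->($_->sprint) for $desc->descendants('mainterm');
                }
                # Free everything parsed so far, including this element
                $t->purge;
            },
        },
    );
    $twig->parsefile($file);
    return;
}

# Run as a script: perl ctc.pl big.xml
each_ctc_mainterm($ARGV[0], sub { print "$_[0]\n" }) if @ARGV;
```

The purge call is what keeps memory use flat: without it the twig keeps growing as the file is read, which is the likely cause of the out-of-memory error on a 1 GB input.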

Re: XML data reading/output
by ikegami (Pope) on Sep 04, 2009 at 19:11 UTC
    In other words, you want to dump the elements that match the XPath //descriptors[@type="CTC"]/descriptor/mainterm

    Using XML::LibXML, the syntax is something like

    for my $mainterm ( $doc->findnodes('//descriptors[@type="CTC"]/descriptor/mainterm') ) {
        print $mainterm->toString();
    }

    XML::Twig would also be awesome here.

    Update: Changed the XPath as per reply.

      I got better results with
      $doc->findnodes( '//descriptors[@type="CTC"]/descriptor/mainterm/text()' )
        The three changes I added are:
      • an @ for the type attribute
      • the /descriptor element (although the xml is not well formed)
      • /text() since it appears they desire a text node
        Argh, yeah, dumb mistakes. But not the third one: contrary to your claim, the OP wants the whole element ('<mainterm weight="a">G</mainterm>'), not just the text ('G').
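
To make the element-versus-text distinction concrete, here is a small self-contained sketch with XML::LibXML. The record in the heredoc is an assumed, well-formed stand-in for the OP's data (the closing tags and the descriptor wrapper are guesses, since the posted sample is incomplete).

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumed, well-formed stand-in for the OP's record
my $xml = <<'XML';
<record>
  <descriptors controlled="y" type="CMH">
    <descriptor><mainterm weight="a">Corrosion inhibitors</mainterm></descriptor>
  </descriptors>
  <descriptors controlled="y" type="CTC">
    <descriptor><mainterm weight="a">G</mainterm></descriptor>
  </descriptors>
</record>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

for my $mainterm ( $doc->findnodes('//descriptors[@type="CTC"]/descriptor/mainterm') ) {
    print $mainterm->toString(),    "\n";   # the whole element, tags included
    print $mainterm->textContent(), "\n";   # only the text inside it
}
```

With this input the first print gives the element the OP asked for and the second gives just G, which is what appending /text() to the XPath would select. Note that load_xml builds the whole document in memory, so this approach suits the per-record case rather than a 1 GB file.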
Re: XML data reading/output
by ramrod (Curate) on Sep 04, 2009 at 19:14 UTC
Re: XML data reading/output
by arun_kom (Monk) on Sep 04, 2009 at 19:12 UTC
    Show us your code ... what have you tried till now and where did you get stuck?
