http://qs321.pair.com?node_id=793543

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to go through a lot of XML files and find the value of the <mainterm> tag whenever the enclosing <descriptors> tag has type="CTC".

Here is one record from my XML file:


<item>

<bibrecord>

<item-info>

<copyright type="Els">Copyright 2007 AAA, All rights reserved.</copyright>

</item-info>

<head>

<citation-info>

<citation-type code="ar"/>

</citation-info>

<descriptorgroup>

<descriptors controlled="y" type="CCV">

<descriptor>

<mainterm weight="a">Corrosion protection</mainterm>

</descriptor>

</descriptors>

<descriptors controlled="y" type="CMH">

<descriptor>

<mainterm weight="a">Corrosion inhibitors</mainterm>

</descriptor>

</descriptors>

<descriptors controlled="y" type="CTC">

<descriptor>

<mainterm weight="a">G</mainterm>

</descriptor>

</descriptors>

</descriptorgroup>

</enhancement>

</head>

</bibrecord>

</item>

The output should be:

'<mainterm weight="a">G</mainterm>'

That is, print the <mainterm> element only when its enclosing <descriptors> element has type="CTC".

Can you help me?

Replies are listed 'Best First'.
Re: XML data reading/output
by ikegami (Patriarch) on Sep 04, 2009 at 19:11 UTC
    In other words, you want to dump the elements that match XPath
    //descriptors[@type="CTC"]/descriptor/mainterm

    Using XML::LibXML, the syntax is something like

    for my $mainterm ( $doc->findnodes('//descriptors[@type="CTC"]/descriptor/mainterm') ) {
        print $mainterm->toString();
    }

    XML::Twig would also be awesome here.

    Update: Changed
    //descriptors[type="CTC"]/mainterm
    to
    //descriptors[@type="CTC"]/descriptor/mainterm
    as per reply.

      I got better results with
      $doc->findnodes( '//descriptors[@type="CTC"]/descriptor/mainterm/text()' )
        The three changes I added are:
      • an @ for the type attribute
      • the /descriptor element (although the xml is not well formed)
      • /text() since it appears they desire a text node
        arg, yeah, dumb mistakes. But not the third one. Contrary to your claims, the OP wants the whole element ('<mainterm weight="a">G</mainterm>'), not just the text ('G').
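
The difference discussed in this sub-thread (whole element versus text node) can be seen side by side in a small XML::LibXML sketch. It is only an illustration: the helper name `ctc_mainterms` is made up for this example, and the inline XML is a trimmed, well-formed stand-in for the posted record, not the OP's real data.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not from the thread): for every <mainterm> under
# <descriptors type="CTC">, return [ serialized element, text content ].
sub ctc_mainterms {
    my ($doc) = @_;
    my $xpath = '//descriptors[@type="CTC"]/descriptor/mainterm';
    return map { [ $_->toString, $_->textContent ] } $doc->findnodes($xpath);
}

# Assumption: a cut-down, well-formed version of the record in the question.
my $xml = <<'XML';
<item>
  <descriptorgroup>
    <descriptors controlled="y" type="CMH">
      <descriptor><mainterm weight="a">Corrosion inhibitors</mainterm></descriptor>
    </descriptors>
    <descriptors controlled="y" type="CTC">
      <descriptor><mainterm weight="a">G</mainterm></descriptor>
    </descriptors>
  </descriptorgroup>
</item>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

for my $hit ( ctc_mainterms($doc) ) {
    my ( $element, $text ) = @$hit;
    print "element: $element\n";    # what toString() gives
    print "text:    $text\n";       # what /text() would select
}
```

With this input the first line should show the complete `<mainterm weight="a">G</mainterm>` element, which is what the question asks for, and the second line just `G`.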
Re: XML data reading/output
by toolic (Bishop) on Sep 04, 2009 at 19:16 UTC
    XML::Twig can help you:
    use strict;
    use warnings;
    use XML::Twig;

    my $xfile = shift;

    # Call desc() each time a complete <descriptors> element has been parsed.
    my $t = XML::Twig->new( twig_handlers => { descriptors => \&desc } );
    $t->parsefile($xfile);

    sub desc {
        my ($twig, $desc) = @_;
        if ($desc->att('type') eq 'CTC') {
            $desc->first_child('descriptor')->first_child('mainterm')->print();
        }
    }

    Please post valid XML.

      Thanks so much for the help.
      My XML file is too big to process this way: the script now dies with an out-of-memory error. The file is about 1 GB.
      Please help.
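
      One possible way around the memory problem, offered as a sketch rather than a tested fix for the real 1 GB file: keep the XML::Twig handler, but call `purge` after each `<descriptors>` element so the parts of the tree that are already parsed get discarded instead of piling up. The helper name `stream_ctc_mainterms` and the callback argument are invented for this example, and it assumes the real file is well-formed (the posted sample, with its stray `</enhancement>`, would not parse).

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the thread): parse $file as a stream and
# pass each serialized <mainterm> found under <descriptors type="CTC">
# to the $on_match callback.
sub stream_ctc_mainterms {
    my ( $file, $on_match ) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Runs each time a </descriptors> end tag is reached.
            descriptors => sub {
                my ( $t, $desc ) = @_;
                if ( ( $desc->att('type') // '' ) eq 'CTC' ) {
                    for my $d ( $desc->children('descriptor') ) {
                        $on_match->( $_->sprint ) for $d->children('mainterm');
                    }
                }
                # Free everything parsed so far; without this the whole
                # document ends up in memory.
                $t->purge;
            },
        },
    );

    $twig->parsefile($file);
    return;
}

# Command-line use: perl script.pl big.xml
stream_ctc_mainterms( $ARGV[0], sub { print $_[0], "\n" } ) if @ARGV;
```

      Since every handled element is purged right away, memory use should stay roughly flat regardless of file size, at the cost of not being able to look back at earlier parts of the document.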

Re: XML data reading/output
by ramrod (Curate) on Sep 04, 2009 at 19:14 UTC
Re: XML data reading/output
by arun_kom (Monk) on Sep 04, 2009 at 19:12 UTC
    Show us your code ... what have you tried till now and where did you get stuck?