PerlMonks  

Re: XML::LibXML- Escape Empty Tags

by Your Mother (Archbishop)
on Jul 30, 2009 at 16:36 UTC ( [id://784669]=note )


in reply to XML::LibXML- Escape Empty Tags

This might help get you going. I took out the file handling, so you'll have to adjust that part. If this is something you actually need for work, you might consider posting it as a one-off job to jobs.perl.org or something.

use strict;   # Don't leave out!
use warnings; # Don't leave out!
use XML::LibXML;

my $parser = XML::LibXML->new();
my $doc = $parser->parse_fh(\*DATA);

my @product = $doc->getElementsByTagName('product');

for my $kid ( @product ){
    print join("\t",
               $kid->getElementsByTagName('name')->[0]->textContent,
               $kid->getElementsByTagName('imageURL_med')->[0]->textContent,
               $kid->getAttribute('category_id'),
               $kid->getAttribute('id'),
               $kid->getElementsByTagName('desc_short')->[0]->textContent,
               ),
          "\n";
}

# print $doc->serialize();

__END__
<root>
  <product category_id="13296" id="675936193" catalog="false" row="1">
    <name>Children's Hand Rake</name>
    <imageURL_med></imageURL_med>
    <desc_short>Mini gardeners can dig, rake and scoop out their own plot with this children's hand rake, complete with contoured handles and durable metal heads.</desc_short>
  </product>
  <product category_id="13296" id="675936193" catalog="false" row="1">
    <name>Bag of Broken Glass</name>
    <imageURL_med>http://moocow.co.uk.jp/something/something/bg.jpg</imageURL_med>
    <desc_short>Fun for all ages!</desc_short>
  </product>
</root>

Replies are listed 'Best First'.
Re^2: XML::LibXML- Escape Empty Tags
by khalistoo (Initiate) on Jul 31, 2009 at 08:46 UTC
    Thanks a lot, this seems to work. However, can you explain two things to me? First, the line
    # print $doc->serialize();
    as I have no idea what it is actually supposed to do. Second, the use of my $doc = $parser->parse_fh(\*DATA);. I guess this is for working with a filehandle, but I was under the impression that parse_file was much better for handling big files (I am in fact using this to parse a roughly 600 MB XML file). Then again, thanks a lot, it's going very well. Cheers, everyone, for the help.

      The serialize line is there to uncomment if you want to dump the parsed document and check it. And you're right, parsing the file directly with parse_file (no filehandle) is probably faster. The *DATA handle is just easy to test/demo with because it lets you put the data into the test script. Good luck. It is worth the effort to continue to pick up some Perl. It's not that hard, you'll get great help here and on many lists, and it can boost productivity in a menagerie of tasks.
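
For a file in the 600 MB range, building the whole tree with either parse_fh or parse_file may use a lot of memory. One possible alternative, not something proposed in this thread, is the pull parser XML::LibXML::Reader, which can hand back one product element at a time. The sketch below is only an illustration under assumptions: the sample data is a shortened copy of the demo XML above, it is read from a string so the script stays self-contained, and the variable names ($xml, $dump, @rows) are made up here. With a real file you would pass location => 'yourfile.xml' to the reader instead of string => $xml.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Shortened copy of the demo data, standing in for the real (large) file.
my $xml = <<'XML';
<root>
  <product category_id="13296" id="675936193" catalog="false" row="1">
    <name>Children's Hand Rake</name>
    <imageURL_med></imageURL_med>
  </product>
  <product category_id="13296" id="675936193" catalog="false" row="1">
    <name>Bag of Broken Glass</name>
    <imageURL_med>http://moocow.co.uk.jp/something/something/bg.jpg</imageURL_med>
  </product>
</root>
XML

# Whole-document parse: serialize() turns the parsed tree back into
# a string, which is handy for checking what the parser actually saw.
my $doc  = XML::LibXML->load_xml( string => $xml );
my $dump = $doc->serialize();

# Streaming parse: step from one <product> element to the next and
# copy only that subtree, so the full document is never held as a tree.
my @rows;
my $reader = XML::LibXML::Reader->new( string => $xml )
    or die "cannot create reader\n";

while ( $reader->nextElement('product') ) {
    my $product = $reader->copyCurrentNode(1);   # 1 = deep copy
    push @rows, join "\t",
        $product->findvalue('name'),
        $product->findvalue('imageURL_med'),     # empty string for an empty tag
        $product->getAttribute('category_id'),
        $product->getAttribute('id');
}

print "$_\n" for @rows;
```

The empty imageURL_med element simply yields an empty field between two tabs, the same as in the original script. Whether the reader is actually faster than parse_file for a given file is something to measure; its main benefit is the lower memory footprint.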

      It is a comment, it does nothing :D
