PerlMonks  

Apache +XML parsing

by soliplaya (Beadle)
on Nov 08, 2008 at 16:44 UTC ( [id://722416]=perlquestion )

soliplaya has asked for the wisdom of the Perl Monks concerning the following question:

Eminences, I beg audience related to the following theme :
I am writing a Perl cgi-bin script to run under mod_perl, and thus persistently; the script will be called repeatedly to parse XML data. Each time, the data consists of a published document description, estimated at between 2 and 10 KB.
After parsing, I need to be able to extract most tags and attribute values, to pass them to some other software which does not understand XML.
Since the data comes from a reputable source, I do not expect many issues with the XML per se.
But having had some problems before with memory leaks and/or performance in some Perl XML modules, I am asking for your benign recommendations as to what works reasonably fast, repeatedly and safely under mod_perl, without swelling the server's memory footprint too much.

Thank you in advance.

Replies are listed 'Best First'.
Re: Apache +XML parsing
by jeffa (Bishop) on Nov 08, 2008 at 17:07 UTC
        Thanks, Brothers.
        Jenda, your XML::Rules module looks interesting, and I'd like to give it a try.
        What I need to do is fairly simple and boring: I need to parse a multi-level XML document, contained in the scalar $xmldoc and representing a Journal Article (*), into a simple hash like
        my $href = {
            'TI'  => [ 'content of <PubArticle><Article><Title> tag' ],
            'AU'  => [
                'content of <PubArticle><Article><Authors><Author name="author1" tag',
                'content of <PubArticle><Article><Authors><Author name="author2" tag',
                etc..
            ],
            'REF' => [ and so on... ],
        };

        The end result I want, then, is a hash in which each key corresponds to an arrayref, the array containing one or more string elements picked up from tag attributes and/or tag contents in the original XML document. I admit I am a bit lost after a first read of the online doc. What I don't see very clearly, from the first example at the head of the doc, is how I get the result into my $href hash.
        (*) for a full example of the source XML, use this link : http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=18632282
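
        To make the target concrete, here is a minimal sketch of how I imagine the result could be collected, assuming XML::Rules calls each rule with the tag name and an attribute hash that carries the text under the `_content` key. The sub name `article_to_href`, the reduced sample document and the TI/AU mapping are only illustrations, not the real PubMed layout, and this is a guess at the usage rather than the module author's recommended way.

```perl
use strict;
use warnings;
use XML::Rules;

# Hypothetical helper: turn one XML document into { KEY => [ strings ] }.
sub article_to_href {
    my ($xmldoc) = @_;
    my %result;    # fresh for every call, so nothing piles up between requests

    my $parser = XML::Rules->new(
        rules => [
            # Text content of <Title> goes under 'TI'.
            Title => sub {
                my ($tag, $attr) = @_;
                push @{ $result{TI} }, $attr->{_content};
                return;    # hand nothing up to the parent tag
            },
            # The name attribute of each <Author> goes under 'AU'.
            Author => sub {
                my ($tag, $attr) = @_;
                push @{ $result{AU} }, $attr->{name};
                return;
            },
        ],
    );

    $parser->parse($xmldoc);
    return \%result;
}

# Much-reduced, made-up stand-in for the real document.
my $xmldoc = <<'XML';
<PubArticle>
  <Article>
    <Title>An example title</Title>
    <Authors>
      <Author name="author1"/>
      <Author name="author2"/>
    </Authors>
  </Article>
</PubArticle>
XML

my $href = article_to_href($xmldoc);
print "$_: @{ $href->{$_} }\n" for sort keys %$href;
```

        The handlers push into a lexical hash instead of relying on the parser's return value, and that hash is created anew on each call, which seems the safer pattern for a persistent mod_perl process.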
        XML::Rules is SAX (maybe both), and XML::Twig is definitely both.

Node Type: perlquestion [id://722416]
Approved by marto