PerlMonks  

facing problem in parsing xml

by Priti24 (Novice)
on Jul 29, 2013 at 05:40 UTC ( [id://1046774]=perlquestion )

Priti24 has asked for the wisdom of the Perl Monks concerning the following question:

I have a variable containing XML data:

$xml = "<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="/templates/xsl/abc/search/searchRetrieveResponse.xsl"?>
<searchRetrieveResponse xmlns="http://www.abc/srw/">
  <version>1.1</version>
  <numberOfRecords>14135</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordPacking>xml</recordPacking>
      <recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-v1.1" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>ISSN: 00322032</dc:identifier>
          <dc:identifier>URL: http://www.jstor.org/stable/23068315</dc:identifier>
          <dc:title>TEST</dc:title>
          <dc:creator>KAY RYAN</dc:creator>
          <dc:relation>Poetry, Vol. 176, No. 3</dc:relation>
          <dc:coverage>p. 126</dc:coverage>
          <dc:rights>Copyright 2000 Poetry Foundation</dc:rights>
          <dc:publisher>Poetry Foundation</dc:publisher>
          <dc:date>2000-06-01</dc:date>
          <dc:type>FLA</dc:type>
          <dc:language>eng</dc:language>
        </srw_dc:dc>
      </recordData>
      <recordPosition>1</recordPosition>
    </record>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordPacking>xml</recordPacking>
      <recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-v1.1" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>ISSN: 0010096X</dc:identifier>
          <dc:identifier>URL: http://www.jstor.org/stable/357303</dc:identifier>
          <dc:title>Test</dc:title>
          <dc:creator>Wm. Leonard</dc:creator>
          <dc:relation>College Composition and Communication, Vol. 29, No. 2</dc:relation>
          <dc:coverage>p. 161</dc:coverage>
          <dc:rights>Copyright 1978 National Council of Teachers of English</dc:rights>
          <dc:publisher>National Council of Teachers of English</dc:publisher>
          <dc:date>1978-05-01</dc:date>
          <dc:type>FLA</dc:type>
          <dc:language>eng</dc:language>
        </srw_dc:dc>
      </recordData>
      <recordPosition>2</recordPosition>
    </record>
  </records>
</searchRetrieveResponse>";

I have to parse it and fetch the values of title, creator, etc. I wrote a script, but it does not fetch anything:

my $x = new XML::LibXML();
my $data = $x->parse_string($xml);
my $recordData = $data->findnodes('/records/record/recordData/');
foreach my $rec(@$recordData){
    print STDERR "helloooooooooooooooooooooooooooooooo";
    my $title = $rec->findnodes('title')->string_value();
    print STDERR $title . "\n";
}

Also, how do I fetch the URL from the <identifier> tag rather than the ISSN, given that both use the same tag name?

Please help me out. Thanks in advance.

Replies are listed 'Best First'.
Re: facing problem in parsing xml
by choroba (Cardinal) on Jul 29, 2013 at 07:26 UTC
    There are several problems. The first one, using " to quote the string which itself contains double quotes, has already been pointed out.

    There are more problems, though:

    1. Your XML uses namespaces. When working with XML::LibXML (or any other XML library that supports XPath, even in languages other than Perl), you have to register the namespaces in order to be able to reference namespaced elements in XPath expressions.
    2. Even if you add namespaces, you also have to make sure your XPath expressions really describe the structure of the document. In this case, records is not the root element, so you cannot start the expression with /records. Similarly, recordData does not contain a title child; there is an srw_dc:dc in between.
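
To make the first point concrete, here is a minimal sketch on a made-up two-element document (the namespace URI http://example.org/ns and the prefix ex are illustrative assumptions, not part of the original data). It shows that an unprefixed XPath finds nothing once a default namespace is in play, while the same path with a registered prefix does:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document with a default namespace, for illustration only.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<root xmlns="http://example.org/ns"><item>one</item></root>
XML

# Unprefixed names in XPath 1.0 match only elements in no namespace,
# so this path does not match the namespaced elements above.
my @plain = $doc->findnodes('/root/item');

# Bind a prefix of our own choosing ("ex") to the namespace URI
# and use that prefix in the path.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(ex => 'http://example.org/ns');
my @prefixed = $xpc->findnodes('/ex:root/ex:item');

printf "without prefix: %d node(s)\n", scalar @plain;
printf "with prefix:    %d node(s)\n", scalar @prefixed;
```

The prefix used in the XPath does not have to match the one in the document (which here has none at all); only the namespace URI has to agree.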

    After fixing these problems, here is code that works for me:

    #!/usr/bin/perl
    use warnings;
    use strict;

    use XML::LibXML;

    my $xml = q%<?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="/templates/xsl/abc/search/searchRetrieveResponse.xsl"?>
    <searchRetrieveResponse xmlns="http://www.abc/srw/">
      <version>1.1</version>
      <numberOfRecords>14135</numberOfRecords>
      <records>
        <record>
          <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
          <recordPacking>xml</recordPacking>
          <recordData>
            <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-v1.1" xmlns:dc="http://purl.org/dc/elements/1.1/">
              <dc:identifier>ISSN: 00322032</dc:identifier>
              <dc:identifier>URL: http://www.jstor.org/stable/23068315</dc:identifier>
              <dc:title>TEST</dc:title>
              <dc:creator>KAY RYAN</dc:creator>
              <dc:relation>Poetry, Vol. 176, No. 3</dc:relation>
              <dc:coverage>p. 126</dc:coverage>
              <dc:rights>Copyright 2000 Poetry Foundation</dc:rights>
              <dc:publisher>Poetry Foundation</dc:publisher>
              <dc:date>2000-06-01</dc:date>
              <dc:type>FLA</dc:type>
              <dc:language>eng</dc:language>
            </srw_dc:dc>
          </recordData>
          <recordPosition>1</recordPosition>
        </record>
        <record>
          <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
          <recordPacking>xml</recordPacking>
          <recordData>
            <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-v1.1" xmlns:dc="http://purl.org/dc/elements/1.1/">
              <dc:identifier>ISSN: 0010096X</dc:identifier>
              <dc:identifier>URL: http://www.jstor.org/stable/357303</dc:identifier>
              <dc:title>Test</dc:title>
              <dc:creator>Wm. Leonard</dc:creator>
              <dc:relation>College Composition and Communication, Vol. 29, No. 2</dc:relation>
              <dc:coverage>p. 161</dc:coverage>
              <dc:rights>Copyright 1978 National Council of Teachers of English</dc:rights>
              <dc:publisher>National Council of Teachers of English</dc:publisher>
              <dc:date>1978-05-01</dc:date>
              <dc:type>FLA</dc:type>
              <dc:language>eng</dc:language>
            </srw_dc:dc>
          </recordData>
          <recordPosition>2</recordPosition>
        </record>
      </records>
    </searchRetrieveResponse>%;

    my $data = 'XML::LibXML'->load_xml(string => $xml);

    # Register prefixes for both namespaces used in the XPath expressions.
    my $xpc = 'XML::LibXML::XPathContext'->new($data);
    $xpc->registerNs('srw', 'http://www.abc/srw/');
    $xpc->registerNs('dc',  'http://purl.org/dc/elements/1.1/');

    my $recordData = $xpc->findnodes('//srw:records/srw:record/srw:recordData', $data);
    foreach my $rec (@$recordData) {
        # Search relative to the current recordData node.
        my $title = $xpc->findnodes('.//dc:title', $rec);
        print $title, "\n";
    }

    Update: Also note that $title->string_value is not needed if all you want to do with the title is to print it. Elements stringify to their string value.
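
The question also asks how to pick the URL identifier rather than the ISSN one. One possible approach, sketched here on a trimmed-down record modelled on the data in the question, is an XPath predicate with starts-with(); it assumes every URL identifier really begins with the literal text "URL: ", as in the sample. The sketch uses findvalue, which returns the string value directly:

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed-down record modelled on the data in the question
# (the outer default namespace is left out to keep the sketch short).
my $xml = <<'XML';
<recordData>
  <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-v1.1"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier>ISSN: 00322032</dc:identifier>
    <dc:identifier>URL: http://www.jstor.org/stable/23068315</dc:identifier>
    <dc:title>TEST</dc:title>
    <dc:creator>KAY RYAN</dc:creator>
  </srw_dc:dc>
</recordData>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(dc => 'http://purl.org/dc/elements/1.1/');

# findvalue returns the string value of whatever the expression matches.
my $title   = $xpc->findvalue('//dc:title');
my $creator = $xpc->findvalue('//dc:creator');

# The predicate keeps only the identifier whose text begins with "URL: ".
my $url = $xpc->findvalue('//dc:identifier[starts-with(., "URL: ")]');
$url =~ s/^URL:\s*//;    # strip the label, leaving the bare address

print "$title by $creator: $url\n";
```

Inside the per-record loop, the same expressions would be written relative to the current node (for example './/dc:identifier[...]') and passed the record as the second argument.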

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: facing problem in parsing xml
by Skeeve (Parson) on Jul 29, 2013 at 06:39 UTC

    The part $xml = "<?xml version="1.0"?> can't work, because the string is delimited with " but also contains " characters. Maybe this is your problem?

    Try to use something like this:

    my $xml=<<'XML';
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="/templates/xsl/abc/search/searchRetrieveResponse.xsl"?>
    <searchRetrieveResponse xmlns="http://www.abc/srw/">
    <version>1.1</version>
    :
    :
    XML
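
As a small self-contained illustration of the quoting fix, the following sketch stores a two-line fragment (shortened from the data in the question) using a single-quoted heredoc and, alternatively, q{}. Neither form interpolates, and neither requires the inner double quotes to be escaped:

```perl
use strict;
use warnings;

# A single-quoted heredoc: no interpolation, and inner " needs no escaping.
# The terminator (XML) must stand alone at the start of its line.
my $heredoc = <<'XML';
<?xml version="1.0"?>
<version>1.1</version>
XML

# q{} behaves like single quotes but lets you choose the delimiter.
my $q = q{<?xml version="1.0"?>};

print $heredoc;
print $q, "\n";
```

A heredoc tends to read better for long multi-line data, while q{} is handy for short strings; whichever delimiter is chosen for q must not appear unbalanced in the data itself.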

    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
Re: facing problem in parsing xml (perlintro, perlquote)
by Anonymous Monk on Jul 29, 2013 at 07:25 UTC
