PerlMonks  

Re: XML to HashRef and then to JSON

by tangent (Parson)
on Mar 15, 2016 at 01:18 UTC [id://1157762]


in reply to XML to HashRef and then to JSON

All the above scripts are NOT properly converting the child "Emphasis"
That's not quite true: as far as the parser is concerned, "Emphasis" is a valid XML element, so it is parsed as a child node of "Title" rather than kept as literal text. You will have to do a bit of manual labour to achieve your desired output.

I couldn't find a way to get the inner content of a node without getting the node's own tags as well, so I used a regular expression to remove them. Hopefully this will get you on your way:

use Data::Dumper;
use XML::LibXML;

my $xml = q|
<Publisher>
  <UniqueDOI>978-3-642-123456</UniqueDOI>
  <ChapterInfo ChapterType="OriginalPaper">
    <Title Language="En">Is Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>) Color Name Universal in the Italian Language?</Title>
  </ChapterInfo>
</Publisher>
|;

my $doc = XML::LibXML->load_xml(string => $xml);
my @Publishers = $doc->findnodes('//Publisher');

for my $Publisher ( @Publishers ) {
    my ($ChapterInfo) = $Publisher->findnodes('ChapterInfo');
    my ($Title) = $ChapterInfo->findnodes('Title');

    # get the Title node as literal XML
    my $content = $Title->toString();
    print "Title content:\n$content\n";

    # remove first and last XML tags
    $content =~ s/^<[^>]*>(.*)<[^>]*>$/$1/;

    # construct the hash reference
    my $hash = {
        UniqueDOI   => $Publisher->findvalue('UniqueDOI'),
        ChapterInfo => {
            ChapterType => $ChapterInfo->getAttribute('ChapterType'),
            Title       => {
                Language => $Title->getAttribute('Language'),
                content  => $content,
            },
        },
    };
    print Dumper($hash);
}
See XML::LibXML::Node for explanation of these methods.

Output:

Title content:
<Title Language="En">Is Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>) Color Name Universal in the Italian Language?</Title>
$VAR1 = {
    'UniqueDOI' => '978-3-642-123456',
    'ChapterInfo' => {
        'ChapterType' => 'OriginalPaper',
        'Title' => {
            'Language' => 'En',
            'content' => 'Is Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>) Color Name Universal in the Italian Language?'
        },
    },
};
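The regular expression above could probably be avoided by serialising the children of the node instead of the node itself. The following is only a sketch: inner_xml is a hypothetical helper name, not something provided by XML::LibXML, and the sample title is shortened for illustration. It relies on the childNodes and toString methods documented in XML::LibXML::Node.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): serialise every child of
# a node and join the pieces, so the node's own start and end tags never
# appear in the result.
sub inner_xml {
    my ($node) = @_;
    return join '', map { $_->toString } $node->childNodes;
}

# Shortened sample title, assumed here purely for illustration.
my $doc = XML::LibXML->load_xml(string =>
    '<Title Language="En">Is Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>) Color Name?</Title>');

print inner_xml($doc->documentElement), "\n";
```

In the loop above, $content could then be set with inner_xml($Title) in place of the toString call and the substitution. Unlike the regular expression, this should not depend on the title being on a single line.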

Re^2: XML to HashRef and then to JSON
by dominic01 (Sexton) on Mar 15, 2016 at 05:07 UTC
    I accept your answer. What I provided in my OP was just a sample. My real XML is big, so I manipulated the XML before converting it to a HashRef:
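    for my $TmpNode ($dom->findnodes('//Emphasis')) {
        # serialise the Emphasis element, tags included
        my $tStr = $TmpNode->toString(1);
        # swap the element for a text node holding that markup
        my $new_node = $dom->createTextNode($tStr);
        $TmpNode->replaceNode($new_node);
    }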
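To tie this back to the thread title, here is a minimal sketch of how the node replacement above might feed a hash reference and then JSON. The shortened XML and the choice of the core JSON::PP module are assumptions for illustration; the thread does not say which JSON module is in use.

```perl
use strict;
use warnings;
use XML::LibXML;
use JSON::PP;

# Shortened sample document, assumed here purely for illustration.
my $dom = XML::LibXML->load_xml(string =>
    '<Publisher><Title Language="En">Light Blue (<Emphasis Type="Italic">azzurro</Emphasis>)</Title></Publisher>');

# Replace every Emphasis element with a text node holding its markup,
# as in the loop above.
for my $node ($dom->findnodes('//Emphasis')) {
    $node->replaceNode($dom->createTextNode($node->toString));
}

# The Title now contains only text, so textContent returns the markup
# as a literal string.
my ($title) = $dom->findnodes('//Title');
my $hash = {
    Title => {
        Language => $title->getAttribute('Language'),
        content  => $title->textContent,
    },
};

# canonical sorts the keys so the output order is stable.
print JSON::PP->new->canonical->pretty->encode($hash);
```

One thing to keep in mind: after the replacement, serialising the document itself would show the Emphasis markup entity-escaped, because it is now plain text rather than an element. Reading it through textContent or findvalue gives the unescaped string.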
