PerlMonks  

Re: Parsing two XML at the same time and align them

by choroba (Cardinal)
on May 17, 2018 at 12:00 UTC (id://1214731)


in reply to Parsing two XML at the same time and align them

Note that the XML chunk you posted is not well-formed XML, as it lacks a root element. I wrapped it in
<root> ... </root>

and used XML::LibXML to get the desired output:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use XML::LibXML;

# The two XML files to align are given on the command line.
my @files = @ARGV[0, 1];

# %extracted: original name => unit id => XML file => title
my %extracted;
for my $xml_file (@files) {
    my $dom = 'XML::LibXML'->load_xml(location => $xml_file);
    for my $file ($dom->findnodes('/root/file')) {
        my $original = $file->{original};
        for my $unit ($file->findnodes('body/unit')) {
            my $id    = $unit->{id};
            my $title = $unit->findvalue('title');
            $extracted{$original}{$id}{$xml_file} = $title;
        }
    }
}

# One tab-separated line per unit, titles in the order of @files.
for my $file (keys %extracted) {
    for my $id (keys %{ $extracted{$file} }) {
        say join "\t", $file, $id, @{ $extracted{$file}{$id} }{@files};
    }
}
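A minimal sketch, assuming the inputs really are root-less fragments like the one in the question: instead of editing the files by hand, each fragment can be wrapped in a <root> element in memory and parsed with load_xml(string => ...). The inline fragments, the names first.xml/second.xml/doc.txt, and the titles are hypothetical stand-ins for the real files. Unlike the script above, it sorts the keys (Perl does not guarantee keys order) and turns a title missing from one file into an empty column instead of an undef warning.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use XML::LibXML;

# Hypothetical root-less fragments standing in for the two input files.
my %fragment = (
    'first.xml'  => '<file original="doc.txt"><body>'
                  . '<unit id="1"><title>Hello</title></unit>'
                  . '<unit id="2"><title>Bye</title></unit>'
                  . '</body></file>',
    'second.xml' => '<file original="doc.txt"><body>'
                  . '<unit id="1"><title>Hallo</title></unit>'
                  . '</body></file>',
);
my @files = ('first.xml', 'second.xml');

my %extracted;
for my $xml_file (@files) {
    # Wrap the fragment in a synthetic <root> so it becomes well-formed.
    my $dom = XML::LibXML->load_xml(
        string => "<root>$fragment{$xml_file}</root>");
    for my $file ($dom->findnodes('/root/file')) {
        my $original = $file->getAttribute('original');
        for my $unit ($file->findnodes('body/unit')) {
            my $id = $unit->getAttribute('id');
            $extracted{$original}{$id}{$xml_file} = $unit->findvalue('title');
        }
    }
}

my @rows;
for my $file (sort keys %extracted) {
    for my $id (sort keys %{ $extracted{$file} }) {
        # Hash slice gives one title per input file; missing ones become ''.
        push @rows, join "\t", $file, $id,
            map { $_ // '' } @{ $extracted{$file}{$id} }{@files};
    }
}
say for @rows;
```

For real files, read each one into a string first. If a file starts with an XML declaration (<?xml ...?>), strip it before wrapping, because a declaration is only allowed at the very beginning of a document.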


Re^2: Parsing two XML at the same time and align them
by corfuitl (Sexton) on May 18, 2018 at 15:48 UTC
    Thank you so much for your help, and I apologize for my late response.

    I tested it and it works! However, my XML uses namespaces, so I cannot parse the file as is; I need to replace them first. The header contains

    <... xmlns:oka="ok-fram:xml-extensions" ...>

    and some names carry the oka prefix, e.g. the attribute oka:inputEncoding="US-ASCII".
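Stripping the namespaces may not be necessary. As long as the oka prefix is declared, XML::LibXML parses the document, and prefixed names can be addressed in XPath by registering the namespace URI on an XML::LibXML::XPathContext. The sketch below assumes a hypothetical document in which oka:inputEncoding sits on the file element; where the attribute actually lives in the real files is not known. Also note that if the header declares a default namespace (xmlns="..."), unprefixed XPath steps such as /root/file match nothing, and that URI has to be registered under a prefix of your own as well.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use XML::LibXML;

# Hypothetical input: oka is declared on the root and used on an attribute.
my $xml = <<'XML';
<root xmlns:oka="ok-fram:xml-extensions">
  <file original="doc.txt" oka:inputEncoding="US-ASCII">
    <body><unit id="1"><title>Hello</title></unit></body>
  </file>
</root>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# XPath needs its own prefix => URI binding; the prefix registered here
# need not match the document's, only the namespace URI must.
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(oka => 'ok-fram:xml-extensions');

my %encoding;
for my $file ($xpc->findnodes('/root/file')) {
    my $original = $file->getAttribute('original');
    # Relative XPath evaluated with $file as the context node.
    $encoding{$original} = $xpc->findvalue('@oka:inputEncoding', $file);
}
say "$_\t$encoding{$_}" for sort keys %encoding;
```

Without XPath, the same value is available as $file->getAttributeNS('ok-fram:xml-extensions', 'inputEncoding').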
