PerlMonks  

Parsing two XML at the same time and align them

by corfuitl (Sexton)
on May 17, 2018 at 10:20 UTC ( #1214724=perlquestion )

corfuitl has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks

I have the following problem to solve and need your help!

I have two XML files (which should be identical, but are not always) and I would like to extract some values and align them.

My XML files look like this:

<file original="File_1.xml">
  <body>
    <unit id="id1">
      <title>Part 1_file1</title>
    </unit>
    <unit id="id2">
      <title>Part 2</title>
    </unit>
  </body>
</file>
<file original="File_2.xml">
  <body>
    <unit id="id1">
      <title>Part 1</title>
    </unit>
  </body>
</file>

I would like to align them in this way:

File_1.xml    id1    title_value_from_first_xml    title_value_from_second_xml
File_1.xml    id2    title_value_from_first_xml    title_value_from_second_xml
File_2.xml    id1    title_value_from_first_xml    title_value_from_second_xml

Any suggestions?

Replies are listed 'Best First'.
Re: Parsing two XML at the same time and align them
by choroba (Archbishop) on May 17, 2018 at 12:00 UTC
    Note that the XML chunk you posted is not well-formed XML, as it lacks a root node. I wrapped it in
    <root> ... </root>

    and used XML::LibXML to get the desired output:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use XML::LibXML;

    # The two XML files to compare are given on the command line.
    my @files = @ARGV[0, 1];

    # $extracted{original}{unit id}{xml file} = title
    my %extracted;
    for my $xml_file (@files) {
        my $dom = 'XML::LibXML'->load_xml(location => $xml_file);
        for my $file ($dom->findnodes('/root/file')) {
            my $original = $file->{original};
            for my $unit ($file->findnodes('body/unit')) {
                my $id    = $unit->{id};
                my $title = $unit->findvalue('title');
                $extracted{$original}{$id}{$xml_file} = $title;
            }
        }
    }

    # One tab-separated line per original/id pair, titles in command-line order.
    for my $file (keys %extracted) {
        for my $id (keys %{ $extracted{$file} }) {
            say join "\t", $file, $id, @{ $extracted{$file}{$id} }{@files};
        }
    }
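
The script above passes an undefined value to join (and warns) when an id exists in only one of the two files, which is the case the question mentions. The following is a minimal variation, offered as a sketch rather than as choroba's method. The names first.xml and second.xml, the in-memory sample documents, and the (missing) placeholder are all made up for illustration; the keys are sorted only to make the output order stable.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use XML::LibXML;

# Hypothetical in-memory stand-ins for the two input files.
my %source = (
    'first.xml'  => '<root><file original="File_1.xml"><body>'
                  . '<unit id="id1"><title>Part 1_file1</title></unit>'
                  . '<unit id="id2"><title>Part 2</title></unit>'
                  . '</body></file></root>',
    'second.xml' => '<root><file original="File_1.xml"><body>'
                  . '<unit id="id1"><title>Part 1</title></unit>'
                  . '</body></file></root>',
);
my @files = ('first.xml', 'second.xml');

# $extracted{original}{unit id}{source name} = title
my %extracted;
for my $name (@files) {
    my $dom = XML::LibXML->load_xml(string => $source{$name});
    for my $unit ($dom->findnodes('/root/file/body/unit')) {
        # The enclosing <file> is two levels above each <unit>.
        my $original = $unit->findvalue('../../@original');
        $extracted{$original}{ $unit->getAttribute('id') }{$name}
            = $unit->findvalue('title');
    }
}

# Build one tab-separated line per original/id pair,
# substituting a placeholder where a source has no such unit.
my @lines;
for my $original (sort keys %extracted) {
    for my $id (sort keys %{ $extracted{$original} }) {
        push @lines, join "\t", $original, $id,
            map { $_ // '(missing)' } @{ $extracted{$original}{$id} }{@files};
    }
}
say for @lines;
```

With the sample data this prints two lines: id1 with both titles, and id2 with "Part 2" followed by the placeholder.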

      Thank you so much for your help, and I apologize for my late response.

      I tested it and it works! However, since my XML uses some namespaces, it is not possible to parse the file as is; I need to replace them first. The header contains

      <... xmlns:oka="ok-fram:xml-extensions" ...>

      and some names start with the oka prefix, e.g. oka:inputEncoding="US-ASCII".
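
One way to keep the namespaces instead of replacing them is to register the prefix with an XML::LibXML::XPathContext. The following is a minimal sketch, not a tested fix for the real file: the sample document and the placement of oka:inputEncoding on the file element are assumptions, and only the oka namespace URI is taken from the post. If the real file also declares a default namespace (xmlns="..."), that URI would need its own registered prefix, used in every XPath step.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;
use XML::LibXML::XPathContext;

# Assumed sample input; only the namespace URI comes from the post.
my $xml = <<'XML';
<root xmlns:oka="ok-fram:xml-extensions">
  <file original="File_1.xml" oka:inputEncoding="US-ASCII">
    <body><unit id="id1"><title>Part 1</title></unit></body>
  </file>
</root>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Bind the prefix used in our XPath expressions to the namespace URI.
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(oka => 'ok-fram:xml-extensions');

my ($file)   = $xpc->findnodes('/root/file');
my $encoding = $xpc->findvalue('@oka:inputEncoding', $file);
my $title    = $xpc->findvalue('body/unit/title', $file);

my $line = join "\t", $file->getAttribute('original'), $encoding, $title;
print "$line\n";
```

The prefix registered with registerNs does not have to match the one in the document; only the URI has to be the same.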

Re: Parsing two XML at the same time and align them
by Discipulus (Abbot) on May 17, 2018 at 11:49 UTC
    Hello corfuitl,

    > Any suggestions?

    Yes! Avoid XML::Simple.

    I'm used to XML::Twig, and you can take advantage of its twig_handlers to trap your id.

    You can use a hash to store the results, as in $res{ $filename."\t".$id } = [], so that while parsing the two files you can push the result from file1 there first and then the result from file2.
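
The idea above could be sketched roughly as follows. This is an illustration under assumptions, not Discipulus's code: the two sample documents are made up, they are wrapped in a root element, and the filename part of the key is taken from the original attribute of the enclosing file element.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use XML::Twig;

# Made-up sample documents standing in for file1 and file2.
my @documents = (
    '<root><file original="File_1.xml"><body>'
  . '<unit id="id1"><title>Part 1_file1</title></unit>'
  . '<unit id="id2"><title>Part 2</title></unit>'
  . '</body></file></root>',
    '<root><file original="File_1.xml"><body>'
  . '<unit id="id1"><title>Part 1</title></unit>'
  . '</body></file></root>',
);

my %res;
for my $xml (@documents) {
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called each time a <unit> element has been fully parsed.
            unit => sub {
                my ($t, $unit) = @_;
                # parent('file') returns the nearest ancestor named file.
                my $filename = $unit->parent('file')->att('original');
                my $key      = $filename . "\t" . $unit->att('id');
                push @{ $res{$key} }, $unit->first_child_text('title');
            },
        },
    );
    $twig->parse($xml);
}

say join "\t", $_, @{ $res{$_} } for sort keys %res;
```

One caveat of pushing onto an array: if an id is missing from the first file, the value from the second file ends up in the first column. Keying the inner structure by file name instead of by position avoids that.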

    L*

Re: Parsing two XML at the same time and align them
by james28909 (Deacon) on May 17, 2018 at 19:11 UTC
    I am trying to put together something that might work for you, but I am unsure about the input data (there is only one "title value from second xml"... does it need to be printed every time?... am I missing a key piece of evidence here? xD). If you could kindly post some better examples, we would all sure appreciate it. Also, are these the only elements that will be in the input data? Please elaborate on the question and include better examples. ;)