PerlMonks  

Swapping xml elements from an external file?

by tonyz (Novice)
on Oct 06, 2008 at 21:08 UTC ( #715648=perlquestion )

tonyz has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, This is my first foray into using xml and perl together. I found a thread about swapping xml elements, but unlike that thread, I want to swap them out from an external file. Here is what I have so far: first, an xml file with "head" elements that I want to tag up. So, I run this:
    #!/usr/bin/perl -w
    use strict;
    use XML::Twig;

    my $file = $ARGV[0];
    # Keep only the head elements; everything else in the document is skipped
    my $twig = XML::Twig->new(TwigRoots => {head => 1});
    $twig->parsefile($file);
    $twig->print;
which scrapes out all and only the head elements. Then I write it all to a file from the command line. That file looks something like this:
    <TEI>
    <head>Note VII</head>
    <head>Title Page </head>
    <head>Copyright Page </head>
    <head>Preface </head>
    ...
    </TEI>
Next, I retag the head elements in a text editor. Specifically, I find all roman numerals, and tag them with an xml element. I know I could do this in the script, but in this case, I want to do it by hand for oversight. So, I end up with a list of head elements, tagged as I want them, looking something like this:
    <TEI>
    <head>Note <romanNumeral>VII</romanNumeral></head>
    <head>Title Page </head>
    <head>Copyright Page </head>
    <head>Preface </head>
    ...
    </TEI>
Now, I want to go back through the file I extracted the head elements from, and replace the head tags there with the head tags I've extracted and tagged up in the external file. How would I do that? Thanks in advance for your help!

Replies are listed 'Best First'.
Re: Swapping xml elements from an external file?
by Fletch (Chancellor) on Oct 06, 2008 at 22:02 UTC

    Kludgy, but as the problem's presented (and presuming there's a 1-to-1 mapping from original head elements to marked up head elements):

    • Create a hash mapping from original to extra-crispy markup head elements (presuming they're small, just treat them as two text files and read them line-by-line in parallel)
    • Parse the original file and use XML::Twig to walk all the head elements, replacing the contents by looking up the original value in your hash and retrieving the possibly modified version
    • Profit!
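
    The first step above might look roughly like the sketch below. It is only an illustration under some loud assumptions: each head element sits alone on its own line, the two files have the same number of lines, and head carries no attributes. The sample strings stand in for the two files, and build_head_map is a made-up helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the extracted file and the
# hand-tagged file; in practice you would read both from disk.
my $original = <<'XML';
<TEI>
<head>Note VII</head>
<head>Title Page </head>
</TEI>
XML

my $tagged = <<'XML';
<TEI>
<head>Note <romanNumeral>VII</romanNumeral></head>
<head>Title Page </head>
</TEI>
XML

# build_head_map is a hypothetical helper: it walks both texts line by line
# in parallel and maps each original head content to its tagged version.
sub build_head_map {
    my ($orig_text, $tagged_text) = @_;
    my @orig = split /\n/, $orig_text;
    my @new  = split /\n/, $tagged_text;
    die "line counts differ\n" unless @orig == @new;

    my %map;
    for my $i (0 .. $#orig) {
        # Only lines holding a complete <head>...</head> are of interest.
        next unless $orig[$i] =~ m{^\s*<head>(.*)</head>\s*$};
        my $key = $1;
        $new[$i] =~ m{^\s*<head>(.*)</head>\s*$}
            or die "line " . ($i + 1) . " is not a head element in the tagged file\n";
        $map{$key} = $1;
    }
    return \%map;
}

my $map = build_head_map($original, $tagged);
print "$_ => $map->{$_}\n" for sort keys %$map;
```

    The resulting hash is what the second step would consult while walking the head elements of the original file. Note that two heads with identical text collapse into one key, which is fine only if they should receive the same markup.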

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Swapping xml elements from an external file?
by Jenda (Abbot) on Oct 06, 2008 at 22:28 UTC

    I'll assume the big file was not modified and it still contains all those head tags in the same order! First you'll need to load all the heads from the small file:

    use strict;
    use XML::Rules;

    # Load the heads from the small (hand-tagged) file
    my $reader = XML::Rules->new(
        rules => {
            _default => 'raw',
            head     => 'content array',
            TEI      => 'pass no content',
        }
    );
    my $heads = $reader->parsefile( $small_file_name )->{head};
    #use Data::Dumper;
    #print Dumper($heads);

    and then filter the big XML and replace the content of the <head> tags with the ones from the small file:

    my $updater = XML::Rules->new(
        style => 'filter',
        rules => {
            _default => 'raw',
            head     => sub {
                my ($tag, $attr) = @_;
                # Swap in the next head from the small file, in order
                $attr->{_content} = shift(@$heads);
                return $tag => $attr;
            }
        }
    );
    $updater->filterfile( $big_file_name, $updated_big_file_name);

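
    Because this approach replaces heads purely by position, it may be worth checking that the two files still line up before filtering. The following is a rough sketch of such a check, not part of XML::Rules: head_texts and first_mismatch are hypothetical helper names, the sample strings are invented, and the regex assumes head elements have no attributes and never appear inside comments or CDATA.

```perl
use strict;
use warnings;

# head_texts is a hypothetical helper: it pulls the content of every
# <head>...</head> element with a regex and drops any inline tags.
sub head_texts {
    my ($xml) = @_;
    my @texts;
    while ($xml =~ m{<head>(.*?)</head>}gs) {
        (my $text = $1) =~ s/<[^>]+>//g;    # strip tags such as <romanNumeral>
        push @texts, $text;
    }
    return @texts;
}

# first_mismatch returns the index of the first head whose text differs
# (or is missing on one side), or -1 when both lists agree throughout.
sub first_mismatch {
    my ($big, $small) = @_;                 # two array references
    my $max = @$big > @$small ? @$big : @$small;
    for my $i (0 .. $max - 1) {
        return $i
            if !defined $big->[$i]
            || !defined $small->[$i]
            || $big->[$i] ne $small->[$i];
    }
    return -1;
}

# Invented sample documents standing in for the big and small files.
my $big_xml   = '<TEI><div><head>Note VII</head><p>text</p><head>Preface </head></div></TEI>';
my $small_xml = '<TEI><head>Note <romanNumeral>VII</romanNumeral></head><head>Preface </head></TEI>';

my @big   = head_texts($big_xml);
my @small = head_texts($small_xml);
my $bad   = first_mismatch(\@big, \@small);
print $bad < 0 ? "heads line up\n" : "first mismatch at head #$bad\n";
```

    If the check reports a mismatch, the positional shift in the filter would start assigning the wrong content from that head onward.
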
Re: Swapping xml elements from an external file?
by GrandFather (Saint) on Oct 06, 2008 at 22:49 UTC

    There is a foible in the following code that may be an XML::Twig bug, but aside from that the following may help:

    use strict;
    use warnings;
    use XML::Twig;

    my $org = <<XML;
    <TEI>
    <head>Note VII</head>
    <head>Title Page </head>
    <head>Note VII</head>
    <head>Preface </head>
    <head>Copyright Page </head>
    </TEI>
    XML

    my $mod = <<XML;
    <TEI>
    <head>Note <romanNumeral>VII</romanNumeral></head>
    <head>Title Page </head>
    <head>Copyright Page </head>
    <head>Preface </head>
    </TEI>
    XML

    my $twig = XML::Twig->new (twig_roots => {'head' => \&fetchNode,});
    my %subs;

    $twig->parse ($mod);
    $twig->purge ();

    $twig = XML::Twig->new (
        twig_roots => {'head' => \&editNode,},
        twig_print_outside_roots => 1
    );
    $twig->parse ($org);
    $twig->flush ();

    sub fetchNode {
        my ($t, $elt) = @_;
        my $text = $elt->text ();

        $subs{$text} = $elt;
        $t->purge ();
    }

    sub editNode {
        my ($t, $elt) = @_;
        my $text = $elt->text ();

        $subs{$text}->replace ($elt);
        $t->flush ();
    }

    Prints:

    <TEI>
    <head>Note <romanNumeral>VII</romanNumeral></head>
    <head>Title Page </head>
    <head>Note <romanNumeral>VII</romanNumeral></head>
    <head>Preface </head>
    <head>Copyright Page </head>
    </TEI>
    </TEI>

    Note the repeated close tag! This code doesn't care if the order of the nodes has changed or if nodes have been added in the main document, but that probably doesn't matter.

    Update: updating XML::Twig to 3.32 produces a slightly different result, which can be corrected by removing the flush following $twig->parse ($org);.


    Perl reduces RSI - it saves typing
