PerlMonks  

(jeffa) Re: XML Search and Replace

by jeffa (Bishop)
on Jun 11, 2002 at 16:00 UTC


in reply to XML Search and Replace

I will let mirod handle the XML::Twig version, as i have not progressed to that module yet. I recently bought Perl & XML and am enjoying it immensely. Here is an 'event stream' version that uses XML::Parser and XML::Writer to replace all <foo> elements with the element <struggle>. Input is from the DATA file handle and output is STDOUT:
use strict;
use XML::Parser;
use XML::Writer;

my $writer = XML::Writer->new();
my $parser = XML::Parser->new(
    Handlers => {
        Init  => \&handle_Init,
        Start => \&handle_Start,
        Char  => \&handle_Char,
        End   => \&handle_End,
        Final => \&handle_Final,
    }
);

# i could have also made these $parser attributes
# such as $parser->{from} and $parser->{to}
our $from = 'foo';
our $to   = 'struggle';

my $data = do {local $/;<DATA>};
$parser->parse($data);

# called once at the beginning
sub handle_Init {
    $writer->xmlDecl('UTF-8');
    $writer->doctype('xml');
}

# called each time a start element is encountered
sub handle_Start {
    my($self,$name,%atts) = @_;
    $name = $to if $name eq $from;
    $writer->startTag($name,%atts);
}

# called each time non-markup data is encountered
sub handle_Char {
    my($self,$text) = @_;
    $writer->characters($text);
}

# called each time an end element is encountered
sub handle_End {
    my($self,$name) = @_;
    $name = $to if $name eq $from;
    $writer->endTag($name);
}

# called once at the end of the document
sub handle_Final {
    $writer->end();
}

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<xml>
  <foo class="life or death">
    <opponent>wolf</opponent>
    <opponent>ant</opponent>
  </foo>
  <foo class="life or death">
    <opponent>pantomime goose</opponent>
    <opponent>Terrance Rattigan</opponent>
  </foo>
</xml>

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

Replies are listed 'Best First'.
Re: (jeffa) Re: XML Search and Replace
by coreolyn (Parson) on Jun 11, 2002 at 16:34 UTC

    This code helps a lot for understanding XML::Parser, but it also shows how I've failed to communicate what I'm attempting to do.

    To illustrate via your example: I'm looking to search for the 'opponent' element and change the value of its text.

    coreolyn .. me thinks I'll be buying Perl & XML shortly.
      Here is another way, using XML::Simple, which changes a few of the opponents, like you wanted.
      use strict;
      use XML::Simple;

      my @data = (<DATA>);
      my $xml = XMLin((join'', @data));

      foreach my $foo (@{$xml->{'foo'}}) {
          foreach my $opponent (@{$foo->{'opponent'}}) {
              if($opponent eq 'wolf') {
                  $opponent = 'Heinz Sielmann';
              }
              elsif($opponent eq 'ant') {
                  $opponent = 'Peter Scott';
              }
          }
      }

      print XMLout($xml, rootname => 'xml');

      __DATA__
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE xml>
      <xml>
        <foo class="life or death">
          <opponent>wolf</opponent>
          <opponent>ant</opponent>
        </foo>
        <foo class="life or death">
          <opponent>pantomime goose</opponent>
          <opponent>Terrance Rattigan</opponent>
        </foo>
      </xml>
      I don't usually code a lot of XML, but when I need to, I find that XML::Simple (together with Data::Dumper) often lets me do simple stuff really quickly. Of course, it takes a little tongue-in-cheek for the dereferencing sometimes; see references quick reference for an excellent tutorial on this. :)
      You have moved into a dark place.
      It is pitch black. You are likely to be eaten by a wolf.
        Note that reading and writing a modified XML document using XML::Simple really works only in simple cases. The problem with XML::Simple is that XMLout(XMLin($xml)) is not guaranteed to produce an XML document with the same structure. From perldoc XML::Simple:
        o The API offers little control over the output of "XMLout()". In particular, it is not especially likely that feeding the output from "XMLin()" into "XMLout()" will reproduce the original XML (although passing the output from "XMLout()" into "XMLin()" should reproduce the original data structure).

        --
        Ilya Martynov (http://martynov.org/)
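A quick way to see the round-trip caveat for yourself. This is only a sketch, not code from the thread: the three-element input is made up for illustration, and the exact text XMLout() prints may vary with your XML::Simple version and options.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);

# Hypothetical input: the element order (a, b, a) carries meaning here.
my $original = '<xml><a>1</a><b>2</b><a>3</a></xml>';

# XMLin() folds the two <a> elements into a single array and stores <b>
# as a plain scalar, so the interleaved order is gone from the data structure.
my $data      = XMLin($original);
my $roundtrip = XMLout($data, RootName => 'xml');

print "original:  $original\n";
print "roundtrip: $roundtrip";
print "structure changed\n" if $roundtrip ne $original;
```

If document order or the element/attribute distinction matters to whatever consumes the output, that is usually the point to switch to one of the stream or tree modules discussed in this thread.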

      Had to run to lunch ... here is another version that DWYW ;)
      use strict;
      use XML::Parser;
      use XML::Writer;

      my $writer = XML::Writer->new();
      my $parser = XML::Parser->new(
          Handlers => {
              Init  => \&handle_Init,
              Start => \&handle_Start,
              Char  => \&handle_Char,
              End   => \&handle_End,
              Final => \&handle_Final,
          }
      );

      my $data = do {local $/;<DATA>};
      $parser->{match} = 'opponent';
      $parser->parse($data);

      sub handle_Init {
          $writer->xmlDecl('UTF-8');
          $writer->doctype('xml');
      }

      sub handle_Start {
          my($self,$name,%atts) = @_;
          $self->{flag} = 1 if $name eq $self->{match};
          $writer->startTag($name,%atts);
      }

      sub handle_Char {
          my($self,$text) = @_;
          if ($self->{flag}) {
              if ($text eq 'Terrance Rattigan') {
                  $text = 'breakfast';
              }
              else {
                  $text =~ s/goose/Queen Elizabeth/;
              }
              delete $self->{flag};
          }
          $writer->characters($text);
      }

      sub handle_End {
          my($self,$name) = @_;
          $writer->endTag($name);
      }

      sub handle_Final {
          $writer->end();
      }

      __DATA__
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE xml>
      <xml>
        <struggle class="life or death">
          <opponent>wolf</opponent>
          <opponent>ant</opponent>
        </struggle>
        <struggle class="life or death">
          <opponent>pantomime goose</opponent>
          <opponent>Terrance Rattigan</opponent>
        </struggle>
      </xml>
      If a start element named 'opponent' is found, we set a flag - why not use the parser's namespace? ;) Next, each time a non-markup character is encountered, we see if the flag is set and if it is, do some conversions and erase the flag.
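One caveat with the flag approach: the XML::Parser docs note that a single run of character data may be delivered through several calls to the Char handler (an entity such as &amp; is a typical trigger), so clearing the flag after the first call can miss part of the text. Below is a rough sketch of a buffering variant, not the code from this post: it collects the text between the start and end tags and edits it once in the End handler. The input string, the buffer key and the @seen array are made up for illustration, and it assumes 'opponent' elements contain only text.

```perl
use strict;
use warnings;
use XML::Parser;

my @seen;    # edited text of each matched element, collected for illustration

my $parser = XML::Parser->new(
    Handlers => {
        # start buffering when the element we care about opens
        Start => sub {
            my ($self, $name) = @_;
            $self->{buffer} = '' if $name eq $self->{match};
        },
        # append every chunk while a buffer is active
        Char => sub {
            my ($self, $text) = @_;
            $self->{buffer} .= $text if defined $self->{buffer};
        },
        # the element is complete: edit the whole text in one go
        End => sub {
            my ($self, $name) = @_;
            return unless $name eq $self->{match};
            my $text = delete $self->{buffer};
            $text =~ s/goose/Queen Elizabeth/;
            push @seen, $text;
        },
    },
);

$parser->{match} = 'opponent';
$parser->parse(
    '<xml><opponent>pantomime &amp; goose</opponent><opponent>ant</opponent></xml>'
);

print "$_\n" for @seen;
```

To write the result back out, the same End handler could pass the edited text to $writer->characters() just before $writer->endTag(), with the Char handler skipping its own write while a buffer is active.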

      Most of my XML munging (until recently) has been with XML::Simple. That module builds an internal tree that represents the document. As Dog and Pony showed you, it is a really easy module to work with, but as the document you are munging gets larger, XML::Simple gets slower and takes up more memory.

      These two versions i supplied use XML::Parser to take advantage of 'event streams'; they are more economical in speed and memory. But they are also more complicated, as you can immediately tell by comparing my code with Dog and Pony's.
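For comparison, here is roughly what the XML::Twig version mentioned at the top of the thread might look like. This is a sketch only, not mirod's code: it attaches a handler to each 'opponent' element and edits its text once the element has been fully parsed, which sits somewhere between the stream and tree styles. The $xml string and the $out variable are assumptions for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# Sample input modelled on the data used elsewhere in this thread.
my $xml = <<'XML';
<xml>
  <struggle class="life or death">
    <opponent>pantomime goose</opponent>
    <opponent>Terrance Rattigan</opponent>
  </struggle>
</xml>
XML

my $twig = XML::Twig->new(
    twig_handlers => {
        # called once for each complete <opponent> element
        opponent => sub {
            my ($t, $elt) = @_;
            my $text = $elt->text;
            if ($text eq 'Terrance Rattigan') {
                $text = 'breakfast';
            }
            else {
                $text =~ s/goose/Queen Elizabeth/;
            }
            $elt->set_text($text);
        },
    },
);

$twig->parse($xml);
my $out = $twig->sprint;
print $out;
```

Because the handler sees the whole element at once, there is no flag to manage and no need to worry about text arriving in pieces.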

      jeffa

      "Here we see a life and death struggle between jeffa and Dog and Pony ..." ;)
