http://qs321.pair.com?node_id=606563

valavanp has asked for the wisdom of the Perl Monks concerning the following question:

I have an XML file like the one below:
<doc.body>
  <head n="1">Introduction</head>
  <para>para 1</para>
  <para>para 2</para>
  <head n="2">Individual</head>
  <para>para 1</para>
  <para>para 2:
    <list listtype="bull">
      <item>item 1</item>
      <item>2</item>
  </para>
</doc.body>
The output I want is:
TX <h3>INTRODUCTION</h3>
TX para 1
TX para 2
TX <h3>Individual</h3>
TX para 1
TX para 2
TX <ol><li>item1</li><li>2</ol>
Can anyone suggest how to approach this problem? I tried the XML::Twig module but couldn't arrive at a solution. You can see my attempt in my earlier post, "reading from XML files". Thanks to all for your suggestions.

Replies are listed 'Best First'.
Re: xml file reading and displaying
by bsdz (Friar) on Mar 26, 2007 at 13:11 UTC
    I would go for the twig_handlers approach. Here is something I knocked together very quickly. Note that your XML is missing a closing '</list>' tag. The code doesn't produce quite the same output as you asked for, but I am sure you can tailor it to your own purposes.
    use strict;
    use XML::Twig;
    use subs qw(say);

    my $twig = XML::Twig->new(
        twig_handlers => {
            head => \&head,
            para => \&para,
        }
    )->parsefile("foo.xml");

    exit;

    sub head {
        my ($t, $e) = @_;
        say '<h3>'.$e->text.'</h3>';
    }

    sub para {
        my ($t, $e) = @_;
        process_children($t, $e);
    }

    sub list {
        my ($t, $e) = @_;
        say '<ol>';
        process_children($t, $e);
        say '</ol>';
    }

    sub item {
        my ($t, $e) = @_;
        say '<li>'.$e->text.'</li>';
    }

    # Print text children directly; hand element children to the sub
    # named after their tag, if one exists.
    sub process_children {
        my ($t, $e) = @_;
        map {
            if ($_->is_text()) {
                say $_->text;
            }
            else {
                &{\&{$_->gi}}($t, $_) if exists &{$_->gi};
            }
        } $e->children();
    }

    # Prefix every output line with 'TX '.
    sub say {
        return print 'TX ', @_, "\n";
    }
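The handlers above dispatch on the element name through a symbolic reference (`&{\&{$_->gi}}`). The same idea can be sketched with an explicit dispatch table, which avoids looking subs up by name. This is only an illustration and does not use XML::Twig: the nested hash in `$tree` stands in for a parsed document, and `%render`, `render_node` and `render_children` are hypothetical names, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed tree: each element is a hash with a
# tag name and a list of children; plain strings are text nodes.
my $tree = {
    tag      => 'para',
    children => [
        'para 2:',
        {
            tag      => 'list',
            children => [
                { tag => 'item', children => ['item 1'] },
                { tag => 'item', children => ['2'] },
            ],
        },
    ],
};

# Dispatch table: tag name => code ref that returns the rendered string.
my %render = (
    para => sub { 'TX ' . render_children($_[0]) },
    list => sub { '<ol>' . render_children($_[0]) . '</ol>' },
    item => sub { '<li>' . render_children($_[0]) . '</li>' },
);

# Render every child of an element and concatenate the results.
sub render_children {
    my ($node) = @_;
    return join '', map { render_node($_) } @{ $node->{children} };
}

# Text nodes are returned as-is; elements go through the table, and
# elements without a handler simply pass their children through.
sub render_node {
    my ($node) = @_;
    return $node unless ref $node;
    my $handler = $render{ $node->{tag} };
    return $handler ? $handler->($node) : render_children($node);
}

print render_node($tree), "\n";
```

Because each handler returns a string instead of printing, the caller decides where the 'TX ' prefix and the line breaks go, which may make it easier to match the requested output exactly.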
Re: xml file reading and displaying
by zentara (Archbishop) on Mar 26, 2007 at 14:16 UTC
Re: xml file reading and displaying
by Wonko the sane (Deacon) on Mar 27, 2007 at 11:52 UTC
    How about something like this...
    #!/usr/local/bin/perl -w
    use strict;
    use warnings;

    use XML::LibXML;
    use XML::LibXSLT;

    my $xml = q{<doc.body>
      <head n="1">Introduction</head>
      <para>para 1</para>
      <para>para 2</para>
      <head n="2">Individual</head>
      <para>para 1</para>
      <para>para 2:
        <list listtype="bull">
          <item>item 1</item>
          <item>2</item>
        </list>
      </para>
    </doc.body>
    };

    my $xslt_stylesheet = q{<?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

      <xsl:output method="xml" omit-xml-declaration="yes" />

      <xsl:template match="/doc.body">
        <xsl:apply-templates/>
      </xsl:template>

      <xsl:template match="head">TX <h3><xsl:apply-templates/></h3>
      </xsl:template>

      <xsl:template match="para">TX <xsl:apply-templates/>
      </xsl:template>

      <xsl:template match="list">TX <ol><xsl:apply-templates/></ol>
      </xsl:template>

      <xsl:template match="list/item"><li><xsl:apply-templates/></li>
      </xsl:template>

    </xsl:stylesheet>
    };

    my $parser = XML::LibXML->new();
    my $xslt   = XML::LibXSLT->new();

    # Parse the stylesheet and the source document, then transform.
    my $style_doc  = $parser->parse_string( $xslt_stylesheet );
    my $source     = $parser->parse_string( $xml );
    my $stylesheet = $xslt->parse_stylesheet( $style_doc );
    my $results    = $stylesheet->transform( $source );
    my $output     = $stylesheet->output_string( $results );

    print $output;
    Output:
    :!./t1.pl
    TX <h3>Introduction</h3>
    TX para 1
    TX para 2
    TX <h3>Individual</h3>
    TX para 1
    TX para 2:
    TX <ol>
    <li>item 1</li>
    <li>2</li>
    </ol>
    Best Regards, Wonko
Re: xml file reading and displaying
by Jenda (Abbot) on Mar 31, 2007 at 22:24 UTC

    Yet another solution:

    use XML::Rules;

    my $parser = XML::Rules->new(
        rules => [
            item => sub { '<li>' . $_[1]->{_content} . '</li>'},
            list => sub {
                for ($_[1]->{_content}) {
                    s/^\s+//;
                    s/\s+$//;
                    s{</li>\s+<li>}{</li><li>}g;
                }
                "\nTX <ol>" . $_[1]->{_content} . "</ol>\n"
            },
            para => sub { 'TX ' . $_[1]->{_content}},
            head => sub { 'TX <h3>' . $_[1]->{_content} . "</h3>"},
            'doc.body' => 'content',
        ]
    );

    use Data::Dumper;

    my $result = $parser->parse($XML)->{'doc.body'};
    for ($result) {
        s/^\s+//;
        s/\s+$//;
    }
    print $result;

    __END__
    C:\temp>c:\temp\reformatXML.pl
    TX <h3>Introduction</h3>
    TX para 1
    TX para 2
    TX <h3>Individual</h3>
    TX para 1
    TX para 2:
    TX <ol><li>item 1</li><li>2</li></ol>

    Getting rid of the excess whitespace complicates the code somewhat.
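One way to keep that complication out of the rules themselves is to move the cleanup into a small helper. The sketch below is plain Perl with no XML::Rules dependency; `tidy_list_content` is a hypothetical name, and the sample input is made up to resemble the content a list rule might receive. It applies the same three substitutions as the list rule above.

```perl
use strict;
use warnings;

# Hypothetical helper: trim leading and trailing whitespace and remove
# the whitespace between adjacent <li> elements.
sub tidy_list_content {
    my ($content) = @_;
    for ($content) {
        s/^\s+//;
        s/\s+$//;
        s{</li>\s+<li>}{</li><li>}g;
    }
    return $content;
}

# Made-up sample of list content with the source file's indentation.
my $raw = "\n  <li>item 1</li>\n  <li>2</li>\n";
print '<ol>', tidy_list_content($raw), "</ol>\n";
```

With a helper like this, the list rule could shrink to a single expression that wraps the tidied content in `<ol>` tags.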