Perhaps it's because now that I have a hammer, everything looks like a nail, but that sounds like an ideal use for XSLT. I've had a great deal of luck using XML::LibXML and XML::LibXSLT in conjunction to do things very like what you're describing.
Your script would be something along the lines of:
#!/usr/bin/perl -wT
use strict;
use XML::LibXSLT;
use XML::LibXML;
my $page2 = "<root>
<title>First title</title>
<othertag>Other stuff</othertag>
<title>Second title</title>
<othertag>More other stuff</othertag>
</root>";
my $parser = XML::LibXML->new();
my $xslt = XML::LibXSLT->new();
# assumes you've got the XML doc as a string in $page2
my $source = $parser->parse_string($page2);
# assumes your XSL file is convert.xsl
my $style_doc = $parser->parse_file('convert.xsl');
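# compile the parsed XSL document into a reusable stylesheet object
my $stylesheet = $xslt->parse_stylesheet($style_doc);
# apply the stylesheet to the source document
my $results = $stylesheet->transform($source);
# serialize the result according to the stylesheet's output settings
print $stylesheet->output_string($results);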
* Code above adapted from the XML::LibXSLT docs
And convert.xsl would look something like this:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<root>
<xsl:apply-templates select="//root/title"/>
</root>
</xsl:template>
<xsl:template match="title">
<li>
<xsl:text>A</xsl:text>
<xsl:value-of select="."/>
<xsl:text>C</xsl:text>
</li>
</xsl:template>
</xsl:stylesheet>
The above script and XSL file together yield the following output:
<?xml version="1.0"?>
<root><li>AFirst titleC</li><li>ASecond titleC</li></root>
This may well be too heavy-handed if your project is relatively small, but if you have more chunks to capture in similar ways, each one becomes just another template in the stylesheet, so you might consider such an approach.
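As a rough sketch of what "more chunks" might look like, here is the same convert.xsl with one extra template added for the othertag elements from the sample document. The <p> wrapper is purely my own assumption for illustration; swap in whatever markup you actually need.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <root>
      <!-- select both kinds of chunk; they are processed in document order -->
      <xsl:apply-templates select="/root/title | /root/othertag"/>
    </root>
  </xsl:template>
  <!-- same rule as before: wrap each title in A ... C inside an li -->
  <xsl:template match="title">
    <li>
      <xsl:text>A</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>C</xsl:text>
    </li>
  </xsl:template>
  <!-- hypothetical extra rule: the p wrapper is an assumption, not a requirement -->
  <xsl:template match="othertag">
    <p>
      <xsl:value-of select="."/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

The Perl script does not need to change for this; only the stylesheet grows, one template per kind of chunk.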
__________
He seemed like such a nice guy to his neighbors /
Kept to himself and never bothered them with favors
- Jefferson Airplane, "Assassin"
Update: Whoops, in other words, what trs80 said above ;)