Parse with XML::Simple: how to keep some tags "unparsed"?

dda has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Parse with XML::Simple: how to keep some tags "unparsed"? by bageler (Hermit) on Jul 01, 2004 at 14:30 UTC
Others have suggested CDATA but have not given an illustration, so here is one:

    <page id="1">
      <content><![CDATA[ This is <div class="red">some HTML text</div>]]> </content>
    </page>
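To show what the CDATA approach buys you on the Perl side, here is a minimal sketch that feeds such a document to XML::Simple. It assumes XML::Simple is installed and uses its default options; the variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# The XHTML fragment sits inside a CDATA section, so the parser
# treats it as character data rather than as child elements.
my $xml = <<'XML';
<page id="1">
  <content><![CDATA[This is <div class="red">some HTML text</div>]]></content>
</page>
XML

# With default options the root element is dropped, and its attributes
# and children become keys of the returned hash.
my $page = XMLin($xml);

print "id:      $page->{id}\n";
print "content: $page->{content}\n";
```

The `content` key should hold the markup as a single string, tags included, instead of a nested hash with a `div` key.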
Re: Parse with XML::Simple: how to keep some tags "unparsed"? by tinita (Parson) on Jul 01, 2004 at 10:13 UTC
If I understand you correctly, you don't want to parse the `<content>` tags because you want to save time. In that case it would probably be better to use XML::Parser. Also have a look at http://perl-xml.sourceforge.net/ for the FAQ and examples.
Re^2: Parse with XML::Simple: how to keep some tags "unparsed"? by dda (Friar) on Jul 01, 2004 at 10:34 UTC
No, it is not a matter of time. I need to keep the whole XHTML content of each `<content>` tag in a single place, and not have it parsed into Perl data structures. --dda
Re^3: Parse with XML::Simple: how to keep some tags "unparsed"? by exussum0 (Vicar) on Jul 01, 2004 at 10:46 UTC
If you are embedding data that contains `<` and `>` characters, then you probably want to use a CDATA section within your XML to say, "this is data of the XML document, not part of the XML structure". If that's beyond your control, you can always create a SAX parser that does just what you want. Or you can write some XSLT that transforms the nested content into the CDATA form described above.
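When the XML cannot be changed at the source, the same transformation can be done as a pre-processing step before the document reaches XML::Simple. The sketch below uses a plain regex substitution rather than the SAX or XSLT routes mentioned above, so treat it as a rough illustration only. The helper names `cdata_section` and `wrap_content_in_cdata` are made up for this example, and the code assumes that `<content>` elements are never nested and are not already wrapped in CDATA.

```perl
use strict;
use warnings;

# Hypothetical helper: turn arbitrary text into a CDATA section.
# A literal "]]>" would end the section early, so it is split
# across two adjacent sections.
sub cdata_section {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

# Hypothetical helper: wrap the body of every <content>...</content>
# element in CDATA. Self-closing <content/> tags are skipped.
sub wrap_content_in_cdata {
    my ($xml) = @_;
    $xml =~ s{(<content\b[^>]*(?<!/)>)(.*?)(</content>)}{
        my ($open, $body, $close) = ($1, $2, $3);
        $open . cdata_section($body) . $close
    }gse;
    return $xml;
}

my $raw = '<page id="1"><content>This is <div class="red">some HTML text</div></content></page>';
print wrap_content_in_cdata($raw), "\n";
```

A regex is not an XML parser, so this only holds up for documents whose layout you know; for anything less predictable a real parser is the safer choice.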
Re^4: Parse with XML::Simple: how to keep some tags "unparsed"? by gellyfish (Monsignor) on Jul 01, 2004 at 14:30 UTC
Re^5: Parse with XML::Simple: how to keep some tags "unparsed"? by exussum0 (Vicar) on Jul 01, 2004 at 14:39 UTC
Re: Parse with XML::Simple: how to keep some tags "unparsed"? by pbeckingham (Parson) on Jul 01, 2004 at 13:03 UTC
This is not valid XML. You have a tag, `<content>`, that contains both a value, `This is`, and a child tag, `<div class="red">some HTML text</div>`. Pick one, or hide the `<` and `>` characters with `&lt;` and `&gt;`, or do the right thing and use `CDATA`. Update: I stand corrected. I just checked the XML spec (http://www.w3.org/TR/2004/REC-xml-20040204) and gellyfish and ktingle are correct. Sorry.
Re^2: Parse with XML::Simple: how to keep some tags "unparsed"? by gellyfish (Monsignor) on Jul 01, 2004 at 13:40 UTC
It actually is valid - a node is allowed to have mixed content. This snippet will give rise to a schema like:

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="page">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="content">
              <xs:complexType mixed="true">
                <xs:sequence>
                  <xs:element name="div">
                    <xs:complexType>
                      <xs:simpleContent>
                        <xs:extension base="xs:string">
                          <xs:attribute use="required" type="xs:string" name="class" />
                        </xs:extension>
                      </xs:simpleContent>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
          <xs:attribute use="required" type="xs:unsignedByte" name="id" />
        </xs:complexType>
      </xs:element>
    </xs:schema>

However this is certainly not what was intended - if the contents of `<content />` are to be taken literally it should be a CDATA section. /J\
Re^2: Parse with XML::Simple: how to keep some tags "unparsed"? by ktingle (Sexton) on Jul 01, 2004 at 14:23 UTC
An element can have a value and a child element; check the XML spec. It's awkward, but valid XML.
Re: Parse with XML::Simple: how to keep some tags "unparsed"? by abclex (Monk) on Jul 01, 2004 at 19:51 UTC
Did you take a look at XML::Twig? AFAIK it's very flexible in parsing/filtering tags.
Re^2: Parse with XML::Simple: how to keep some tags "unparsed"? by qq (Hermit) on Jul 05, 2004 at 09:08 UTC
    use strict;
    use warnings;
    use XML::Twig;

    my $xml = '<?xml version="1.0" ?>
    <page id="1">
      <content>
        This is <div class="red">some HTML text</div>
      </content>
    </page>';

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once each <content> element has been fully parsed;
            # $_ is the element, printed here with its markup intact.
            content => sub { $_->print; print "\n"; },
        },
    );

    $twig->parse($xml);
    $twig->purge;    # release the parsed tree
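If the goal is to keep the markup in a Perl variable rather than print it, a variation along these lines might do. It is only a sketch: it assumes an XML::Twig version that provides `inner_xml`, and the `%content_for` hash is a name invented for this example.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<?xml version="1.0" ?>
<page id="1">
  <content>This is <div class="red">some HTML text</div></content>
</page>
XML

my %content_for;    # page id => raw markup found inside its <content>

my $twig = XML::Twig->new(
    twig_handlers => {
        page => sub {
            my ($t, $page) = @_;
            my $content = $page->first_child('content');
            # inner_xml serialises the children without the
            # surrounding <content> start and end tags.
            $content_for{ $page->att('id') } = $content->inner_xml
                if $content;
        },
    },
);
$twig->parse($xml);

print $content_for{1}, "\n";
```

This leaves the source XML untouched, which may matter when adding CDATA sections is not an option.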
Re: Parse with XML::Simple: how to keep some tags "unparsed"? by Kyoichi (Novice) on Jul 05, 2004 at 00:26 UTC
Heya dda. Using CDATA is a good choice, but you may also want to check out XML::Smart. -- Kyoichi

