PerlMonks  

Re: xml::libxml open, add and save not formatting properly

by dHarry (Abbot)
on Mar 24, 2010 at 11:01 UTC


in reply to xml::libxml open, add and save not formatting properly

As a side note, not related to your question.

I have a config xml file that I actually created with the xml lib so I know it's valid ;-)

You probably mean "well-formed"; "valid" means something else in an XML context :P
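
To make the distinction concrete: a well-formed document obeys XML's syntax rules, while a valid one additionally conforms to a DTD or schema. Here is a minimal sketch, assuming XML::LibXML is installed. The sample document and the DTD are made up for illustration and are not taken from the original poster's config file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical config document: well-formed, but it lacks the <name>
# element that the DTD below requires.
my $xml = '<config><url>http://www.example.com/</url></config>';

# Hypothetical DTD, invented for this example.
my $dtd = XML::LibXML::Dtd->parse_string(<<'DTD');
<!ELEMENT config (name, url)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT url (#PCDATA)>
DTD

# Parsing succeeds because the document is well-formed.
my $doc = XML::LibXML->load_xml(string => $xml);

# Validation is a separate step; is_valid returns false instead of dying.
my $is_valid = $doc->is_valid($dtd) ? 1 : 0;

# A document that is not well-formed (mismatched tags) fails at parse time.
my $parsed_broken =
    eval { XML::LibXML->load_xml(string => '<config><url></config>'); 1 } ? 1 : 0;

print "well-formed: yes, valid: ", ($is_valid ? "yes" : "no"), "\n";
print "broken document parsed: ", ($parsed_broken ? "yes" : "no"), "\n";
```

So a file written by an XML library should be well-formed, but whether it is valid depends on the DTD or schema it is checked against.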

Out of curiosity, why do you use CDATA? Going through your example data, I see no reason to use it. The CDATA mechanism was thought up to let you quote fragments of text containing markup characters, but it doesn't really work that well. One of the biggest strengths of XML is its data validation capability (I'm thinking of XML Schema here). Putting stuff in like CDATA, by definition ignored by the parser, doesn't help in that respect.

Cheers

Harry


Replies are listed 'Best First'.
Re^2: xml::libxml open, add and save not formatting properly
by ikegami (Patriarch) on Mar 24, 2010 at 18:02 UTC

    Putting stuff in like CDATA, by definition ignored by the parser, doesn't help in that respect.

    Doesn't hurt either. The following three lines are identical from the point of view of an XML parser:

    <![CDATA[http://www.example.com.com/]]>
    &#104;ttp://www.example.com.com/
    http://www.example.com.com/
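
One way to check that claim is to parse each spelling and compare the resulting text. This is a minimal sketch, assuming XML::LibXML is installed; the wrapping <url> element is made up for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# The three spellings from the post above.
my @bodies = (
    '<![CDATA[http://www.example.com.com/]]>',
    '&#104;ttp://www.example.com.com/',
    'http://www.example.com.com/',
);

# Wrap each one in a hypothetical <url> element, parse it, and keep
# the text content the parser reports.
my @texts = map {
    XML::LibXML->load_xml(string => "<url>$_</url>")
               ->documentElement->textContent
} @bodies;

print "$_\n" for @texts;
```

All three should come back as the same string, since CDATA sections and character references are only different ways of writing the same character data.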
Re^2: xml::libxml open, add and save not formatting properly
by itsscott (Sexton) on Mar 24, 2010 at 17:32 UTC
    Ok, well-formed it is! I'm entirely self-taught in all of this, so thanks for the correct term to use in this situation.

    As for the CDATA, this tool is for crawling and analyzing our clients' web sites. I'm sure we have all experienced that a large portion of sites are, at best, poorly built on the technical side, and ampersands and other markup characters often appear in the links, titles and other elements that we collect. I suppose I could check each entry to see if it contains a markup character and CDATA only the ones I need to.
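
The "CDATA only the ones I need to" idea could look roughly like the sketch below. The name maybe_cdata is hypothetical, and the code assumes the values are assembled as strings rather than added through a DOM API.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a value in a CDATA section only when it
# contains '<', '&' or the sequence "]]>"; otherwise return it unchanged.
sub maybe_cdata {
    my ($text) = @_;
    return $text unless $text =~ /[<&]|\]\]>/;

    # Split any "]]>" so it cannot close the CDATA section early.
    (my $safe = $text) =~ s/\]\]>/]]>]]&gt;<![CDATA[/g;
    return '<![CDATA[' . $safe . ']]>';
}

print maybe_cdata('Plain title'), "\n";
print maybe_cdata('Tom & Jerry <home>'), "\n";
```

The check costs little, but the parser sees the same text whether a value is wrapped or not, so it mostly affects how the file reads to a human.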

    Thanks for the input. I often get confused by documentation; I was never one for being able to understand it. I am a much more hands-on kind of learner. That costs its own time and frustration, but if my mind doesn't grok it, I have to code and try it until I eventually do get it!
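      Instead of

          sub text_to_xml {
              my $s = shift;
              # Split any "]]>" so it cannot end the CDATA section early.
              $s =~ s/]]>/]]>]]&gt;<![CDATA[/g;
              return "<![CDATA[$s]]>";
          }

      you could use

          use HTML::Entities qw( encode_entities );

          sub text_to_xml {
              my $text = shift;
              # Escape markup characters instead of wrapping the text in CDATA.
              # '>' is included so that a literal "]]>" in the text is safe too.
              return encode_entities($text, '<>&');
          }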
