Re: xml::libxml open, add and save not formatting properly (pretty printing with libxml)
by ikegami (Patriarch) on Mar 24, 2010 at 00:54 UTC
|
use strict;
use warnings;
use XML::LibXML qw( );
my $parser = XML::LibXML->new();
$parser->keep_blanks(0);
print $parser->parse_fh(*DATA)->toString(@ARGV ? $ARGV[0] : 1);
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<config>
<sites>
<site>
<sitename><![CDATA[www.example.com]]></sitename>
<active><![CDATA[1]]></active>
<rooturl><![CDATA[http://www.example.com.com/]]></rooturl>
<name><![CDATA[Example]]></name>
</site>
<site><sitename>Test entry</sitename><name></name><rooturl><![CDATA[http://www.test.com.com/]]></rooturl><reportname><![CDATA[test report name]]></reportname></site></sites>
</config>
<?xml version="1.0" encoding="UTF-8"?>
<config>
<sites>
<site>
<sitename><![CDATA[www.example.com]]></sitename>
<active><![CDATA[1]]></active>
<rooturl><![CDATA[http://www.example.com.com/]]></rooturl>
<name><![CDATA[Example]]></name>
</site>
<site>
<sitename>Test entry</sitename>
<name/>
<rooturl><![CDATA[http://www.test.com.com/]]></rooturl>
<reportname><![CDATA[test report name]]></reportname>
</site>
</sites>
</config>
| [reply] [d/l] [select] |
|
Bingo! (bowing humbly) The $parser->keep_blanks(0); fixed the problem perfectly. Thank you so much for your input! (dancing)
| [reply] [d/l] |
Re: xml::libxml open, add and save not formatting properly
by ikegami (Patriarch) on Mar 23, 2010 at 22:53 UTC
|
Don't you want ->toString(2)?
chmod 0664, $outfile;
That's not the right variable name. Aren't you using use strict; use warnings;?
autoflush XMLfile 1;
Useless, since closing a file handle flushes it.
binmode(XMLfile,":utf8");
That's a bug. "on document nodes [toString] returns the XML as a byte string in the original encoding of the document". You're double encoding. You want:
# Switch to UTF-8 if it's not already.
$config->setEncoding('UTF-8');
open(my $config_fh, ">", $configuri) or die $!;
binmode($config_fh);
print $config_fh $config->toString(2);
close($config_fh);
chmod 0664, $configuri;
or better yet:
# Switch to UTF-8 if it's not already.
$config->setEncoding('UTF-8');
$config->toFile($configuri, 2);
chmod 0664, $configuri;
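A minimal sketch of the double encoding described above, using only the core Encode module. The byte string is a stand-in for what toString returns on a document node, and the sample text is an assumption made up for illustration:

```perl
use strict;
use warnings;
use Encode qw( encode );

# Stand-in for toString on a document node: text already encoded as
# UTF-8 bytes ("cafe" with an e-acute, an illustrative value).
my $bytes = encode('UTF-8', "caf\x{E9}");

# A :utf8 layer on the output handle treats each byte as a character
# and encodes it again, which is what this second call imitates.
my $double = encode('UTF-8', $bytes);

printf "encoded once:  %d bytes\n", length($bytes);
printf "encoded twice: %d bytes\n", length($double);
```

The second string is longer because every byte above 0x7F has been expanded into two bytes, which is why the handle should be left in binary mode when printing the result of toString.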
| [reply] [d/l] [select] |
|
Thanks for the quick response and information. I made all the changes you recommended, and it did not make a difference. (Please forgive any 'code' errors in the example; I had to extract it from our code and re-create it for the question due to a non-disclosure agreement.)
As you can see, the first 'site' is nice, and the one I just added in my test is not (in fact, the </sites> has also lost its linefeed in the process).
<?xml version="1.0" encoding="UTF-8"?>
<config>
<sites>
<site>
<sitename><![CDATA[www.example.com]]></sitename>
<active><![CDATA[1]]></active>
<rooturl><![CDATA[http://www.example.com.com/]]></rooturl>
<name><![CDATA[Example]]></name>
</site>
<site><sitename>Test entry</sitename><name></name><rooturl><![CDATA[http://www.test.com.com/]]></rooturl><reportname><![CDATA[test report name]]></reportname></site></sites>
</config>
Again, this is just a small section of many entries in this file.
| [reply] [d/l] [select] |
|
The catch is that what you're asking to do involves changing the logical structure of the XML document by adding significant spaces, and XML::LibXML sees toString as a serialization function.
and it did not make a difference
I just tried it. It makes a huge difference. Not for the good, though. While it pretties up the part that isn't prettied up, it pretties up the part that's already been prettied up too.
use strict;
use warnings;
use XML::LibXML;
print XML::LibXML->new->parse_fh(*DATA)->toString(2);
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<config>
<sites>
<site>
<sitename><![CDATA[www.example.com]]></sitename>
<active><![CDATA[1]]></active>
<rooturl><![CDATA[http://www.example.com.com/]]></rooturl>
<name><![CDATA[Example]]></name>
</site>
<site><sitename>Test entry</sitename><name></name><rooturl><![CDATA[http://www.test.com.com/]]></rooturl><reportname><![CDATA[test report name]]></reportname></site></sites>
</config>
<?xml version="1.0" encoding="UTF-8"?>
<config>
<sites>
<site>
<sitename>
<![CDATA[www.example.com]]>
</sitename>
<active>
<![CDATA[1]]>
</active>
<rooturl>
<![CDATA[http://www.example.com.com/]]>
</rooturl>
<name>
<![CDATA[Example]]>
</name>
</site>
<site>
<sitename>
Test entry
</sitename>
<name/>
<rooturl>
<![CDATA[http://www.test.com.com/]]>
</rooturl>
<reportname>
<![CDATA[test report name]]>
</reportname>
</site>
</sites>
</config>
| [reply] [d/l] [select] |
|
|
Re: xml::libxml open, add and save not formatting properly
by gam3 (Curate) on Mar 23, 2010 at 23:10 UTC
|
| [reply] |
|
Looks interesting. Unfortunately, this project's requirements are to use only XML::LibXML. Thanks for the feedback!
| [reply] |
|
Too bad that you can't use something like XML::Tidy. I tried it, and it returned this:
<?xml version="1.0" encoding="utf-8"?>
<config>
<sites>
<site>
<sitename>www.example.com</sitename>
<active>1</active>
<rooturl>http://www.example.com.com/</rooturl>
<name>Example</name>
</site>
<site>
<sitename>Test entry</sitename>
<name />
<rooturl>http://www.test.com.com/</rooturl>
<reportname>test report name</reportname>
</site>
</sites>
</config>
The code that I used:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Tidy;
my $tidy_obj = XML::Tidy->new(
'filename' => '/path/to/xmlfile');
$tidy_obj->tidy();
$tidy_obj->write();
| [reply] [d/l] [select] |
Re: xml::libxml open, add and save not formatting properly
by dHarry (Abbot) on Mar 24, 2010 at 11:01 UTC
|
As a side note, not related to your question.
I have a config xml file that I actually created with the xml lib so I know it's valid ;-)
You probably mean "well-formed"; "valid" means something else in an XML context :P
Out of curiosity, why do you use CDATA? Going through your example data, I see no reason to use it. The CDATA mechanism was thought up to let you quote fragments of text containing markup characters, but it doesn't really work that well. One of the biggest strengths of XML is its data validation capability (I'm thinking XML Schema here). Putting content in CDATA, which the parser by definition does not interpret as markup, doesn't help in that respect.
Cheers
Harry
| [reply] |
|
<![CDATA[http://www.example.com.com/]]>
http://www.example.com.com/
http://www.example.com.com/
| [reply] [d/l] |
|
OK, well-formed it is! I'm entirely self-taught in all of this, so thanks for the correct term to use in this situation.
As for the CDATA, this tool is for crawling and analyzing our clients' web sites. I'm sure we have all experienced that a large portion of sites are, at best, poorly built on the technical side, and ampersands and other markup characters often appear in links, titles and other elements that we collect. I suppose I could check each entry to see if it contains a markup character and use CDATA only for the ones that need it.
Thanks for the input. I often get confused by documentation; I was never one for being able to understand it. I am a much more hands-on kind of learner. That costs its own time and frustration, but if my mind doesn't grok it, I have to code and try it until I eventually do get it!
| [reply] |
|
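sub text_to_xml {
    my $s = shift;
    # "]]>" cannot appear inside a CDATA section: close the section,
    # emit the sequence with an escaped ">", then reopen.
    $s =~ s/]]>/]]>]]&gt;<![CDATA[/g;
    return "<![CDATA[$s]]>";
}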
you could use
use HTML::Entities qw( encode_entities );
sub text_to_xml {
    my $text = shift;
    # Escape only the characters that would be read as markup.
    return encode_entities($text, '<&');
}
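The idea raised earlier in the thread, using CDATA only for entries that actually contain markup characters, could be sketched roughly as follows. The helper name text_to_xml_if_needed and the sample strings are hypothetical, not taken from the original code, and this is one possible approach rather than a recommendation over the versions above:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): emit plain text when it is
# already safe, and fall back to a CDATA section only when needed.
sub text_to_xml_if_needed {
    my ($text) = @_;

    # Nothing that needs protecting: return the text untouched.
    return $text unless $text =~ /[<&]|\]\]>/;

    # A CDATA section cannot contain "]]>", so split it across two sections.
    (my $safe = $text) =~ s/\]\]>/]]]]><![CDATA[>/g;
    return "<![CDATA[$safe]]>";
}

print text_to_xml_if_needed('Example'), "\n";
print text_to_xml_if_needed('Fish & Chips'), "\n";
```

When nodes are built through XML::LibXML itself, the serializer escapes text content on output, so a helper like this mainly matters when XML is assembled as strings.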
| [reply] [d/l] [select] |