Dear All,
I am parsing a large number of data files and need to save the output in an XML file. A minimal running example is below; it is contrived and simplified, of course, since the real data runs to gigabytes. My problem is that if I assemble the complete XML output document in memory, it grows too large, so calling $doc->setDocumentElement($root); print $doc->toString() is not an option. Instead I try to print the output in chunks, as I would normally do with CSV output. The code below does print chunks (single composed nodes; in real life it would be one composed node per data file). What I have not figured out is how to print the root element nicely. Currently I just hardcode the opening tag at the beginning and the closing tag at the end. Is there a nicer way to do this (or perhaps a nicer way to create XML in the first place)?
Thanks in advance!
#!/perl
use strict;
use warnings FATAL => qw(all);
use Text::CSV_XS;
use XML::LibXML;
my $csv_par = {
    binary           => 1,
    auto_diag        => 1,
    allow_whitespace => 1,
    sep_char         => ';',
    eol              => $/,
    quote_char       => undef,
};
my $csv = Text::CSV_XS->new($csv_par);
my @header = @{$csv->getline(*DATA)};
my %rec;
$csv->bind_columns(\@rec{@header});
my $doc = XML::LibXML::Document->new('1.0', 'utf-8');
my $root = $doc->createElement("ROOT");
print join("\n",
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<ROOT>'), $/;
while ( $csv->getline(*DATA) )
{
    # One composed node per input record; bind_columns fills %rec.
    my $line_tag = $doc->createElement("alpha");
    $line_tag->setAttribute('name' => $rec{"alpha"});
    # $root->appendChild($line_tag); # intentionally left out: appending would keep every node in memory.
    for my $other ( qw(beta gamma) )
    {
        my $other_tag = $doc->createElement($other);
        $other_tag->setAttribute(name => $rec{$other});
        $line_tag->appendChild($other_tag);
    }
    # Serialize this chunk right away (format level 1 = indented).
    print $line_tag->toString(1), $/;
}
print '</ROOT>',$/;
=output
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<alpha name="q">
  <beta name="2"/>
  <gamma name="3"/>
</alpha>
<alpha name="w">
  <beta name="9"/>
  <gamma name="8"/>
</alpha>
<alpha name="e">
  <beta name="1"/>
  <gamma name="2"/>
</alpha>
<alpha name="r">
  <beta name="6"/>
  <gamma name="7"/>
</alpha>
<alpha name="t">
  <beta name="5"/>
  <gamma name="9"/>
</alpha>
<alpha name="y">
  <beta name="3"/>
  <gamma name="1"/>
</alpha>
</ROOT>
=cut
__DATA__
alpha;beta;gamma
q;2;3
w;9;8
e;1;2
r;6;7
t;5;9
y;3;1
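
On the question of emitting the root element without hardcoded strings: one possible direction, offered only as a sketch, is a streaming writer such as the CPAN module XML::Writer, which prints each tag as it is called and never builds the whole tree. The sketch below assumes XML::Writer is installed. The @rows array is hypothetical sample data standing in for the parsed records, and the in-memory handle is used only so the result can be inspected; a real run would pass a handle opened on the output file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical sample rows standing in for records parsed from the data files.
my @rows = (
    { alpha => 'q', beta => 2, gamma => 3 },
    { alpha => 'w', beta => 9, gamma => 8 },
);

# In-memory handle for demonstration only; for gigabytes of output,
# open a real file here instead.
my $xml = '';
open my $out, '>', \$xml or die "Cannot open in-memory handle: $!";

my $writer = XML::Writer->new(
    OUTPUT      => $out,
    DATA_MODE   => 1,    # put each element on its own line
    DATA_INDENT => 2,    # indent nested elements by two spaces
);

$writer->xmlDecl('UTF-8');    # XML declaration
$writer->startTag('ROOT');    # root opening tag, written immediately

for my $row (@rows) {
    # Each record is written and forgotten; nothing accumulates in memory.
    $writer->startTag('alpha', name => $row->{alpha});
    $writer->emptyTag($_, name => $row->{$_}) for qw(beta gamma);
    $writer->endTag('alpha');
}

$writer->endTag('ROOT');      # root closing tag
$writer->end();               # dies if any start tag is still open

print $xml;
```

With this approach the writer keeps track of open elements, so a missing or mismatched closing tag is reported as an error instead of silently producing a broken file, and attribute values are escaped by the writer. Whether it is fast enough for the real data volume would need to be measured.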