http://qs321.pair.com?node_id=1169224

vavz has asked for the wisdom of the Perl Monks concerning the following question:

As I am new to Perl, I need to write a Perl script which converts csv to xml.

sample csv:

Name,Company,Email,Phone A,X,a.x@aol.com,1111 B,Y,b.y@aol.com,0 C,Z,c.z@aol.com,2222
Current script:
use strict; use warnings; use XML::Writer; use IO::File; use Data::Dumper; my $CSVfile="1.csv"; my $xmlFile="1.xml"; print "Input Csv file: $CSVfile and Output XML: $xmlFile\n"; my $output = IO::File->new(">$xmlFile"); my $writer = XML::Writer->new( OUTPUT => $output, DATA_MODE => 'true', UNSAFE => 1, DATA_INDENT => 6 ); $writer->xmlDecl("UTF-8"); my $rootTag="row"; my ($arrRef,$keys)= convertCsvToHash($CSVfile); for my $hash(@$arrRef){ writeXML($writer,$hash,$rootTag); } $writer->end(); $output->close(); sub writeXML{ my ($writer,$hash, $tagname, %attrs) = @_; $writer->startTag( $tagname, %attrs ); if (ref $hash eq "HASH" ){ for my $key(@$keys){ writeXML($writer,$hash->{$key},$key); } }else{ $writer->characters($hash); } $writer->endTag($tagname); } sub convertCsvToHash{ my $file = shift; open(FILE, "$file"); my @arr = <FILE>; close(FILE); my @hashArry; chomp($arr[0]); my @keys = split(',',$arr[0]); my @ke=my ($key1, $key2) = @keys[1,3]; for my $i(1..$#arr){ chomp($arr[$i]); my @splitRec = split(',',$arr[$i]); my $hash = {}; push (my @push, ($splitRec[1],$splitRec[3])); if ($push[1] ne 0) { for my $j(0..$#ke){ $hash->{$ke[$j]} = $push[$j]; } push @hashArry,$hash; } } return \@hashArry, \@ke; }
I get after running the above:
<?xml version="1.0" encoding="UTF-8"?> <row> <Company>X</Company> <Phone>1111</Phone> </row> <row> <Company>Z</Company> <Phone>2222</Phone> </row>
Expected ouput:
<?xml version="1.0"?> <!DOCTYPE reference PUBLIC "-/EN" "reference.dtd"> <reference id="desc"> <title>Sample XML</title> <refbody> <section id="section_1"> <p> </p> <table id="table_1"> <tgroup cols="4"><colspec colnum="1" colname="col1" colwidth +="1.00*"/><colspec colnum="2" colname="col2" colwidth="1.00*"/> <thead> <row> <entry colname="col1">Company</entry> <entry colname="col2">Phone</entry> </row> </thead> <tbody> <row> <entry colname="col1">X</entry> <entry colname="col2">1111</entry> </row> </tbody> <tbody> <row> <entry colname="col1">Z</entry> <entry colname="col2">222</entry> </row> </tbody> </tgroup> </table> </section> </refbody> </reference>
Any help would be greatly appreciated, thanks

Replies are listed 'Best First'.
Re: csv to xml
by Marshall (Canon) on Aug 06, 2016 at 01:06 UTC
    I didn't know quite where to start with this. Your convertCsvToHash() code clearly does not do what you are expecting. There are some modules that can do this rather well, but below is my hint in a straightforward way. Indentation is important and is great impediment to reading your code. I have never used the XML writer code but hopefully solving the CSV part will help you out.

    Update: Note to the OP: You provided simple example data, but perhaps too simple. The kind of problems that can crop up in these CSV files are legion. "Fred Smith, MD", "George Bush, Jr" or "Widget, INC". Even parsing phone numbers can be a whole can of worms. Parsing a full blown CSV file is better done with a module because the details are a lot harder than one might think. However, you have another big problem, the XML. But now maybe you can focus on getting XML working and then come back to the CSV part when you discover that neither one of our simple approaches will work with complex data. BTW, kudos to you for providing code that actually compiles and runs albeit with the "wrong' result.

    use strict; use warnings; use XML::Writer; use IO::File; use Data::Dumper; my $CSVfile='1.csv'; my ($arrRef,@keys)= convertCsvToHash($CSVfile); print Dumper $arrRef; print Dumper \@keys; sub convertCsvToHash { my $file = shift; my @AoH; open(FILE, "$CSVfile") or die "$!";; my $header = <FILE>; chomp $header; my @keys = split ',',$header; while (my $data=<FILE>) { my %hash; chomp $data; my @values = split ',',$data; foreach my $key (@keys) { $hash{$key} = shift @values; } push @AoH, \%hash; } return \@AoH,@keys; } __END__ $VAR1 = [ { 'Phone' => '1111', 'Name' => 'A', 'Company' => 'X', 'Email' => 'a.x@aol.com' }, { 'Email' => 'b.y@aol.com', 'Company' => 'Y', 'Name' => 'B', 'Phone' => '0' }, { 'Email' => 'c.z@aol.com', 'Phone' => '2222', 'Name' => 'C', 'Company' => 'Z' } ]; $VAR1 = [ [ 'Name', 'Company', 'Email', 'Phone' ] ];
      my @values = split ',',$data;

      Unfortunately, this breaks if one of the 'values' itself contains a comma. And commas are not uncommon in company names.

      Use eg Text::CSV.

        But Text::CSV will not help very much with the type of data shown in the original post, will it?

        AFAIK, it will also break on an additional comma unless there is some additional information, such as quotes, to help identify that the additional comma is not a field separator.

Re: csv to xml
by InfiniteSilence (Curate) on Aug 07, 2016 at 02:47 UTC
    There are some pretty good responses here. I'll add that you do not seem to just be converting CSV to XML. If you were the following would suffice:
    my (@data, @hdrs) ; for(<DATA>) { chomp; unless ($first) {@hdrs = split q~,~; $first++; next}; @data = split q~,~; print qq~<record>\n~; for(1,3) {print qq!<$hdrs[$_]> $data[$_] </$hdrs[$_]>\n!}; print qq~</record>\n~; ++$first; } __DATA__ Name,Company,Email,Phone A,X,a.x@aol.com,1111 B,Y,b.y@aol.com,0 C,Z,c.z@aol.com,2222
    Produces:
    <record> <Company> X </Company> <Phone> 1111 </Phone> </record> <record> <Company> Y </Company> <Phone> 0 </Phone> </record> <record> <Company> Z </Company> <Phone> 2222 </Phone> </record>

    What you really appear to be doing is taking some data in one format and embedding it into an XML format. There is a difference between the two. There are actually tools out there to convert XML to other formats of XML as well as tools to format XML for the browser (CSS) that might make what you appear to be doing here moot.

    As for the rest of that elaborate output (really everything up to and including the tgroup tag as well as the closing tags) can be added using simple print statements like print qq~<record>\n~;. Or you could use a template module for everything but the dynamic parts of your output.

    Celebrate Intellectual Diversity