Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

CSV manipulation with perl

by joec_ (Scribe)
on Mar 09, 2009 at 14:06 UTC ( [id://749290] : perlquestion . print w/replies, xml ) Need Help??

joec_ has asked for the wisdom of the Perl Monks concerning the following question:


I have a csv file, that potentially could contain 400 "columns" and anything up to 1000 rows. What i would like to end up with is this

CSV at the moment:

123,"text",66,"more text"
124,"text1",67,"more text1"
125,"text2",68,"more text2"

Seperate Strings (eventually going into oracle as clobs):
"more text"|"more text1"|"more text2"

I wont know before hand how many rows/cols the csv has as it depends on the dataset from another app.

What i have tried so far is :

while (<CSV>){ @fields = split /,/,$_; push @s, $fields[0]; $structid=join "|",@s; }
which deals with just the first column. Following this logic, i would have to create an array for each column...there must be a better way.

Any ideas appreciated



Eschew obfuscation, espouse eludication!

Replies are listed 'Best First'.
Re: CSV manipulation with perl
by Tux (Canon) on Mar 09, 2009 at 14:14 UTC

    Don't! Use Text::CSV_XS instead (or Text::CSV if you do not have a C-compiler)

    use DBI; use Text::CSV_XS; my $dbh = DBI->connect (...); my $sth = $dbh->prepare ("insert into table values (?, ?, ?, ...)"); my $csv = Text::CSV_XS->new ({ binary => 1 }); open my $dta, "<", "data.csv" or die "data: $!\n"; while (my $row = $csv->getline ($dta)) { $sth->execute (@$row); } $csv->eof or $csv->error_diag; $sth->finish; $dbh->commit;

    Enjoy, Have FUN! H.Merijn
      ... or even Text::xSV.

      A user level that continues to overstate my experience :-))
      Doesn't Text::CSV choose either Text::CSV_XS or Text::CSV_PP depending on what is available on your system? I definitely remember you saying so at YAPC::EU 2008 in Copenhagen.

      BTW, are the slides of that presentation somewhere available? I use Text::CSV quite a lot and every ounce of speed I can wring out of it helps.


      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        Correct, but if you only install Text::CSV, it doesn't automatically install Text::CSV_XS, so you don't get the speed, so I keep promotong the latter

        Talk is here. Opera browser might be required, as it depends on <link rel="next" type="text/html" href="..." /> in the <head> section, which Opera supports with FastForward. To skip directly to the first slide, click here.

        Enjoy, Have FUN! H.Merijn
      Does this code take 123, 124 and 125 from the csv create a string "123|124|125" and then insert into the database? And then do the same for text, text1, text2 etc etc

      Doesnt appear to... if it does, please could you explain how? At the moment, i know it might not be the best "way" of doing things but that is what i got....thanks.


      Eschew obfuscation, espouse eludication!

        No, this was just an example of how to mix CSV and DBI

        Your problem could be translated to something like

        my $csv = Text::CSV_XS->new ({ binary => 1 }); open my $dta, "<", "data.csv" or die "data: $!\n"; my @rows; while (my $row = $csv->getline ($dta)) { push @rows, $row; } $csv->eof or $csv->error_diag; close $dta; # pivot my @data = map { my $col = $_; join "|", map { $_->[$col] } @rows } 0. +.$#{$rows[0]};

        Which will convert

        123,foo,2,FOO 234,bar,2,BAR 345,zap,3,ZAP 456,bok,5,BOK


        ( "123|234|345|456", "foo|bar|zap|bok", "2|2|3|5", "FOO|BAR|ZAP|BOK")

        Assuming all records have equal length

        Enjoy, Have FUN! H.Merijn
Re: CSV manipulation with perl
by olus (Curate) on Mar 09, 2009 at 15:06 UTC

    Following your approach, I came with the following code that creates one array with all the strings that result from the concatenation of all the columns. Note that I'm not taking into account eventual memory problems that may arise from the amount of columns/lines your csv may have.

    use strict; use warnings; use Data::Dumper; my @parsed = (); while (<DATA>) { my $in = $_; chomp($in); my @line = split(/,/, $in); for my $i(0..$#line) { $parsed[$i] .= $line[$i].'|'; } } map {chop} @parsed; print Dumper(\@parsed); __DATA__ 123,"text",66,"more text" 124,"text1",67,"more text1" 125,"text2",68,"more text2"


    $VAR1 = [ '123|124|125', '"text"|"text1"|"text2"', '66|67|68', '"more text"|"more text1"|"more text2"' ];
Re: CSV manipulation with perl
by clueless newbie (Curate) on Mar 09, 2009 at 15:43 UTC
    Ignoring the problem of parsing CSV, is this what you're looking for?
    #!/perl/bin/perl use Smart::Comments; use strict; use warnings; my $_raa; my $i=0; while (<DATA>) { chomp; my @a=split(','); for my $j (0..$#a) { $_raa->[$j][$i]=$a[$j] }; $i++; }; ### $_raa __DATA__ 123,"text",66,"more text" 124,"text1",67,"more text1" 125,"text2",68,"more text2"
    which yields
    ### $_raa: [ ### [ ### '123', ### '124', ### '125' ### ], ### [ ### '"text"', ### '"text1"', ### '"text2"' ### ], ### [ ### '66', ### '67', ### '68' ### ], ### [ ### '"more text"', ### '"more text1"', ### '"more text2"' ### ] ### ]

      As a side note, I'd get rid of the $i and use what perl offers: push. (I'd also use an array rather than a reference to the array, but that's minor.)

      my @_raa; while (<DATA>) { chomp; my @a=split(','); # use Text::CSV_XS here. push @{$_raa[$_]}, $a[$_] for 0..$#a; };
      And then, if you do use Text::CSV_XS to extract the fields, use it to merge the fields back together, since the extraction will remove quotes and escapes, so you'll need to push it back with quotes and escapes (which Text::CSV_XS does for you automatically). The difference between my code and yours is that if some rows have more columns than others, we'll do different things. Yours will have some undef's in those places, while mine will silently ignore them, possibly shifting things around incorrectly. What to do in this scenario wasn't spec'd by the OP, so it's hard to tell which one is right.

Re: CSV manipulation with perl
by Bloodnok (Vicar) on Mar 09, 2009 at 14:19 UTC
    I like the sig, but suspect it (eludication) ought to be elucidation :D

    A user level that continues to overstate my experience :-))

      I thought that the misspelling was part of the joke. It was symbolizing the complications of not following that rule. Oh well :P

      And you didn't even know bears could type.

      Thanks,yes it should, but that doesnt help my immediate dilemma... :) Do you have any ideas? thanks


      Eschew obfuscation, espouse elucidation!