Text::CSV

Item Description: Manipulation routines for comma-separated values

Review Synopsis:

Author: Alan Citterman

I had a project where I needed to extract data from a file and send
it to a customer. The file in question was from a database, and it
had been exported to a CSV text file.

I would have tried to write my own regular expression to handle this,
but my overall knowledge of Perl isn't that good. However, after some
research, I found a reference to this module.

#!/usr/bin/perl
use strict;
use Text::CSV;
[download]

I knew that the text file had lines of data that I didn't need, and
that there was an easily recognizable pattern in those lines, so I could
use a regular expression to put those lines into a trash file.

my $input="input.csv";
my $output="output.txt";
my $trash="trashfile";
my $csv=Text::CSV->new();     #Creates a new Text::CSV object

open(INFILE,$input) || die "Can't open file $input";
open(OUTFILE,">$output") || die "Can't open file $output";
open(TRASH,">$trash") || die "Can't open file $trash";
[download]

Now to start reading the data from the file, store it in the $_ variable
and print it to the trash file if its not good, or parse the variable, and
print it to the output file if it is.

while (<INFILE>) {
if (/"X"/) {   #The trash data has these 3 characters in it
  print TRASH "$_\n";
}
else {  #Now to deal with the data I want to keep
  if($csv->parse($_)) {       #checks to see if data exists in $_ and 
+parses it if it does
    my @fields=$csv->fields;  # puts the values from each field in an 
+array
    my $elements=@fields;     #gets the number of elements in the arra
+y
    for ($x=0;$x<$elements;$x++) {
      print OUTFILE "$fields[$x]\t";
    }
  }
}
}
[download]

Now that the files have been written to, I can close them up, and remove
the trash file

close INFILE;
close OUTFILE;
close TRASH;

unlink $trash;
[download]

All in all, a very useful module.

Comment on Text::CSV Select or Download Code

Replies are listed 'Best First'.
Re: Text::CSV by swiftone (Curate) on Jan 11, 2001 at 02:34 UTC
I to use Text::CSV quite happily. I spent forever and a day (actually, about 6 hours) trying to debug a problem with it, that was finally answered deep within the documentation. When it says: Allowable characters within a CSV field include 0x09 (tab) and the inclusive range of 0x20 (space) through 0x7E (tilde). It means it. In particular, if you get anything outside of this range, including any MS curly-quotes or (in my case) a single oddly out-of-range byte, it will quietly fail. Note that it does not accept newlines inside any field either.	[reply]
Re: Re: Text::CSV by dilbert (Novice) on Oct 03, 2002 at 14:27 UTC
If you use Text::CSV_XS, you can handle those characters if you turn on the 'binary' option, like this: `use Text::CSV_XS; $csv = Text::CSV->new({binary => 1}); ... $csv->parse($_); @fields = $csv->fields();` [download] Really nifty if you have to parse a CSV file with French text with accented characters and newlines in it...	[reply] [d/l]
Re: Re: Re: Text::CSV by AssFace (Pilgrim) on May 20, 2004 at 23:21 UTC
Just to be nit-picky, I think that is: `$csv = Text::CSV_XS->new({binary => 1});` [download] Those that pointed out that this would resolve working with text that has accents in it - I love you. I was going absolutely batty trying to deal with this text that I needed to convert to XML. ------------------------------------------------------------------- There are some odd things afoot now, in the Villa Straylight.	[reply] [d/l]
Re: Text::CSV by dkubb (Deacon) on Jan 18, 2001 at 06:33 UTC
This is an excellent module, but I prefer to use Text::CSV_XS. It's an XS based module, and is quite fast, definately the fastest perl module for manipulating quote-comma files on CPAN, that I have heard of. It has an identical interface to Text::CSV, meaning all would have to do a `s/Text::CSV/Text::CSV_XS/g` on the source file, and it should work the same.	[reply] [d/l]
Re: Re: Text::CSV by Anonymous Monk on Apr 03, 2001 at 03:32 UTC
I had problems with both those modules when I tried them. The rules for escaping quotes and spaces within a CSV record are very clear, but the modules don't handle all the cases correctly. Every data file would die on some record.. that was legal, but which the modules wouldn't handle correctly. I ended up writing my own state-machine parser for CSV, first splitting the record into characters etc, just as I would do it in C.. it took me a couple of hours to code and test, but as I had already wasted many days messing with broken modules I considered it time well spent. -Ben M	[reply]
Re (tilly) 1: Text::CSV by tilly (Archbishop) on Apr 03, 2001 at 07:53 UTC
I should get a CPAN ID, etc and put this there. But in the meantime Text::xSV is a plausible solution to this problem in pure Perl.	[reply]
Re: Text::CSV by TStanley (Canon) on Jan 13, 2001 at 03:25 UTC
UPDATE: One problem that occured to me later on in this script, is that I didn't check to see if the $Outfile and $Junkfile existed. When I wrote this script, I had already touched those filenames, so I knew they existed, and didn't have to check for them. So I now add the following code: `if(!-e $Outfile){ system("touch $Outfile"); } system("touch $Junkfile");` [download] TStanley In the end, there can be only one!	[reply] [d/l]
Re: Text::CSV by wilmer_t (Novice) on Aug 20, 2014 at 13:33 UTC
It's true that Text::CSV does not directly handle multiline fields, but I found it quite easy to get around My task was to export/import a set of requirements between tools, for which purpose I (amongst others) needed to add a field and sort the input recursively by parent reference (using `Sort::Naturally`) On reading, I created multilines as `while ( my $line = $csv->getline($fh) ) { # Manipulation of unsorted fields during import my @multiline = join(';',@{$line}); push @content, @multiline ; } if (not $csv->eof) { $csv->error_diag(); }` [download] and on printing back, I entered data field by field as `for my $line (@result) { my @fields = split(/;/,$line); # Manipulation of sorted fields during export print $fh "$_;" foreach @fields; print $fh "\n"; }` [download] with csv object created as `$csv = Text::CSV->new({ binary => 1, auto_diag => 1, sep_char => ';' });` I.e. I never bothered to try using the module for writing back data, nor other suggestions as from above :)	[reply] [d/l] [select]
Re^2: Text::CSV by Anonymous Monk on Aug 20, 2014 at 14:32 UTC
It's true that Text::CSV does not directly handle multiline fields What is your definition of "multiline fields"? Because Text::CSV does handle fields with embedded newlines. In your output code, you could have used Text::CSV's `print` method. (By the way, why split `$line` only to join it again immediately afterwards?) Note that the last post in this thread was ~10 years ago, a lot has changed since then.	[reply] [d/l] [select]
Re^3: Text::CSV by wilmer_t (Novice) on Aug 25, 2014 at 20:23 UTC
Thanks for feedback, Mr Anon :) Multiline, meaning embedded newlines, yes - and you are right that the post is old, but since I am closing down a project, browsing through some perl hacks I found this thread and thought it worth while to comment on just the same. You might be right though that I was not installing the latest Text::CSV when making this script As for the method at parsing and printing, well I had to do some field manipulation both during import (unsorted) and export (sorted) for which I thought my example could be of use to someone else, struggling at CSV file manipulations :) Updated the examples accordingly, to better illustrate this	[reply]


Perl Monk, Perl Meditation
	PerlMonks