Commas in quoted CSV records

generator has asked for the wisdom of the Perl Monks concerning the following question:

I have a CSV file full of fax transaction records one per line. I need to sort these transactions by the contents of field 10. I wrote a program to import these records from a file, sort them based on contents of field 10 and export either to a SORTED.TXT file or REJECT.TXT file depending upon what was in field 10. I've used split to import each line as an array and attempted to filter for a specific value in field 10 ("32"). Because one of the fields prior to field 10 included a comma embedded within a quoted string the split pushed the contents of field 10 to field 11. The data in the first two records looks like this:

"SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911","Cost Recovery
+ Systms",40,6,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x20000
+03,1,0,"EVERYONE",6754
"SHAUNAGHH","","","","","","Cost Recovery Systms",40,10,32,0,1,1,"01/1
+0/2009","13:33","ADM4968A3E0FA34",0x2000004,1,0,"EVERYONE",6754
[download]

There are 21 fields in each line and because field 4 in the first record has a comma in it ("Hawaiian Properties, Ltd."), the split creates an array of 22 records moving field 10 (Record 9) contents to field 11 (Record 10) and it fails to pass the filter properly. I've spend two weeks working on this, looking in the books, working on it again, going online and searching for regex's that I can understand and apply to this problem. I am extremely frustrated by my inability to get this job done. Any guidance would be appreciated.

#!c:\perl\bin\perl.exe -w
use strict;
use diagnostics;

# Open a filehandle READFILE and associate it with my data full of CSV
+ records

open (READFILE,"<s:\\RFax-L7.txt");

# Establish an array @lines and populate it with the lines in the data
+file

my @lines = <READFILE>;

# Open two more files to write to keepers in SORTED rejects in REJECT

    open SORTED,">","s:\\sorted.txt" or die "Couldn't open SORTED.TXT 
+file: $!\n";
    open REJECT,">","s:\\reject.txt" or die "Couldn't open REJECT.TXT 
+file: $!\n";

# preprocess $lines to remove embedded commas in quoted fields
# The following line worked when I declared the contents of $::lines a
+s "Aaron,\"1234 Main St, USA\",555-555-1212"
# it produced the output "Aaron,"1234 Main St USA",555-555-1212"
# but when I moved it into this program incorporating it into the loop
+ it doesn't work!

foreach $::lines (@lines)    {
    $::lines =~ s/("*),(,")/\$1\$2/g;

# Import each line into an array seperating by the commmas.
    @::field = split(/,/, $::lines);
# Test to see if the 10th field contains "32", these we keep
        if ($::field[9] != 32)    {
        print REJECT "* rejected not 32 * $::lines";
        print "* REJECTED * $::lines";
                    } else    {
        print SORTED "$::lines";
        print "\$ KEEPER ==> $::lines";
                        }
                }
close READFILE;
    close SORTED;
        close REJECT;
[download]

I managed to find a code snippet online which seemed to work when using a static $text value defined within the program.

# remove commas from quoted text strings;

$text = "Aaron,\"1223 Main St, USA\",555-555-1212";

print " INBOUND \$text is equal to: $text\n";

$text =~ s/("[\w\ ]*),([\w\ ]+")/\1\2/g;

print " OUTBOUND \$text is equal to: $text";

# This produced output as follows:
# INBOUND $text is equal to: Aaron,"1234 Main St, USA",555-555-1212
# OUTBOUND $text is equal to: Aaron,"1234 Main St USA",555-555-1212
[download]

I tried to move the s/// line into my earlier program as a pre-processing step to remove any commas from within "quoted" string values of $text. But in the first program it fails. I have the Text-CSV_XS package but can't figure out how to use it. I would greatly appreciate getting some direction from more experienced PERL programers. Thanks for taking a look.

Comment on Commas in quoted CSV records Select or Download Code

Replies are listed 'Best First'.
Re: Commas in quoted CSV records by ikegami (Patriarch) on Mar 25, 2009 at 03:59 UTC
Text::CSV_XS `while (<$fh_in>) { $csv->parse($_) or die("csv: " . $csv->error_diag() . "\n"); my @fields = $csv->fields(); if ($fields[9] == 32) { print $fh_kept $_; } else { print $fh_rejected $_; } }` [download] Update: The OP talks about sorting the records, but he seems to just wants to filter them? Updated code.	[reply] [d/l]
Re^2: Commas in quoted CSV records by Tux (Canon) on Mar 25, 2009 at 07:43 UTC
Please don't promote `parse ()` when reading from streams. This is error prone use strict; use warnings; use Text::CSV_XS; open my $h_in, "<", "S:/RFax-L7.txt" or die "RFAX: $!"; open my $h_sr, ">", "S:/sorted.csv" or die "Sorted: $!"; open my $h_rj, ">", "S:/rejected.csv" or die "Rejected: $!"; my $csv_in = Text::CSV_XS->new ({ binary => 1 }); my $csv_out = Text::CSV_XS->new ({ binary => 1, eol => "\r\n" }); while (my $row = $csv_in->getline ($h_in)) { if ($row->[9] == 32) { $csv_out->print ($h_sr, $row) } else { $csv_out->print ($h_rj, $row) } } $csv_in->eof or $csv_in->error_diag (); close $_ for $h_in, $h_sr, $h_rj; [download] The prints that ikegami used will work fine too, but with the above code, you are more flexible, as you can alter the fields before writing them and still be sure the output is still valid. The OP sais Text::CSV_XS is already installed. Upgrading might give more functionality, but for a simple task like this you might not need it. To upgrade Text::CSV_XS you will need the matching installer. You seem to be on windows, reading your example, which means either ActivePerl or Strawberry, which will most likely also be somewhere in your START menu Strawberry `C:> cpan Text::CSV_XS` ActivePerl `C:> ppm update Text::CSV_XS` Enjoy, Have FUN! H.Merijn	[reply] [d/l] [select]
Re^2: Commas in quoted CSV records by generator (Pilgrim) on Mar 25, 2009 at 04:48 UTC
Thanks for the reply. I'd love to use the package processes but I can't figure out how to apply it. I can't find any documentation that is explicit enough for an inexperienced PERL programmer like me to understand. I tried to insert the process you suggested but am getting errors related to "...requires explicit package name" on any number of lines. I'd love to find some brief discussion of the suggested Package function that would spell out the syntax and process. Any thoughts? Thanks again for chiming in. I feel like there is hope!	[reply]
Re^3: Commas in quoted CSV records by Corion (Patriarch) on Mar 25, 2009 at 08:02 UTC
See A Guide to Installing Modules and A guide to installing modules for Win32, and Yes, even you can use CPAN. As you're under Windows, the most likely process available to you is to use the `ppm` utility that comes with ActiveState Perl to install Text::CSV_XS. If you have Strawberry Perl, `ppm` is not available but the "normal" installation process through `cpan Text::CSV_XS` should Just Work.	[reply] [d/l] [select]
Re^4: Commas in quoted CSV records by Tux (Canon) on Mar 25, 2009 at 08:42 UTC
Re^5: Commas in quoted CSV records by CountZero (Bishop) on Mar 25, 2009 at 13:19 UTC
Some notes below your chosen depth have not been shown here
Re^3: Commas in quoted CSV records by ikegami (Patriarch) on Mar 25, 2009 at 05:13 UTC
I used $fh_in, $fh_kept and $fh_rejected for my file handles. You used something else. Adjust one or the other.	[reply]
Re^3: Commas in quoted CSV records by DrHyde (Prior) on Mar 25, 2009 at 11:20 UTC
If you don't know perl well enough, then go and buy a copy of "Learning Perl", work through it, then come back to your problem. You will find that far more effective than just asking here, because people will not just write all your code for you.	[reply]
Re^4: Commas in quoted CSV records by generator (Pilgrim) on Mar 25, 2009 at 20:03 UTC
Re^2: Commas in quoted CSV records by generator (Pilgrim) on Mar 25, 2009 at 19:50 UTC
Thank you for your reply. This works wonderfully. I've managed to find the documentation on Text::CSV_XS and am currently disecting your snippet so that I understand what it is doing. Thanks again for your assistance.	[reply]
Re: Commas in quoted CSV records by svenXY (Deacon) on Mar 25, 2009 at 11:54 UTC
Hi, TIMTOWTDI DBD::CSV provides SQL access to a csv file (read and write) and will take care of the commata. Regards, svenXY	[reply]
Re^2: Commas in quoted CSV records by generator (Pilgrim) on Mar 25, 2009 at 22:10 UTC
Thanks for the comment. I'm looking forward to working with the various DB (modules\|packages) as I'm more comfortable with SQL than I am with PERL. I'm a system integrator and started working with PERL because the export/import tools provided by the vendors are primitive. I'm hoping to start directly pulling data from the databases using the DB modules and ODBC connections but for right now that layer of complexity is a little out of my reach. Thanks again for your suggestion. --------- generator	[reply]
Re: Commas in quoted CSV records by EvanCarroll (Chaplain) on Mar 25, 2009 at 04:07 UTC
Yes, use Text::CSV. I'm not exactly sure what you're trying to do but here is how you would sort them: use strict; use warnings; use XXX; use Text::CSV; my $csv = Text::CSV->new({ binary=>1, blank_is_undef => 1 }); my @rows; foreach my $line ( <DATA> ) { $csv->parse($line); push @rows, [$csv->fields]; } YYY [sort { $a->[9] <=> $b->[9] } @rows]; __DATA__ "SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911","Cost Recovery + Systms",40,6,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x20000 +03,1,0,"EVERYONE",6754 "SHAUNAGHH","","","","","","Cost Recovery Systms",40,10,32,0,1,1,"01/1 +0/2009","13:33","ADM4968A3E0FA34",0x2000004,1,0,"EVERYONE",6754 [download] Evan Carroll I hack for the ladies. www.EvanCarroll.com	[reply] [d/l]


Think about Loose Coupling
	PerlMonks