Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Commas in quoted CSV records

by generator (Pilgrim)
on Mar 25, 2009 at 03:23 UTC ( [id://752999]=perlquestion: print w/replies, xml ) Need Help??

generator has asked for the wisdom of the Perl Monks concerning the following question:

I have a CSV file full of fax transaction records one per line. I need to sort these transactions by the contents of field 10. I wrote a program to import these records from a file, sort them based on contents of field 10 and export either to a SORTED.TXT file or REJECT.TXT file depending upon what was in field 10. I've used split to import each line as an array and attempted to filter for a specific value in field 10 ("32"). Because one of the fields prior to field 10 included a comma embedded within a quoted string the split pushed the contents of field 10 to field 11. The data in the first two records looks like this:
"SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911","Cost Recovery + Systms",40,6,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x20000 +03,1,0,"EVERYONE",6754 "SHAUNAGHH","","","","","","Cost Recovery Systms",40,10,32,0,1,1,"01/1 +0/2009","13:33","ADM4968A3E0FA34",0x2000004,1,0,"EVERYONE",6754
There are 21 fields in each line and because field 4 in the first record has a comma in it ("Hawaiian Properties, Ltd."), the split creates an array of 22 records moving field 10 (Record 9) contents to field 11 (Record 10) and it fails to pass the filter properly. I've spend two weeks working on this, looking in the books, working on it again, going online and searching for regex's that I can understand and apply to this problem. I am extremely frustrated by my inability to get this job done. Any guidance would be appreciated.
#!c:\perl\bin\perl.exe -w use strict; use diagnostics; # Open a filehandle READFILE and associate it with my data full of CSV + records open (READFILE,"<s:\\RFax-L7.txt"); # Establish an array @lines and populate it with the lines in the data +file my @lines = <READFILE>; # Open two more files to write to keepers in SORTED rejects in REJECT open SORTED,">","s:\\sorted.txt" or die "Couldn't open SORTED.TXT +file: $!\n"; open REJECT,">","s:\\reject.txt" or die "Couldn't open REJECT.TXT +file: $!\n"; # preprocess $lines to remove embedded commas in quoted fields # The following line worked when I declared the contents of $::lines a +s "Aaron,\"1234 Main St, USA\",555-555-1212" # it produced the output "Aaron,"1234 Main St USA",555-555-1212" # but when I moved it into this program incorporating it into the loop + it doesn't work! foreach $::lines (@lines) { $::lines =~ s/("*),(,")/\$1\$2/g; # Import each line into an array seperating by the commmas. @::field = split(/,/, $::lines); # Test to see if the 10th field contains "32", these we keep if ($::field[9] != 32) { print REJECT "* rejected not 32 * $::lines"; print "* REJECTED * $::lines"; } else { print SORTED "$::lines"; print "\$ KEEPER ==> $::lines"; } } close READFILE; close SORTED; close REJECT;
I managed to find a code snippet online which seemed to work when using a static $text value defined within the program.
# remove commas from quoted text strings; $text = "Aaron,\"1223 Main St, USA\",555-555-1212"; print " INBOUND \$text is equal to: $text\n"; $text =~ s/("[\w\ ]*),([\w\ ]+")/\1\2/g; print " OUTBOUND \$text is equal to: $text"; # This produced output as follows: # INBOUND $text is equal to: Aaron,"1234 Main St, USA",555-555-1212 # OUTBOUND $text is equal to: Aaron,"1234 Main St USA",555-555-1212
I tried to move the s/// line into my earlier program as a pre-processing step to remove any commas from within "quoted" string values of $text. But in the first program it fails. I have the Text-CSV_XS package but can't figure out how to use it. I would greatly appreciate getting some direction from more experienced PERL programers. Thanks for taking a look.

Replies are listed 'Best First'.
Re: Commas in quoted CSV records
by ikegami (Patriarch) on Mar 25, 2009 at 03:59 UTC
    Text::CSV_XS
    while (<$fh_in>) { $csv->parse($_) or die("csv: " . $csv->error_diag() . "\n"); my @fields = $csv->fields(); if ($fields[9] == 32) { print $fh_kept $_; } else { print $fh_rejected $_; } }

    Update: The OP talks about sorting the records, but he seems to just wants to filter them? Updated code.

      Please don't promote parse () when reading from streams. This is error prone

      use strict; use warnings; use Text::CSV_XS; open my $h_in, "<", "S:/RFax-L7.txt" or die "RFAX: $!"; open my $h_sr, ">", "S:/sorted.csv" or die "Sorted: $!"; open my $h_rj, ">", "S:/rejected.csv" or die "Rejected: $!"; my $csv_in = Text::CSV_XS->new ({ binary => 1 }); my $csv_out = Text::CSV_XS->new ({ binary => 1, eol => "\r\n" }); while (my $row = $csv_in->getline ($h_in)) { if ($row->[9] == 32) { $csv_out->print ($h_sr, $row) } else { $csv_out->print ($h_rj, $row) } } $csv_in->eof or $csv_in->error_diag (); close $_ for $h_in, $h_sr, $h_rj;

      The prints that ikegami used will work fine too, but with the above code, you are more flexible, as you can alter the fields before writing them and still be sure the output is still valid.

      The OP sais Text::CSV_XS is already installed. Upgrading might give more functionality, but for a simple task like this you might not need it. To upgrade Text::CSV_XS you will need the matching installer. You seem to be on windows, reading your example, which means either ActivePerl or Strawberry, which will most likely also be somewhere in your START menu

      Strawberry

      C:> cpan Text::CSV_XS

      ActivePerl

      C:> ppm update Text::CSV_XS

      Enjoy, Have FUN! H.Merijn
      Thanks for the reply. I'd love to use the package processes but I can't figure out how to apply it. I can't find any documentation that is explicit enough for an inexperienced PERL programmer like me to understand. I tried to insert the process you suggested but am getting errors related to "...requires explicit package name" on any number of lines. I'd love to find some brief discussion of the suggested Package function that would spell out the syntax and process. Any thoughts? Thanks again for chiming in. I feel like there is hope!
        I used $fh_in, $fh_kept and $fh_rejected for my file handles. You used something else. Adjust one or the other.
        If you don't know perl well enough, then go and buy a copy of "Learning Perl", work through it, then come back to your problem. You will find that far more effective than just asking here, because people will not just write all your code for you.
      Thank you for your reply. This works wonderfully. I've managed to find the documentation on Text::CSV_XS and am currently disecting your snippet so that I understand what it is doing. Thanks again for your assistance.
Re: Commas in quoted CSV records
by svenXY (Deacon) on Mar 25, 2009 at 11:54 UTC
    Hi,
    TIMTOWTDI
    DBD::CSV provides SQL access to a csv file (read and write) and will take care of the commata.
    Regards,
    svenXY
      Thanks for the comment.
      I'm looking forward to working with the various DB (modules|packages) as I'm more comfortable with SQL than I am with PERL.
      I'm a system integrator and started working with PERL because the export/import tools provided by the vendors are primitive.
      I'm hoping to start directly pulling data from the databases using the DB modules and ODBC connections but for right now that layer of complexity is a little out of my reach.
      Thanks again for your suggestion.
      ---------
      generator
Re: Commas in quoted CSV records
by EvanCarroll (Chaplain) on Mar 25, 2009 at 04:07 UTC
    Yes, use Text::CSV. I'm not exactly sure what you're trying to do but here is how you would sort them:
    use strict; use warnings; use XXX; use Text::CSV; my $csv = Text::CSV->new({ binary=>1, blank_is_undef => 1 }); my @rows; foreach my $line ( <DATA> ) { $csv->parse($line); push @rows, [$csv->fields]; } YYY [sort { $a->[9] <=> $b->[9] } @rows]; __DATA__ "SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911","Cost Recovery + Systms",40,6,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x20000 +03,1,0,"EVERYONE",6754 "SHAUNAGHH","","","","","","Cost Recovery Systms",40,10,32,0,1,1,"01/1 +0/2009","13:33","ADM4968A3E0FA34",0x2000004,1,0,"EVERYONE",6754


    Evan Carroll
    I hack for the ladies.
    www.EvanCarroll.com

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://752999]
Approved by planetscape
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (4)
As of 2024-04-19 04:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found