I have a CSV file full of fax transaction records, one per line. I need to sort these transactions by the contents of field 10.
I wrote a program to import these records from a file, sort them based on contents of field 10 and export either to a SORTED.TXT file or REJECT.TXT file depending upon what was in field 10.
I've used split to import each line as an array and attempted to filter for a specific value in field 10 ("32"). Because one of the fields before field 10 contains a comma embedded within a quoted string, the split pushed the contents of field 10 to field 11.
The data in the first two records looks like this:
"SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911","Cost Recovery Systms",40,6,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x2000003,1,0,"EVERYONE",6754
"SHAUNAGHH","","","","","","Cost Recovery Systms",40,10,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x2000004,1,0,"EVERYONE",6754
There are 21 fields in each line, but because field 4 of the first record contains a comma ("Hawaiian Properties, Ltd."), the split creates an array of 22 elements. The contents of field 10 (index 9) move to field 11 (index 10), so the record fails to pass the filter properly.
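The shift described above can be reproduced with a short sketch that uses only core Perl. The record is the first sample line from this post, joined onto one line; the variable names are chosen for illustration only.

```perl
use strict;
use warnings;

# First sample record from above, joined onto a single line.
my $line = q{"SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911","Cost Recovery Systms",40,6,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x2000003,1,0,"EVERYONE",6754};

# A plain split on commas knows nothing about quoting,
# so the comma inside "Hawaiian Properties, Ltd." also splits.
my @field = split /,/, $line;

print "elements: ", scalar(@field), "\n";   # 22 instead of 21
print "index 3:  $field[3]\n";              # first half of the quoted field
print "index 4:  $field[4]\n";              # second half of the quoted field
print "index 9:  $field[9]\n";              # 6, which is really field 9
print "index 10: $field[10]\n";             # 32, which is really field 10
```

Every element after the embedded comma sits one index higher than expected, which is why testing $field[9] sees 6 rather than 32 for this record.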
I've spent two weeks working on this: looking in the books, trying again, going online and searching for regexes that I can understand and apply to this problem. I am extremely frustrated by my inability to get this job done. Any guidance would be appreciated.
#!c:\perl\bin\perl.exe -w
use strict;
use diagnostics;
# Open a filehandle READFILE and associate it with my data full of CSV records
open (READFILE,"<s:\\RFax-L7.txt");
# Establish an array @lines and populate it with the lines in the data file
my @lines = <READFILE>;
# Open two more files to write to: keepers in SORTED, rejects in REJECT
open SORTED,">","s:\\sorted.txt" or die "Couldn't open SORTED.TXT file: $!\n";
open REJECT,">","s:\\reject.txt" or die "Couldn't open REJECT.TXT file: $!\n";
# preprocess $lines to remove embedded commas in quoted fields
# The following line worked when I declared the contents of $::lines as "Aaron,\"1234 Main St, USA\",555-555-1212"
# it produced the output "Aaron,"1234 Main St USA",555-555-1212"
# but when I moved it into this program incorporating it into the loop it doesn't work!
foreach $::lines (@lines) {
$::lines =~ s/("*),(,")/\$1\$2/g;
# Import each line into an array, separating by the commas.
@::field = split(/,/, $::lines);
# Test to see if the 10th field contains "32", these we keep
if ($::field[9] != 32) {
print REJECT "* rejected not 32 * $::lines";
print "* REJECTED * $::lines";
} else {
print SORTED "$::lines";
print "\$ KEEPER ==> $::lines";
}
}
close READFILE;
close SORTED;
close REJECT;
I managed to find a code snippet online which seemed to work when using a static $text value defined within the program.
# remove commas from quoted text strings;
$text = "Aaron,\"1234 Main St, USA\",555-555-1212";
print " INBOUND \$text is equal to: $text\n";
$text =~ s/("[\w\ ]*),([\w\ ]+")/\1\2/g;
print " OUTBOUND \$text is equal to: $text";
# This produced output as follows:
# INBOUND $text is equal to: Aaron,"1234 Main St, USA",555-555-1212
# OUTBOUND $text is equal to: Aaron,"1234 Main St USA",555-555-1212
I tried to move the s/// line into my earlier program as a pre-processing step to remove any commas from within "quoted" string values of $text. But in the first program it fails.
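One possible reason the snippet works on the test string but not on the real data is the period in "Ltd.": the character class [\w\ ] matches only word characters and spaces. The following sketch applies the same pattern to both strings. It writes $1$2 instead of \1\2 in the replacement, which is equivalent here, and the real record is cut to its first six fields for brevity.

```perl
use strict;
use warnings;

my $sample = q{Aaron,"1234 Main St, USA",555-555-1212};
my $real   = q{"SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911"};

my @out;
for my $text ($sample, $real) {
    # Same pattern as the snippet above, applied to a copy of the string.
    (my $copy = $text) =~ s/("[\w ]*),([\w ]+")/$1$2/g;
    push @out, $copy;
    print $copy eq $text ? "unchanged: " : "changed:   ", "$copy\n";
}
```

The test string loses its embedded comma, while the real record comes back unchanged because " Ltd." cannot be matched by [\w ]+ followed directly by a quote. Widening the character class would handle this case, but a regex of this kind still assumes a single comma per quoted field.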
I have the Text-CSV_XS package but can't figure out how to use it. I would greatly appreciate getting some direction from more experienced Perl programmers. Thanks for taking a look.
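For reference, a minimal sketch of how Text::CSV_XS might be applied to this filter, assuming the module is installed from CPAN. It parses the two sample records from this post instead of reading s:\RFax-L7.txt, and collects the results in the arrays @sorted and @reject (names chosen for illustration) instead of writing files.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# binary => 1 tolerates unusual bytes; auto_diag => 1 reports parse errors.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# The two sample records, standing in for the lines of the input file.
my @lines = (
    q{"SHAUNAGHH","","","Hawaiian Properties, Ltd.","4","911","Cost Recovery Systms",40,6,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x2000003,1,0,"EVERYONE",6754},
    q{"SHAUNAGHH","","","","","","Cost Recovery Systms",40,10,32,0,1,1,"01/10/2009","13:33","ADM4968A3E0FA34",0x2000004,1,0,"EVERYONE",6754},
);

my (@sorted, @reject);
for my $line (@lines) {
    # parse() splits one record while honouring the quotes;
    # fields() returns the resulting values with the quotes removed.
    $csv->parse($line) or next;
    my @field = $csv->fields;

    # Field 10 is index 9, even when an earlier field contains a comma.
    if ($field[9] == 32) {
        push @sorted, $line;
    } else {
        push @reject, $line;
    }
}

print scalar(@sorted), " kept, ", scalar(@reject), " rejected\n";
```

Both sample records carry 32 in field 10, so both are kept. When reading from an open filehandle, the module's getline method can replace the parse and fields pair; it returns a reference to the array of fields for each record.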