http://qs321.pair.com?node_id=718823

andyford has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse some CSV data from an MS SQL dump, but it seems to contain some characters that I can't see. I tried Text::CSV_XS first, but the parser apparently can't even see the lines. The following produces no output at all.

    use strict;
    use warnings;
    use Text::CSV_XS;
    use Data::Dumper;

    my $tr_csv = Text::CSV_XS->new({ binary => 1, eol => $/ });
    open(my $tr,"<",'file.csv') or die "Failure opening data file: $!";
    while (my $row = $tr_csv->getline($tr)) {
        print $tr_csv->status();
        print @$row;
        print Data::Dumper->Dump($row);
    }

I tried to step back and just split the lines like so

    use strict;
    use warnings;

    open my $tr,"<",'file.csv' or die "Failure opening data file: $!";
    while (<$tr>) {
        my ($x,$y,$z) = split /,/;
        print "x: $x\n";
    }

The result is that a couple of unexpected characters (a y-umlaut and a thing that looks kind of like a "p") show up at the beginning of the first line only, even though the data has nothing but ASCII in it. Also, the while loop runs one time too many, as though the file had an empty line at the end, but it doesn't.

    x: ÿþDARK01DGBBHF1D
    x: DARK01JDM0HF1D
    x: 191WA357Z1F811
    x: 1952AF2-L3A3567
    x:

I tried to view the non-printing stuff with "set list" in vim, but that just shows the normal "$" eol characters, just one at the end of each line.
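
One way to see what vim is hiding is to dump the first raw bytes of the file. A y-umlaut followed by a thorn is how the bytes FF FE display in Latin-1, and FF FE is the UTF-16 little-endian byte-order mark, so the dump is probably UTF-16 rather than ASCII. The sketch below works on that assumption; `sniff_bom` and `head_bytes` are hypothetical helper names, not part of any module.

```perl
use strict;
use warnings;

# Return the encoding implied by a leading byte-order mark, or nothing
# if the bytes do not start with a BOM this sketch recognises.
# (UTF-32LE also begins with FF FE; it is not distinguished here.)
sub sniff_bom {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /^\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /^\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /^\xFE\xFF/;
    return;
}

# Read the first $count raw bytes of a file, with no decoding layers.
sub head_bytes {
    my ($path, $count) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    read $fh, my $buf, $count;
    close $fh;
    return $buf;
}

# Usage: perl sniff.pl file.csv
if (@ARGV) {
    my $head = head_bytes($ARGV[0], 16);
    print join(' ', map { sprintf '%02X', ord } split //, $head), "\n";
    printf "BOM suggests: %s\n", sniff_bom($head) // 'none found';
}
```

If the hex dump shows a 00 after every ASCII byte, that would also account for the other symptoms: in UTF-16LE each line feed is the byte pair 0A 00, so splitting on 0A leaves a lone NUL after the final newline, which the loop reads as one extra "line".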

Update: solution! Brute force: strip the first two chars and remove all the NUL bytes (\c@):

    use strict;
    use warnings;
    use Text::CSV_XS;
    use File::Copy;

    my $tr_csv = Text::CSV_XS->new({ binary => 1, eol => $/ });
    open my $tmp,'>','tmp.csv' or die "Failure opening temp file: $!";
    open my $tr,'<','file.csv' or die "Failure opening data file: $!";
    while (<$tr>) {
        if ($. == 1) {
            $_ = substr $_,2;
        }
        s/\c@//g;
        print $tmp $_;
    }
    close $tmp or die "Failure closing temp file: $!";
    close $tr or die "Failure closing data file: $!";
    move 'tmp.csv','file.csv';

    open $tr,'<','file.csv' or die "Failure opening data file: $!";
    while (my $row = <$tr>) {
        $tr_csv->parse($row);
        my ($x,$y,$z) = split /,/,$row;
        print "x: $x\n";
    }
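
A gentler alternative to stripping bytes, assuming the file really is UTF-16 with a byte-order mark, is to let a PerlIO encoding layer do the decoding so the original file is left untouched. This is only a sketch: `open_utf16` is a hypothetical helper name, and the encoding is inferred from the symptoms rather than confirmed.

```perl
use strict;
use warnings;

# Open a UTF-16 file for reading as decoded text. The plain "UTF-16"
# encoding (no LE/BE suffix) uses the BOM to pick the byte order and
# consumes it, so no stray characters reach the caller.
sub open_utf16 {
    my ($path) = @_;
    open my $fh, '<:raw:encoding(UTF-16)', $path
        or die "Cannot open $path: $!";
    return $fh;
}

# Usage: perl read_csv.pl file.csv
if (@ARGV) {
    require Text::CSV_XS;    # loaded only when a file is actually given
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    my $fh  = open_utf16($ARGV[0]);
    while (my $row = $csv->getline($fh)) {
        print "x: $row->[0]\n";
    }
    close $fh;
}
```

Leaving `eol` unset on the parser lets Text::CSV_XS accept the CR LF line endings that remain after decoding, and taking fields from `getline` instead of `split /,/` keeps quoted fields that contain commas intact.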