Parsing MS SQL CSV export with Text::CSV

andyford has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse some CSV data from a MS SQL dump, but it seems to contain some stuff that I can't see. I tried Text::CSV_XS first, but the parser can't even see the lines apparently. The following just gives no results at all.

use strict;
use warnings;
use Text::CSV_XS;
use Data::Dumper;

my $tr_csv = Text::CSV_XS->new({ binary => 1, eol => $/ });
open(my $tr,"<",'file.csv') or die "Failure opening data file: $!";
while (my $row = $tr_csv->getline($tr)) {
    print $tr_csv->status();
    print @$row;
    print Data::Dumper->Dump($row);
}
[download]

I tried to step back and just split the lines like so

use strict;
use warnings;
open my $tr,"<",'file.csv' or die "Failure opening data file: $!";
while (<$tr>) {
    my ($x,$y,$z) = split /,/;
    print "x: $x\n";
}
[download]

The result there is that a couple unexpected characters (a y-umlaut and a thing that looks kinda like a "p") show up at the beginning of just the first line, even though the data has nothing but ASCII in it. Also, the while loop runs one too many times, as though the file has an empty line at the end, but it doesn't.

x: яюDARK01DGBBHF1D
x: DARK01JDM0HF1D
x: 191WA357Z1F811
x: 1952AF2-L3A3567
x:
[download]

I tried to view the non-printing stuff with "set list" in vim, but that just shows the normal "$" eol characters, just one at the end of each line.

Update: solution! Brute force: strip the first two chars and remove all the CR's:

use strict;
use warnings;
use Text::CSV_XS;
use File::Copy;

my $tr_csv = Text::CSV_XS->new({ binary => 1, eol => $/ });
open my $tmp,'>','tmp.csv' or die "Failure opening temp file: $!";
open my $tr,'<','file.csv' or die "Failure opening data file: $!";
while (<$tr>) {
    if ($. == 1) {
        $_ = substr $_,2;
    }
    s/\c@//g;
    print $tmp $_;
}
close $tmp or die "Failure closing temp file: $!";
close $tr or die "Failure closing data file: $!";
move 'tmp.csv','file.csv';
open $tr,'<','file.csv' or die "Failure opening data file: $!";
while (my $row = <$tr>) {
    $tr_csv->parse($row);
    my ($x,$y,$z) = split /,/,$row;
    print "x: $x\n";
}
[download]

non-Perl: read my music blog be my fan at Reverbnation bump me at the Sixty-One

Comment on Parsing MS SQL CSV export with Text::CSV_XS Select or Download Code

Replies are listed 'Best First'.
Re: Parsing MS SQL CSV export with Text::CSV_XS by Limbic~Region (Chancellor) on Oct 22, 2008 at 19:19 UTC
andyford, I have seen this very problem and I am willing to bet money you are wrong - there are characters at the beginning of the file that aren't not displayed however you are looking at it. Try this to see for sure: `#!/usr/bin/perl use strict; use warnings; open(my $fh, '<', 'file.csv') or die $!; { local $/ = \1; for (1 .. 10) { # first 10 bytes of the file my $byte = <$fh>; print join("\t", ord($byte), $byte), "\n"; } }` [download] Cheers - L~R	[reply] [d/l]
Re^2: Parsing MS SQL CSV export with Text::CSV_XS by andyford (Curate) on Oct 22, 2008 at 19:33 UTC
Quite so, there are other characters. I should have said, "the DATA only has ASCII in it." Nice, I get `255 я 254 ю 68 D 0 65 A 0 82 R 0 75 K 0` [download] Maybe I'll try just removing the first two characters. Update: Add second sentence for clarification non-Perl: read my music blog be my fan at Reverbnation bump me at the Sixty-One	[reply] [d/l]
Re^3: Parsing MS SQL CSV export with Text::CSV_XS by Limbic~Region (Chancellor) on Oct 22, 2008 at 20:09 UTC
andyford, If I remember correctly, there was a way to tell the database to not output those two bytes but I can't remember how. I vaguely recall it had something to do with not telling it you were doing CSV but rather text or perhaps it was just changing the extension from .csv to .txt. Unfortunately, the problem was from a customer providing the data and they could never be bothered to do it consistently so I end up writing something that tested the first two bytes and only stripping them if they were ord() > 127. Cheers - L~R	[reply]
Re^4: Parsing MS SQL CSV export with Text::CSV_XS by andyford (Curate) on Oct 22, 2008 at 21:49 UTC
Re^5: Parsing MS SQL CSV export with Text::CSV_XS by ig (Vicar) on Oct 22, 2008 at 22:10 UTC
Re^3: Parsing MS SQL CSV export with Text::CSV_XS by procura (Beadle) on Oct 27, 2008 at 11:03 UTC
You have an encoding problem `open my $fh, "<:encoding(utf16)", "file.csv";` See for prove: `$ od -t x1 xx.csv 0000000 ff fe 44 00 41 00 52 00 4b 00 0a 00 $ perl -we'open $a, "<:encoding(utf16)", "xx.csv" and print <$a>' \| od + -t x1 0000000 44 41 52 4b 0a $` [download]	[reply] [d/l] [select]
Re: Parsing MS SQL CSV export with Text::CSV_XS by Tux (Canon) on Oct 22, 2008 at 19:16 UTC
If you suspect errors, don't use `->status ()`, but `->error_diag ()`. If you expect weird characters, you can use my new module Data::Peek: `use Text::CSV_XS; use Data::Peek; my $csv = Text::CSV_XS->new ({ binary => 1 }); $csv->parse (qq{"ab\nc\x{20ac}\x80"}); print DPeek, "\n" for $csv->fields; $csv->parse (qq{"ab\nc\x{20ac}\x80",\r"\n}); $csv->error_diag;` [download] Will return `PV("ab\nc\342\202\254\302\200"\0) [UTF8 "ab\nc\x{20ac}\x{80}"] # CSV_XS ERROR: 2031 - EIF - CR char is first char of field, not part +of EOL` [download] Enjoy, Have FUN! H.Merijn	[reply] [d/l] [select]
Re^2: Parsing MS SQL CSV export with Text::CSV_XS by andyford (Curate) on Oct 22, 2008 at 20:04 UTC
You got me thinking. If I take out the getline and put the parse in the loop and add your error_diag(), `use strict; use warnings; use Text::CSV_XS; my $tr_csv = Text::CSV_XS->new({ binary => 1, eol => $/ }); open(my $tr,"<",'file.csv') or die "Failure opening ERS data file: $!" +; while (my $row = <$tr>) { $tr_csv->parse($row); print $tr_csv->error_diag(),"\n"; }` [download] I can get some errors. `2032EIF - CR char inside unquoted, not part of EOL436 2032EIF - CR char inside unquoted, not part of EOL435 2032EIF - CR char inside unquoted, not part of EOL433 2032EIF - CR char inside unquoted, not part of EOL425 2032EIF - CR char inside unquoted, not part of EOL425` [download] I guess that means it's worse than just having two extra characters at the beginning. non-Perl: read my music blog be my fan at Reverbnation bump me at the Sixty-One	[reply] [d/l] [select]
Re: Parsing MS SQL CSV export with Text::CSV_XS by jfroebe (Parson) on Oct 22, 2008 at 18:50 UTC
How exactly are you dumping the data in the first place? Are you using bcp? If so, what parameters are you using? Jason L. Froebe Blog, Tech Blog	[reply]
Re^2: Parsing MS SQL CSV export with Text::CSV_XS by andyford (Curate) on Oct 22, 2008 at 18:59 UTC
The only access method I was given was some bloated thing called "SQL Server Management Studio". I'm hoping to do something automate-able eventually but I thought it would be OK to get started mocking up a report. non-Perl: read my music blog be my fan at Reverbnation bump me at the Sixty-One	[reply]
Re: Parsing MS SQL CSV export with Text::CSV_XS by Illuminatus (Curate) on Oct 22, 2008 at 19:06 UTC
Emacs does a better job of displaying 'odd' characters than vim. You could also use 'od -h -a' to verify what you actually have. Can you import the file using Excel?	[reply]
Re^2: Parsing MS SQL CSV export with Text::CSV_XS by andyford (Curate) on Oct 22, 2008 at 19:27 UTC
I actually tried Excel and it gave great results, so at least MS can read the MS output, but then Excel told me it could only display the first 65,536 lines (I have ~160,000). Gave me a chuckle anyway. The od output looks intriguing: `0000000 feff 0044 0041 0052 004b 0030 0031 0044 del ~ D nul A nul R nul K nul 0 nul 1 nul D nu +l` [download] I guess the "del" and "~" are some other interpretation of my y-umlaut and p-thingy? non-Perl: read my music blog be my fan at Reverbnation bump me at the Sixty-One	[reply] [d/l]

Back to Seekers of Perl Wisdom