http://qs321.pair.com?node_id=1191450


in reply to DBD::CSV - how to I coax it to read BOM prefixed files?

As I wrote in the CB, I think File::BOM might be the way to do it. I'm still searching, but at least I've a working example that shows the error:

    #!/usr/bin/perl
    use v5.12;
    use warnings;
    use autodie qw( :all );
    use File::BOM;
    use DBI;

    # Write a small UTF-8 CSV file that starts with a BOM (U+FEFF)
    open my $h,'>:encoding(utf-8)','test.csv';
    say $h qq<\x{FEFF}"foo","bar","baz">;
    say $h qq<"1","2","3">;
    say $h qq<"4","5","6">;
    close $h;

    my $dbh=DBI->connect('dbi:CSV:',undef,undef,{
        RaiseError => 1,
        PrintError => 0,
        f_ext      => '.csv',
    });
    my $sth=$dbh->prepare('select * from test');
    $sth->execute();
    while (my @a=$sth->fetchrow_array()) {
        say join(",",@a);
    }
    $sth->finish();
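What the parser finds at the very start of such a file can be checked without DBI. This is a minimal core-Perl sketch, not part of the original test: the file name `bom-demo.csv` and the helper `first_codepoints` are made up for illustration. It shows that the `:encoding(UTF-8)` layer leaves the BOM in place, so the first character of the first field is U+FEFF rather than the opening quote.

```perl
#!/usr/bin/perl
use v5.12;
use warnings;

# Hypothetical helper: return the first $n characters of the first line
# of a UTF-8 file, formatted as code points.
sub first_codepoints {
    my ($file, $n) = @_;
    open my $in, '<:encoding(UTF-8)', $file or die "Cannot read $file: $!";
    my $line = <$in>;
    close $in;
    return map { sprintf('U+%04X', ord($_)) } split //, substr($line, 0, $n);
}

# Write a one-line CSV file with a leading BOM, as in the test script.
my $file = 'bom-demo.csv';    # hypothetical file name
open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
say $out qq<\x{FEFF}"foo","bar","baz">;
close $out;

say join ' ', first_codepoints($file, 2);    # prints "U+FEFF U+0022"
unlink $file;
```

So a parser that does not strip the BOM itself sees a field that begins with U+FEFF and only then meets a quote character.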

Update:

It seems that only the header() method in Text::CSV_XS is able to handle the BOM, and DBD::CSV does not call that method. Instead, it calls getline().

    #!/usr/bin/perl
    use v5.12;
    use warnings;
    use autodie qw( :all );
    use File::BOM;
    use DBI;
    use Data::Dumper;

    # Subclass of Text::CSV_XS that logs each call before delegating to the parent
    package My::Text::CSV_XS {
        use parent "Text::CSV_XS";

        sub DUMP {
            return Data::Dumper->new([\@_],['*_'])->Sortkeys(1)->Indent(1)->Useqq(1)->Dump();
        }

        sub new {
            my $proto=shift;
            say "$proto -> new(",DUMP(@_),")";
            $proto->SUPER::new(@_);
        }

        sub header {
            my $self=shift;
            say "$self -> header(",DUMP(@_),")";
            $self->SUPER::header(@_);
        }

        sub getline {
            my $self=shift;
            say "$self -> getline(",DUMP(@_),")";
            $self->SUPER::getline(@_);
        }

        sub getline_hr {
            my $self=shift;
            say "$self -> getline_hr(",DUMP(@_),")";
            $self->SUPER::getline_hr(@_);
        }

        sub getline_all {
            my $self=shift;
            say "$self -> getline_all(",DUMP(@_),")";
            $self->SUPER::getline_all(@_);
        }

        sub getline_hr_all {
            my $self=shift;
            say "$self -> getline_hr_all(",DUMP(@_),")";
            $self->SUPER::getline_hr_all(@_);
        }
    }

    open my $h,'>:encoding(utf-8)','test.csv';
    say $h qq<\x{FEFF}"foo","bar","baz">;
    say $h qq<"1","2","3">;
    say $h qq<"4","5","6">;
    close $h;

    my $dbh=DBI->connect('dbi:CSV:',undef,undef,{
        RaiseError => 1,
        PrintError => 0,
        f_ext      => '.csv',
        csv_class  => 'My::Text::CSV_XS',
    });
    my $sth=$dbh->prepare('select * from test');
    $sth->execute();
    while (my @a=$sth->fetchrow_array()) {
        say join(",",@a);
    }
    $sth->finish();
    >perl test2.pl
    My::Text::CSV_XS -> new(@_ = (
      {
        "auto_diag" => 1,
        "binary" => 1,
        "escape_char" => "\"",
        "quote_char" => "\"",
        "sep_char" => ","
      }
    );
    )
    My::Text::CSV_XS -> new(@_ = (
      {
        "auto_diag" => 1,
        "binary" => 1,
        "eol" => "\r\n",
        "escape_char" => "\"",
        "quote_char" => "\"",
        "sep_char" => ","
      }
    );
    )
    My::Text::CSV_XS=HASH(0x23e04e8) -> getline(@_ = (
      bless( \*Symbol::GEN1, 'IO::File' )
    );
    )
    DBD::CSV::st execute failed:
    Execution ERROR: Missing first row due to EIF - Loose unescaped quote at /usr/lib64/perl5/vendor_perl/DBI/DBD/SqlEngine.pm line 1480.
    . at /usr/lib64/perl5/vendor_perl/DBI/DBD/SqlEngine.pm line 1271.
     [for Statement "select * from test"] at test2.pl line 71, <GEN1> line 1.
    >
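For comparison, here is a rough sketch of reading the same kind of file with Text::CSV_XS directly, going through header() before getline(). It assumes a Text::CSV_XS recent enough to have header() and relies on its default BOM detection; the file name `header-demo.csv` and the helper `read_with_header` are made up for illustration and are not anything DBD::CSV does.

```perl
#!/usr/bin/perl
use v5.12;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: read a CSV file, letting header() consume the
# first line (and a leading BOM, if any) before the data rows are read.
sub read_with_header {
    my ($file) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $in, '<', $file or die "Cannot read $file: $!";
    my @cols = $csv->header($in);    # list context: the column names
    my @rows;
    while (my $row = $csv->getline($in)) {
        push @rows, $row;
    }
    close $in;
    return (\@cols, \@rows);
}

# Write a BOM-prefixed test file like the one used above.
my $file = 'header-demo.csv';    # hypothetical file name
open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
say $out qq<\x{FEFF}"foo","bar","baz">;
say $out qq<"1","2","3">;
close $out;

my ($cols, $rows) = read_with_header($file);
say join ',', @$cols;
say join ',', @$_ for @$rows;
unlink $file;
```

Since DBD::CSV goes straight to getline(), it never gets the benefit of this step.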

Second Update:

There is an ugly way to get File::BOM into DBD::CSV. DBD::CSV inherits from DBD::File, and DBD::File has an f_encoding attribute that is simply wrapped in :encoding(...), without further processing. That string is then passed to binmode.

This is the relevant part of DBD::File:

    sub apply_encoding {
        my ($self, $meta, $fn) = @_;
        defined $fn or $fn = "file handle " . fileno ($meta->{fh});
        if (my $enc = $meta->{f_encoding}) {
            binmode $meta->{fh}, ":encoding($enc)" or
                croak "Failed to set encoding layer '$enc' on $fn: $!";
        }
        else {
            binmode $meta->{fh} or croak "Failed to set binary mode on $fn: $!";
        }
    } # apply_encoding

So, with a carefully crafted encoding string, it works at least on my system:

    #!/usr/bin/perl
    use v5.12;
    use warnings;
    use autodie qw( :all );
    use File::BOM;
    use DBI;

    open my $h,'>:encoding(utf-8)','test.csv';
    say $h qq<\x{FEFF}"foo","bar","baz">;
    say $h qq<"1","2","3">;
    say $h qq<"4","5","6">;
    close $h;

    my $dbh=DBI->connect(
        'dbi:CSV:',
        undef,
        undef,
        {
            RaiseError => 1,
            PrintError => 0,
            f_ext      => '.csv',
            f_encoding => 'utf-8):via(File::BOM' # <-- this is the evil trick!
        }
    );
    my $sth=$dbh->prepare('select * from test');
    $sth->execute();
    while (my @a=$sth->fetchrow_array()) {
        say join(",",@a);
    }
    $sth->finish();

Effectively, this calls binmode $meta->{fh}, ":encoding(utf-8):via(File::BOM)", and so, File::BOM can hide the BOM from Text::CSV_XS and DBD::CSV.

    >perl test3.pl
    1,2,3
    4,5,6
    >
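The string splice behind the trick can be looked at in isolation. This is just a sketch: `layer_for` is a made-up helper that copies the interpolation from apply_encoding, it is not a DBD::File function.

```perl
#!/usr/bin/perl
use v5.12;
use warnings;

# Hypothetical helper mirroring the interpolation in apply_encoding:
# the attribute value is placed between ":encoding(" and ")" unchecked.
sub layer_for {
    my ($enc) = @_;
    return ":encoding($enc)";
}

say layer_for('utf-8');                   # :encoding(utf-8)
say layer_for('utf-8):via(File::BOM');    # :encoding(utf-8):via(File::BOM)
```

The unbalanced parenthesis in the attribute value closes the :encoding(...) layer early, and the parenthesis that DBD::File appends ends up closing :via(...) instead. This depends on DBD::File not validating f_encoding, so it may stop working in other versions.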

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)