http://qs321.pair.com?node_id=1191450


in reply to DBD::CSV - how to I coax it to read BOM prefixed files?

As I wrote in the CB, I think File::BOM might be the way to do it. I'm still searching, but at least I've a working example that shows the error:

#!/usr/bin/perl use v5.12; use warnings; use autodie qw( :all ); use File::BOM; use DBI; open my $h,'>:encoding(utf-8)','test.csv'; say $h qq<\x{FEFF}"foo","bar","baz">; say $h qq<"1","2","3">; say $h qq<"4","5","6">; close $h; my $dbh=DBI->connect('dbi:CSV:',undef,undef,{ RaiseError => 1, PrintEr +ror => 0, f_ext => '.csv'}); my $sth=$dbh->prepare('select * from test'); $sth->execute(); while (my @a=$sth->fetchrow_array()) { say join(",",@a); } $sth->finish();

Update:

It seems that only the header() method in Text::CSV_XS is able to handle the BOM, and DBD::CSV does not call that method. Instead, it calls getline().

#!/usr/bin/perl use v5.12; use warnings; use autodie qw( :all ); use File::BOM; use DBI; use Data::Dumper; package My::Text::CSV_XS { use parent "Text::CSV_XS"; sub DUMP { return Data::Dumper->new([\@_],['*_'])->Sortkeys(1)->I +ndent(1)->Useqq(1)->Dump(); } sub new { my $proto=shift; say "$proto -> new(",DUMP(@_),")"; $proto->SUPER::new(@_); } sub header { my $self=shift; say "$self -> header(",DUMP(@_),")"; $self->SUPER::header(@_); } sub getline { my $self=shift; say "$self -> getline(",DUMP(@_),")"; $self->SUPER::getline(@_); } sub getline_hr { my $self=shift; say "$self -> getline_hr(",DUMP(@_),")"; $self->SUPER::getline_hr(@_); } sub getline_all { my $self=shift; say "$self -> getline(",DUMP(@_),")"; $self->SUPER::getline_all(@_); } sub getline_hr_all { my $self=shift; say "$self -> getline_hr(",DUMP(@_),")"; $self->SUPER::getline_hr_all(@_); } } open my $h,'>:encoding(utf-8)','test.csv'; say $h qq<\x{FEFF}"foo","bar","baz">; say $h qq<"1","2","3">; say $h qq<"4","5","6">; close $h; my $dbh=DBI->connect('dbi:CSV:',undef,undef,{ RaiseError => 1, PrintEr +ror => 0, f_ext => '.csv', csv_class => 'My::Text::CSV_XS'}); my $sth=$dbh->prepare('select * from test'); $sth->execute(); while (my @a=$sth->fetchrow_array()) { say join(",",@a); } $sth->finish();
>perl test2.pl My::Text::CSV_XS -> new(@_ = ( { "auto_diag" => 1, "binary" => 1, "escape_char" => "\"", "quote_char" => "\"", "sep_char" => "," } ); ) My::Text::CSV_XS -> new(@_ = ( { "auto_diag" => 1, "binary" => 1, "eol" => "\r\n", "escape_char" => "\"", "quote_char" => "\"", "sep_char" => "," } ); ) My::Text::CSV_XS=HASH(0x23e04e8) -> getline(@_ = ( bless( \*Symbol::GEN1, 'IO::File' ) ); ) DBD::CSV::st execute failed: Execution ERROR: Missing first row due to EIF - Loose unescaped quote +at /usr/lib64/perl5/vendor_perl/DBI/DBD/SqlEngine.pm line 1480. . at /usr/lib64/perl5/vendor_perl/DBI/DBD/SqlEngine.pm line 1271. [for Statement "select * from test"] at test2.pl line 71, <GEN1> line + 1. >

Second Update:

There is an ugly way to get File::BOM into DBD::CSV. DBD::CSV inherits from DBD::File, and DBD::File has an f_encoding attribute that is simply wrapped in :encoding(...), without further processing. That string is then passed to binmode.

This is the relevant part of DBD::File:

sub apply_encoding { my ($self, $meta, $fn) = @_; defined $fn or $fn = "file handle " . fileno ($meta->{fh}); if (my $enc = $meta->{f_encoding}) { binmode $meta->{fh}, ":encoding($enc)" or croak "Failed to set encoding layer '$enc' on $fn: $!"; } else { binmode $meta->{fh} or croak "Failed to set binary mode on $fn: $! +"; } } # apply_encoding

So, with a carefully crafted encoding string, it works at least on my system:

#!/usr/bin/perl use v5.12; use warnings; use autodie qw( :all ); use File::BOM; use DBI; open my $h,'>:encoding(utf-8)','test.csv'; say $h qq<\x{FEFF}"foo","bar","baz">; say $h qq<"1","2","3">; say $h qq<"4","5","6">; close $h; my $dbh=DBI->connect( 'dbi:CSV:', undef, undef, { RaiseError => 1, PrintError => 0, f_ext => '.csv', f_encoding => 'utf-8):via(File::BOM' # <-- this is the + evil trick! } ); my $sth=$dbh->prepare('select * from test'); $sth->execute(); while (my @a=$sth->fetchrow_array()) { say join(",",@a); } $sth->finish();

Effectively, this calls binmode $meta->{fh}, ":encoding(utf-8):via(File::BOM)", and so, File::BOM can hide the BOM from Text::CSV_XS and DBD::CSV.

>perl test3.pl 1,2,3 4,5,6 >

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re^2: DBD::CSV - how to I coax it to read BOM prefixed files?
by ELISHEVA (Prior) on Jun 04, 2017 at 10:14 UTC

    Worked like a charm. Thank-you

      Worked like a charm. Thank-you

      Glad I could help.

      BUT:

      Until Tux' change is released on CPAN, please do yourself a favour and add an explaining comment how that trick works. This trick is everything but obvious, and I doubt that it will work for any combination of the relevant modules. Don't make your future self invent a time machine to come back and hurt you for playing dirty tricks.

      When the change is released, test it, get rid of my dirty trick, and explicitly require the new version of DBD::CSV.

      If your code has to work with "any" version of DBD::CSV, check its version, prefer to use the Tux' official csv_bom attribute, and only if it is not supported, fall back to use the File::BOM trick. Something like this:

      use DBD::CSV (); use if $DBD::CSV::VERSION < 0.50, File::BOM => (); # ^---- adjust to working version number my $dbh=DBI->connect( 'dbi:CSV:', undef, undef, { RaiseError => 1, PrintError => 0, f_ext => '.csv', ( # v---- adjust to working version numbe +r $DBD::CSV::VERSION < 0.50 ? (f_encoding => 'utf-8):via(File::BOM') : (csv_bom => 1) ) } );

      Alternatively, patch DBD::CSV at runtime if it is too old, and use Tux' official attribute csv_bom even with old versions:

      #!/usr/bin/perl use v5.12; use warnings; use autodie qw( :all ); use DBI; use DBD::CSV (); # v----- adjust to working version if ($DBD::CSV::VERSION < 0.50) { require File::BOM; my $old_connect=\&DBD::CSV::dr::connect; my $new_connect=sub { my ($drh, $dbname, $user, $auth, $attr) = @_; if ($attr && exists($attr->{'csv_bom'}) && $attr->{'cs +v_bom'}) { delete $attr->{'csv_bom'}; $attr->{'f_encoding'}='utf-8):via(File::BOM'; } goto $old_connect; }; do { no strict 'refs'; no warnings 'redefine'; *DBD::CSV::dr::connect=$new_connect; }; } open my $h,'>:encoding(utf-8)','test.csv'; say $h qq<\x{FEFF}"foo","bar","baz">; say $h qq<"1","2","3">; say $h qq<"4","5","6">; close $h; my $dbh=DBI->connect( 'dbi:CSV:', undef, undef, { RaiseError => 1, PrintError => 0, f_ext => '.csv', csv_bom => 1, } ); my $sth=$dbh->prepare('select * from test'); $sth->execute(); while (my @a=$sth->fetchrow_array()) { say join(",",@a); } $sth->finish();

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)