PerlMonks  

trouble with text::csv

by BernieC (Pilgrim)
on Dec 04, 2021 at 23:51 UTC

BernieC has asked for the wisdom of the Perl Monks concerning the following question:

I'm having an odd problem. I have a little program to read a CSV file using Text::CSV.
#!/usr/bin/perl
use v5.10 ;
use strict;
use warnings ;
use Text::CSV ;

print "DB FILE: " ;
my $dbfile = <STDIN> ;
chomp($dbfile) ;
my $if;
open ($if, "<", $dbfile) or die "Can't open $dbfile: $!\n" ;
my $csv = Text::CSV->new() ;
$csv->sep("\t") ;
my $header = $csv->getline($if) ;
say $header ;
exit ;
When I run it I get:
D:\Desktop>testcsv
DB FILE: alladdrs.csv
Use of uninitialized value $header in say at D:\perl\testcsv.pl line 22, <$if> line 1.
The header line in the file is:
Importance,"First Name","Middle Name","Last Name","Full Name",Company,Department,"Job Title","Street (b.)","City (b.)","State (b.)","ZIP Code (b.)","Country/Region (b.)","Home Phone","Business Phone","Mobile Phone","Business Phone 2","Business Phone 3","Business Phone 4","Business Fax","Business Web Page","Street (h.)","City (h.)","State (h.)","ZIP Code (h.)","Country/Region (h.)","Home Phone 2","Home Phone 3","Home Phone 4","Home Fax","Personal Web Page","Mobile Phone 2","Mobile Phone 3","Mobile Phone 4",E-mail,"E-mail 2","E-mail 3","E-mail 4",x,y,z,w,Office,Supervisor,Assistant,Salutation,Nickname,Gender,Spouse,Birthday,Anniversary,Family,Hobbies,Specialty,Strengths,Personality,Notes,"Custom 2","Custom 3","Custom 4","Custom 5","Custom 6","Custom 7","Custom 8",Comment,Group,"Birthday Reminder On/Off","Anniversary Reminder On/Off"
with the proviso that the first three invisible bytes in the file are EF BB BF (hex), which I gather is some Unicode magic. Could that be causing the problem? If so, since the file is actually 100% ISO-Latin {except for those first three bytes :o)}, is there some way to get Perl and Text::CSV to ignore them and read and parse the header record?

Replies are listed 'Best First'.
Re: trouble with text::csv
by tangent (Parson) on Dec 05, 2021 at 01:59 UTC
    You could try this:
    my $csv = Text::CSV->new;
    my $header = $csv->header($if, {detect_bom => 1});
    my @column_names = $csv->column_names;

      *almost* correct: the return value of $csv->header is a list, not a scalar:

      my $csv = Text::CSV_XS->new; # Or Text::CSV
      my @hdr = $csv->header ($if, { detect_bom => 1 });
      # No need for $csv->column_names, as that is implicit in ->header

      Personally I'd go for the easiest solution, where all of that is integrated:

      use Text::CSV_XS qw( csv );
      my $aoh = csv (in => $dbfile, sep => "\t", bom => 1);

      If you need the order of the header later, you can keep that too:

      my $aoh = csv (in => $dbfile, sep => "\t", bom => 1, kh => \my @hdr);

      Enjoy, Have FUN! H.Merijn
        Thanks for the insights (and for the excellent software). I did dump the $header in my example and got a Text::CSV object, so didn't try the list assignment.
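The ->header approach from this sub-thread can be tried end to end with a throwaway file. This is only a sketch: the two-column sample data is made up, and it leaves the separator to ->header, which by default looks for a comma or semicolon in the header line (the header quoted in the question is comma-separated).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::CSV;

# Build a small sample file that starts with the UTF-8 BOM (EF BB BF).
# The two-column content is made up for illustration.
my ($tmp, $dbfile) = tempfile(SUFFIX => ".csv", UNLINK => 1);
binmode $tmp;
print {$tmp} "\xEF\xBB\xBF", qq{Importance,"First Name"\n}, qq{High,Bernie\n};
close $tmp or die "Can't close $dbfile: $!\n";

open my $if, "<", $dbfile or die "Can't open $dbfile: $!\n";

# binary => 1 permits non-ASCII data; auto_diag => 1 reports parse errors.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# In list context header() returns the column names (lower-cased by
# default) and strips the BOM when detect_bom is set.
my @hdr = $csv->header($if, { detect_bom => 1 });
print join("|", @hdr), "\n";

# Data rows are then read as usual; each row is an array reference.
my $row = $csv->getline($if);
print join("|", @$row), "\n";
close $if;
```

Note that the names come back lower-cased unless munge_column_names is changed, which matters if later code looks fields up by their original spelling.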
Re: trouble with text::csv
by LanX (Saint) on Dec 05, 2021 at 00:20 UTC
    See Byte order mark aka BOM for your 3 invisible characters.

    Your file is marked as UTF-8.

    Update

    See what the module has to say about detecting BOMs with ->header

      I'm an ISO-Latin kinda guy and I find Unicode a bother. Why didn't this work?
      open ($if, "<", $dbfile) or die "Can't open $dbfile: $!\n" ;
      seek($if, 3, 0) ;
      [...]
      Shouldn't that have just skipped over the BOM and left me with a vanilla text file on $if for the CSV stuff to read? {it didn't :(} There seems to be some magic/nuisance going on that I don't understand.
        No idea what you mean by ISO Latin; there are many of them.

        And how do you know? Do you even have any bytes >127?

        Please note that ASCII, utf8 and most other encodings are identical with bytes <128.

        It's perfectly possible to have a UTF-8 encoded file which is plain ASCII.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery
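The question "do you even have any bytes >127?" can be answered with core Perl alone. A minimal sketch, where inspect_bytes is a hypothetical helper name and the sample string is made up:

```perl
use strict;
use warnings;

# Report whether raw content starts with the UTF-8 BOM and how many
# bytes above 127 follow it. inspect_bytes is a hypothetical helper name.
sub inspect_bytes {
    my ($bytes) = @_;
    my $has_bom = $bytes =~ s/\A\xEF\xBB\xBF// ? 1 : 0;   # strip BOM if present
    my $high    = () = $bytes =~ /[\x80-\xFF]/g;           # count high bytes
    return ($has_bom, $high);
}

# Made-up sample: BOM followed by pure ASCII.
my ($has_bom, $high) = inspect_bytes("\xEF\xBB\xBFImportance,Company\n");
print "BOM: $has_bom, bytes above 127: $high\n";
```

For a real file, the content would be read through a handle opened with "<:raw" so that no layer alters the bytes. A count of zero after the BOM means the file is plain ASCII, and the choice between Latin-1 and UTF-8 makes no difference to its content.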

        One other thing is that I don't use CSV to handle the headers -- my ONLY use of the CSV module is to read a line from the file and break its fields into an array. I know this is kinda dumb considering that if I were cleverer CSV could do most all of my work but it is really only about an eight line, simple program with CSV parsing the lines into fields. I'm tempted to abandon CSV entirely and just write a routine that will take a line of csv data and burst it directly
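For the narrow use described here, reading one record at a time into an array, the BOM can also be skipped by hand before Text::CSV sees the file. The following is a sketch under stated assumptions, not a diagnosis of the original failure: the sample data is made up, and the separator is set to a comma because the header line quoted in the question is comma-separated, whereas the original script set a tab. A separator mismatch can also make getline return undef, and error_diag shows which error actually occurred.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::CSV;

# Made-up sample file: UTF-8 BOM, then comma-separated records.
my ($tmp, $dbfile) = tempfile(SUFFIX => ".csv", UNLINK => 1);
binmode $tmp;
print {$tmp} "\xEF\xBB\xBF", qq{Importance,"First Name"\n}, qq{High,Bernie\n};
close $tmp or die "Can't close $dbfile: $!\n";

# Open without any translation layer so byte offsets are exact.
open my $if, "<:raw", $dbfile or die "Can't open $dbfile: $!\n";

# Peek at the first three bytes; rewind if they are not a BOM.
my $n = read $if, my $lead, 3;
seek $if, 0, 0 unless defined $n && $n == 3 && $lead eq "\xEF\xBB\xBF";

my $csv = Text::CSV->new({ binary => 1, sep_char => "," });

my @rows;
while (my $row = $csv->getline($if)) {
    push @rows, $row;    # each $row is an array reference of fields
}

# getline returns undef both at end of file and on a parse error,
# so check which one ended the loop.
$csv->eof or die "Parse error: " . $csv->error_diag . "\n";
close $if;

print join("|", @$_), "\n" for @rows;
```

Checking the bytes before skipping them keeps the script usable on files that have no BOM, which an unconditional seek($if, 3, 0) would silently damage.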
