

in reply to Re: Reading CSV Files Containing UTF8 Characters
in thread Reading CSV Files Containing UTF8 Characters

That was easy enough. Thanks! Of course it still doesn't work because some characters don't appear to be UTF-8.

I wrote this code to try to figure out what encoding it is, but for any file with the "special" characters I just get the "Didn't work" message.

#!/usr/bin/perl
use strict;
use warnings;
use Encode::Guess;

undef $/;    # slurp on

my $dir = '.';
if (@ARGV > 0) {
    $dir = $ARGV[0];
}

opendir DIR, $dir or die "Can't opendir '$dir': $!\n";
my @files = grep /\.csv$/i, readdir(DIR);
closedir DIR;

Encode::Guess->add_suspects(qw(latin1 cp1252));    # What else?

foreach my $file (@files) {
    open my $fh, "<:raw", "$dir/$file" or die "Can't open '$dir/$file': $!\n";
    my $data = <$fh>;
    close $fh;

    my $enc = guess_encoding($data);
    if (ref $enc) {
        print "$file: " . $enc->name . "\n";
    }
    else {
        print "Didn't work for: $file\n";
    }
}

exit;

These files were generated on Windows by exporting from Outlook. The Windows machine is set up for American English, but the keyboard is Danish. :-/ All the files that DO work are reported with "ascii" encoding (as I expect).
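One thing that might help narrow this down: when guess_encoding fails it returns a plain string describing why, rather than an object, so printing that string says more than a fixed "Didn't work" message. The sketch below is only an illustration under assumptions. The helper name describe_guess and the sample data are made up, and the sample assumes the troublesome characters are single bytes such as 0xF8 ("ø" in both latin1 and cp1252). If that is the case, listing latin1 and cp1252 together may leave the guess ambiguous, since both can decode the same bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode::Guess;

# Hypothetical helper (not part of Encode::Guess): returns the encoding
# name on success, or the error string guess_encoding returned on failure.
sub describe_guess {
    my ($bytes, @suspects) = @_;
    my $enc = guess_encoding($bytes, @suspects);
    return ref $enc ? $enc->name : "guess failed: $enc";
}

# Made-up sample data; \xF8 is assumed to stand in for the "special" bytes.
my $ascii  = "Name,City\nJens,Odense\n";
my $danish = "Name,City\nS\xF8ren,K\xF8benhavn\n";

print describe_guess($ascii,  qw(latin1 cp1252)), "\n";   # plain ASCII data
print describe_guess($danish, qw(latin1 cp1252)), "\n";   # two single-byte suspects
print describe_guess($danish, qw(cp1252)),        "\n";   # one single-byte suspect
```

Passing the suspects as extra arguments to guess_encoding, instead of calling add_suspects, keeps each call independent, which makes it easier to try one candidate encoding at a time.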