Re: By the shine on my bald pate, I dislike this encoding stuff

by Anonymous Monk
on Mar 04, 2018 at 14:24 UTC


in reply to By the shine on my bald pate, I dislike this encoding stuff

Strictly speaking, a file containing "\xA3" is not ASCII, since ASCII only consists of the characters from "\x00" to "\x7F". Maybe it's ISO Latin-1?

Also, your logic double-decodes the file. Assuming it is UTF-8, opening it with '<:encoding(UTF-8)' already decodes it as it is read, and then your decode() call decodes it a second time.
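To make the double-decoding point concrete, here is a minimal sketch. The sample bytes are an assumption of mine (a UTF-8 pound sign), not the OP's data, and the first decode() stands in for what the '<:encoding(UTF-8)' layer does on read.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed sample data: the pound sign (U+00A3) is the two bytes C2 A3 in UTF-8.
my $bytes = "\xC2\xA3";

# First decode: bytes -> one character. This is the step the
# '<:encoding(UTF-8)' layer has already performed while reading.
my $once = decode( 'UTF-8', $bytes );

# Second decode: the string no longer holds UTF-8 bytes, so a strict
# decode with FB_CROAK dies. LEAVE_SRC keeps $once from being modified.
my $twice_ok = eval {
    decode( 'UTF-8', $once, Encode::FB_CROAK | Encode::LEAVE_SRC );
    1;
} ? 1 : 0;

printf "after one decode: %d character(s), U+%04X\n", length $once, ord $once;
print "second decode ", ( $twice_ok ? "succeeded" : "failed" ), "\n";
```

Without a CHECK argument the second decode() does not die; it substitutes replacement characters for the malformed input instead, which is harder to notice.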

My knee-jerk reaction would be to apply Encode::Guess to the problem, since that way somebody else has already worked out this mess for you, and since you need to know the file's current encoding before you can convert it to UTF-8. If I just wanted to know whether the file decodes as UTF-8, I might be lazy and do something like:

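use Encode;

# Read the raw bytes: no decoding layer, so nothing gets decoded twice.
open my $orderfile, '<:raw', $emailfile
    or return( @err, "Could not open $emailfile: $!" );
my $filedata = do { local $/; <$orderfile> };    # slurp the whole file
close $orderfile;

# FB_CROAK makes decode() die on malformed UTF-8. LEAVE_SRC stops
# decode() from consuming $filedata, which a CHECK argument otherwise allows.
eval {
    decode( "UTF-8", $filedata, Encode::FB_CROAK | Encode::LEAVE_SRC );
    1;
} or return( @err, "File was not encoded in UTF-8" );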
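For the Encode::Guess route, a rough sketch might look like the following. The helper name guess_encoding_name and the sample byte strings are made up for illustration. By default Encode::Guess only considers ASCII, UTF-8, and BOM-marked UTF-16/32, so any legacy encoding has to be named as an extra suspect.

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;

# Hypothetical helper: returns the name of the guessed encoding, or
# undef when the guess fails or is ambiguous.
sub guess_encoding_name {
    my ( $bytes, @suspects ) = @_;
    my $decoder = guess_encoding( $bytes, @suspects );

    # On failure guess_encoding() returns an error string, not an object.
    return ref $decoder ? $decoder->name : undef;
}

# Assumed sample inputs: pure ASCII, a UTF-8 pound sign, a Latin-1 pound sign.
for my $sample ( "plain text", "\xC2\xA3 5.00", "\xA3 5.00" ) {
    my $name = guess_encoding_name($sample) // 'unknown';
    print "$name\n";
}

# Naming a legacy suspect explicitly lets the third sample be identified.
print guess_encoding_name( "\xA3 5.00", 'latin1' ) // 'unknown', "\n";
```

One caveat: single-byte encodings such as Latin-1 accept almost any byte sequence, so listing several of them as suspects tends to produce an "ambiguous" result rather than an answer.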

One possible source of confusion in this horrible mess is that ASCII is a subset of UTF-8: a file containing only ASCII characters is byte-for-byte the same in both encodings, so technically there is no way to tell whether such a file was encoded as ASCII or as UTF-8.
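A small sketch of that overlap, using made-up sample data: a byte string with nothing above "\x7F" decodes to the same text either way, and the question of encoding only arises once a high byte appears.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative data only: every byte is <= 0x7F.
my $bytes = "Total: 5.00 GBP\n";

my $as_ascii = decode( 'ascii', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC );
my $as_utf8  = decode( 'UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC );

print "identical under both decodings\n" if $as_ascii eq $as_utf8;

# A quick test for "is this pure ASCII?": look for any byte above 0x7F.
my $pure_ascii = ( $bytes       !~ /[^\x00-\x7F]/ ) ? 1 : 0;
my $high_byte  = ( "\xA3 5.00"  =~ /[^\x00-\x7F]/ ) ? 1 : 0;
print "pure ASCII: $pure_ascii, sample with \\xA3 has a high byte: $high_byte\n";
```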

Re^2: By the shine on my bald pate, I dislike this encoding stuff
by Anonymous Monk on Mar 05, 2018 at 03:39 UTC
    Yep. Betcha the real problem is that the files which contain "non-ASCII characters" didn't use Unicode (UTF-8, UTF-16) to encode those characters, but instead used old-style code pages. But the program's logic assumes that it's Unicode without checking the entire file. I didn't see the OP ever describe what the "crash" actually is.
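If the files really are a mix of UTF-8 and an old code page, one possible approach is to try a strict UTF-8 decode over the whole file and fall back to the legacy encoding only when that fails. This is a sketch under assumptions: decode_with_fallback is a hypothetical helper, and cp1252 is just a guess at the code page, since nothing in the thread says which one is in play.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: strict UTF-8 first, then a legacy single-byte
# encoding. The cp1252 default is an assumption, not known from the thread.
sub decode_with_fallback {
    my ( $bytes, $fallback ) = @_;
    $fallback //= 'cp1252';

    # FB_CROAK checks the entire string; LEAVE_SRC keeps $bytes intact
    # so it can still be handed to the fallback decoder.
    my $text = eval {
        decode( 'UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC );
    };
    return $text if defined $text;

    return decode( $fallback, $bytes );
}

# The same pound sign, stored two different ways (assumed sample data).
for my $bytes ( "\xC2\xA3", "\xA3" ) {
    my $text = decode_with_fallback($bytes);
    printf "%d character(s), U+%04X\n", length $text, ord $text;
}
```

The fallback cannot fail for a single-byte encoding, so a wrong guess at the code page yields wrong characters rather than an error.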
