Is there any way to automagically determine what encoding a file is in?
That's precisely what the BOM ("byte order mark") is for. If you don't specify a byte order when creating a file (plain :encoding(utf16)), Perl writes a BOM for you; if you do specify one (utf16le or utf16be), the file will be "BOM-less". Files created without an explicit byte order can be read back using plain :encoding(utf16), which uses the BOM to pick the byte order:
$ /usr/bin/perl
use strict;
use warnings;
my $c = 'a';
my $fd;
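# Explicit little-endian: no BOM is written.
open $fd, '>:encoding(utf16le)', 'foo-le' or die "open: $!";
print $fd $c;
close $fd;
# Explicit big-endian: no BOM is written.
open $fd, '>:encoding(utf16be)', 'foo-be' or die "open: $!";
print $fd $c;
close $fd;
# No byte order given: Perl writes a BOM (fe ff) first.
open $fd, '>:encoding(utf16)', 'foo' or die "open: $!";
print $fd $c;
close $fd;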
__END__
$ xxd foo-le
0000000: 6100 a.
$ xxd foo-be
0000000: 0061 .a
$ xxd foo
0000000: feff 0061 ...a
$ /usr/bin/perl
open my $fd, '<:encoding(utf16)', 'foo' or die "open: $!";
print while <$fd>;
close $fd;
__END__
a
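If you want to know which encoding a BOM announces before picking a layer, you can peek at the first few bytes yourself. This is only a rough sketch: sniff_bom and the @BOMS table are names I made up for illustration, and the table covers just the common Unicode BOMs.

```perl
use strict;
use warnings;

# Known BOMs, longest first, so that UTF-32LE (ff fe 00 00) is tested
# before UTF-16LE (ff fe). Note the inherent ambiguity: a UTF-16LE file
# whose first character is U+0000 starts with the same four bytes.
my @BOMS = (
    [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],
    [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],
    [ "\xEF\xBB\xBF",     'UTF-8'    ],
    [ "\xFE\xFF",         'UTF-16BE' ],
    [ "\xFF\xFE",         'UTF-16LE' ],
);

# Hypothetical helper: returns the encoding name implied by the BOM,
# or undef if the file does not start with one.
sub sniff_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    my $head = '';
    defined read($fh, $head, 4) or die "read $path: $!";
    close $fh;
    for my $entry (@BOMS) {
        my ($bom, $name) = @$entry;
        return $name if index($head, $bom) == 0;
    }
    return;    # no BOM: the encoding has to be guessed some other way
}

# Report on any files named on the command line.
for my $path (@ARGV) {
    printf "%s: %s\n", $path, sniff_bom($path) // 'no BOM';
}
```

Run against the files created above, sniff_bom('foo') returns 'UTF-16BE', while 'foo-le' and 'foo-be' have no BOM and return undef.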
Update: Of course, I realized after clicking "Create" that I didn't really answer your actual question :^). Well, if files don't have a BOM, you can only guess or brute-force them. Or add a BOM to them ;^).
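For what "guessing" might look like, here is one crude heuristic, offered as a sketch rather than anything reliable. It assumes the text is mostly ASCII/Latin-1 range, so that in UTF-16 every other byte is NUL. The name guess_bomless and the "at least half" threshold are my own arbitrary choices.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: guess the encoding of BOM-less bytes.
# Returns 'UTF-16LE', 'UTF-16BE', 'UTF-8', or undef if nothing fits.
sub guess_bomless {
    my ($bytes) = @_;

    # Count NUL bytes at even and odd offsets.
    my ($even_nul, $odd_nul) = (0, 0);
    for my $i (0 .. length($bytes) - 1) {
        next unless substr($bytes, $i, 1) eq "\x00";
        if   ($i % 2) { $odd_nul++ }
        else          { $even_nul++ }
    }

    # "a" is 61 00 in UTF-16LE and 00 61 in UTF-16BE, so mostly-ASCII
    # text has its NULs at odd offsets (LE) or even offsets (BE).
    my $units = int(length($bytes) / 2);
    if ($units && length($bytes) % 2 == 0) {
        return 'UTF-16LE' if $odd_nul * 2 >= $units  && $odd_nul > $even_nul;
        return 'UTF-16BE' if $even_nul * 2 >= $units && $even_nul > $odd_nul;
    }

    # Otherwise see whether the bytes are at least valid strict UTF-8.
    # decode() may modify its input when a CHECK flag is given, so use a copy.
    my $copy = $bytes;
    return 'UTF-8' if eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };

    return;    # give up
}
```

With the files from above, the contents of foo-le (61 00) come out as 'UTF-16LE' and those of foo-be (00 61) as 'UTF-16BE'. Text that is mostly outside the Latin range will defeat the NUL counting, and plain ASCII is reported as 'UTF-8', so treat the result as a hint only.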