I've been trying to produce a real SSCCE, so here is one more try. I fetch one of the source files with `curl` directly into a file, open that file in Emacs, and whittle it down to a few letters, as in the script below. Running the script prints `$VAR1 = "t\x{f3}n";`. That does not look like UTF-8 to me.
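```perl
#!/usr/bin/perl
use utf8;            # treat string literals in this source file as UTF-8
use Data::Dumper;
use warnings;
use strict;
my $word = "tón";    # the few letters left from the fetched file
print Dumper($word), qq(\n);
```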
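One way to see what that Dumper output represents is to look at the same string as characters and as encoded bytes. The sketch below is an illustration, not a fix. It uses the core `Encode` module and writes the string as `"t\x{f3}n"`, so it does not depend on how the script file itself is saved. In that notation, `\x{f3}` is the single code point U+00F3 (ó). Encoding the string to UTF-8 turns that one character into two bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Same string as "tón" in a `use utf8` script, written with an escape so
# this sketch does not depend on the encoding of the file itself.
my $word = "t\x{f3}n";

# Character view: three characters, the middle one is U+00F3.
printf "characters: %d, middle code point: U+%04X\n",
    length $word, ord substr($word, 1, 1);

# Byte view: serialise the characters as UTF-8 octets and show them in hex.
my $bytes = encode('UTF-8', $word);
printf "UTF-8 bytes: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $bytes;
```

Run as-is, it reports 3 characters with U+00F3 in the middle, and the UTF-8 bytes `74 C3 B3 6E`.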
Is there a standard way to detect 8-bit legacy text that has been mislabeled upstream as UTF-8, and convert it to UTF-8 so I can keep working on it with regexes?
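A common heuristic, not a standard, is to trust the UTF-8 label only if the raw bytes are well-formed UTF-8, and otherwise decode them with a legacy encoding. Below is a minimal sketch using the core `Encode` module. The helper `decode_guessing` is hypothetical. The choice of `cp1252` (Windows-1252) as the fallback is an assumption about the data, so adjust it to whatever the upstream source actually uses. The bytes should be read unmodified, for example with `open my $fh, '<:raw', $path`. Note that this check cannot tell the encodings apart for pure-ASCII input, which is harmless. Rarely, a legacy byte sequence also happens to be valid UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw octets as strict UTF-8 if they are
# well-formed, otherwise fall back to an assumed legacy encoding.
# Returns the decoded character string and the encoding that was used.
sub decode_guessing {
    my ($octets, $fallback) = @_;
    $fallback //= 'cp1252';    # assumption: Windows-1252 legacy text

    # FB_CROAK dies on malformed input; LEAVE_SRC keeps $octets intact.
    my $text = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return ($text, 'UTF-8') if defined $text;

    # Not valid UTF-8: reinterpret the same bytes in the legacy encoding.
    return (decode($fallback, $octets), $fallback);
}

# Usage: the same word as real UTF-8 bytes and as a single Latin-1/cp1252 byte.
for my $raw ("t\xC3\xB3n", "t\xF3n") {
    my ($text, $enc) = decode_guessing($raw);
    printf "%-6s -> %d characters, middle code point U+%04X\n",
        $enc, length $text, ord substr($text, 1, 1);
}
```

Both inputs end up as the same three-character string, so later regexes see `ó` as one character whichever encoding the bytes arrived in.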