You're spot on, Bethany. Thanks.

The byte \x9D is being converted to the Unicode character U+FFFD REPLACEMENT CHARACTER (EF BF BD in UTF-8) upstream. So the question now is: What's special about \x9D that isn't special about \x9C?* Hmm…

I added statements to the demonstration script to display a hex dump of the UTF-8 double-encoded bytes:

use feature 'say';
use charnames qw( :full );
use Encode qw( encode decode );
use Encode::Repair qw( repair_double );

binmode STDOUT, ':encoding(UTF-8)';

my $ldqm = "\N{LEFT DOUBLE QUOTATION MARK}";
my $rdqm = "\N{RIGHT DOUBLE QUOTATION MARK}";

$ldqm = encode('UTF-8', decode('Windows-1252', encode('UTF-8', $ldqm)));
$rdqm = encode('UTF-8', decode('Windows-1252', encode('UTF-8', $rdqm)));

say join ' ', map { sprintf '%02X', $_ } unpack 'C*', $ldqm;
say join ' ', map { sprintf '%02X', $_ } unpack 'C*', $rdqm;

say repair_double($ldqm, { via => 'Windows-1252' });
say repair_double($rdqm, { via => 'Windows-1252' });

__END__

C3 A2 E2 82 AC C5 93
C3 A2 E2 82 AC EF BF BD
“
��?

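*UPDATE: The short answer to the question is that 9C is a defined character in the Windows-1252 character encoding ('œ', U+0153 LATIN SMALL LIGATURE OE, which is the C5 93 at the end of the first hex dump) and 9D is not. Because 9D has no mapping, decode() substitutes U+FFFD for it, and the original byte is lost before repair_double ever sees the string.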
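
For anyone who wants to check this in isolation, here is a minimal sketch that decodes the two bytes one at a time. The helper name cp1252_codepoint is mine, made up for illustration; it isn't part of the demonstration script or of Encode. It relies on decode() being called without a CHECK argument, in which case Encode substitutes U+FFFD for bytes it can't map.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper (not from the original script): decode a single
# Windows-1252 byte and return the resulting Unicode code point.
sub cp1252_codepoint {
    my ($byte) = @_;

    # No CHECK argument is passed, so Encode falls back to its default
    # behaviour of substituting U+FFFD for any byte that has no mapping.
    return ord decode('Windows-1252', $byte);
}

# Print each byte next to the code point it decodes to.
printf "%02X -> U+%04X\n", ord($_), cp1252_codepoint($_) for "\x9C", "\x9D";
```

On my reading of the Windows-1252 table this should report U+0153 for 9C and U+FFFD for 9D. Passing Encode::FB_CROAK as a third argument to decode() would make the unmapped byte a fatal error instead of a silent substitution, which is one way to catch this kind of damage early.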

In reply to Re^2: Why does Encode::Repair only correctly fix one of these two tandem characters? by Jim
in thread Why does Encode::Repair only correctly fix one of these two tandem characters? by Jim
