Why does Encode::Repair only correctly fix one of these two tandem characters?
by Jim (Curate) on Aug 08, 2014 at 23:04 UTC
Jim has asked for the wisdom of the Perl Monks concerning the following question:
The function Encode::Repair::repair_double fixes the character U+201C LEFT DOUBLE QUOTATION MARK when double-encoded but not its companion character U+201D RIGHT DOUBLE QUOTATION MARK when double-encoded. Is there a bug in the module or a defect in my expectations? Or is something else wrong?
Here's a script that demonstrates the problem:
Here's the output of the script piped through od:
E2 80 9C is the correct UTF-8 encoding of the Unicode character U+201C LEFT DOUBLE QUOTATION MARK.
EF BF BD is U+FFFD REPLACEMENT CHARACTER and 3F is U+003F QUESTION MARK. I expect the output to be the single Unicode character U+201D RIGHT DOUBLE QUOTATION MARK instead.
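For anyone who wants to double-check the byte sequences quoted above, here is a minimal sketch that uses only the core Encode module. It is not the script from the question, and the helper name hex_bytes is made up for illustration. It simply prints the UTF-8 encoding of each code point mentioned, which also shows that the two quotation marks differ only in their final byte.

```perl
use strict;
use warnings;
use Encode qw(encode);

# hex_bytes is a hypothetical helper: render a byte string as
# space-separated uppercase hex pairs, similar to od/hexdump output.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# The code points discussed in the question.
my @code_points = (
    [ 0x201C, 'LEFT DOUBLE QUOTATION MARK'  ],
    [ 0x201D, 'RIGHT DOUBLE QUOTATION MARK' ],
    [ 0xFFFD, 'REPLACEMENT CHARACTER'       ],
    [ 0x003F, 'QUESTION MARK'               ],
);

for my $entry (@code_points) {
    my ($cp, $name) = @$entry;
    # Encode the single character to UTF-8 bytes and dump them.
    my $utf8 = encode('UTF-8', chr $cp);
    printf "U+%04X %-28s %s\n", $cp, $name, hex_bytes($utf8);
}
```

U+201C comes out as E2 80 9C and U+201D as E2 80 9D, so whatever goes wrong in the repair has to hinge on that last byte, 9C versus 9D.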