
Re: Why does Encode::Repair only correctly fix one of these two tandem characters?

by ikegami (Patriarch)
on Aug 09, 2014 at 05:32 UTC


in reply to Why does Encode::Repair only correctly fix one of these two tandem characters?

    $ldqm = encode 'UTF-8', decode 'Windows-1252', encode 'UTF-8', $ldqm;

    $ldqm                 => 201C
    encode 'UTF-8'        => E2 80 9C
    decode 'Windows-1252' => 00E2 20AC 0153
    encode 'UTF-8'        => C3 A2 E2 82 AC C5 93

    $rdqm = encode 'UTF-8', decode 'Windows-1252', encode 'UTF-8', $rdqm;

    $rdqm                 => 201D
    encode 'UTF-8'        => E2 80 9D
    decode 'Windows-1252' => 00E2 20AC ????
    [error handling]      => 00E2 20AC FFFD
    encode 'UTF-8'        => C3 A2 E2 82 AC EF BF BD

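Windows-1252 doesn't have a character defined for byte 9D, so decode('Windows-1252', "\x9D") substitutes U+FFFD REPLACEMENT CHARACTER. The original byte is lost at that point, which makes the conversion irreversible. The same happens for the other bytes Windows-1252 leaves undefined (81, 8D, 8F and 90), so every character whose UTF-8 encoding is E2 80 followed by one of those bytes ends up as the same C3 A2 E2 82 AC EF BF BD: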

  • U+2001 EM QUAD
  • U+200D ZERO WIDTH JOINER
  • U+200F RIGHT-TO-LEFT MARK
  • U+2010 HYPHEN
  • U+201D RIGHT DOUBLE QUOTATION MARK
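
To check this for yourself, here is a minimal sketch that runs each character through the same encode/decode/encode chain and then tries to undo it. The helper names mangle, unmangle and hex_bytes are made up for this example, and it assumes decode is left at its default error handling (substituting U+FFFD), as in the trace above.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Reproduce the damage: UTF-8 bytes misread as Windows-1252, then re-encoded.
sub mangle {
    my ($char) = @_;
    return encode('UTF-8', decode('Windows-1252', encode('UTF-8', $char)));
}

# Attempt the inverse: undo the outer UTF-8 layer, map the characters back
# to Windows-1252 bytes, then decode those bytes as the UTF-8 they once were.
sub unmangle {
    my ($bytes) = @_;
    return decode('UTF-8', encode('Windows-1252', decode('UTF-8', $bytes)));
}

# Format a byte string as space-separated uppercase hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

for my $cp (0x201C, 0x2001, 0x200D, 0x200F, 0x2010, 0x201D) {
    my $mangled = mangle(chr $cp);
    printf "U+%04X => %-24s %s\n",
        $cp,
        hex_bytes($mangled),
        unmangle($mangled) eq chr($cp) ? 'recoverable' : 'lost';
}
```

Only U+201C should come back intact. The other five collapse to the same byte string once U+FFFD has been substituted, so nothing downstream can tell them apart.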
