PerlMonks |
Re: Safely removing Unicode zero-width spaces and other non-printing characters
by ikegami (Patriarch) on Dec 05, 2019 at 14:54 UTC ( [id://11109696]=note )
For starters, U+00A0 is not a zero-width space; it's a (normal-width) non-breaking space. Furthermore, as a normal-width space, it isn't a non-printing character. That is to say, it is a printing character.

On to your question. To remove NBSP and non-printing characters, you can use the following:
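A minimal sketch of such a substitution, assuming Perl 5.16 or later (so that the `\N{NBSP}` name alias is available) and input that has already been decoded. The helper name `strip_unwanted` is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: removes NBSP (U+00A0) and every character that
# does not have the Unicode "Print" property. Expects decoded text.
sub strip_unwanted {
    my ($text) = @_;
    $text =~ s/[\N{NBSP}\P{Print}]//g;
    return $text;
}

# NBSP and the BEL control character (U+0007) are both removed.
my $clean = strip_unwanted("foo\N{NBSP}bar\x{07}");
```

Note that newlines are control characters too, so a line read from a file should be chomped first if its line ending is meant to survive.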
(In lieu of \N{NBSP}, one can use \xA0 or \x{A0} or \N{U+A0} or ...)

The above expects Unicode characters (decoded text). You are providing encoded text instead (bytes). You need to properly decode your inputs and encode your outputs. For example, if your source code is encoded using UTF-8 rather than ASCII, you want:
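Assuming the declaration meant here is Perl's `utf8` pragma, a small sketch of its effect on string literals in a file saved as UTF-8 (the variable name is only for illustration):

```perl
use strict;
use warnings;
use utf8;    # the source file itself is encoded using UTF-8

# With "use utf8", the literal below is decoded into 4 characters.
# Without it, perl would see the 5 raw bytes of the UTF-8 encoding.
my $word = "café";
my $chars = length($word);
```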
For example, the following causes STDIN, STDOUT and STDERR to be decoded/encoded automatically, and it sets the default encoding for files opened within its lexical scope:
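A sketch of how this might look, assuming the `open` pragma with the `:std` option and UTF-8 as the encoding of the data. The `strip_stream` helper is hypothetical and simply applies the NBSP/non-printing substitution line by line.

```perl
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';   # decode STDIN, encode STDOUT/STDERR

# Hypothetical helper: copies $in to $out, removing NBSP and
# non-printing characters from each (already decoded) line.
sub strip_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;                         # keep the newline out of the s///
        $line =~ s/[\N{NBSP}\P{Print}]//g;
        print {$out} "$line\n";
    }
    return;
}
```

Calling `strip_stream(\*STDIN, \*STDOUT)` would then filter standard input to standard output, with the decoding and encoding handled by the I/O layers rather than by the substitution itself.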
Failing to properly decode your inputs and encode your outputs explains the results you are seeing.