Re: Safely removing Unicode zero-width spaces and other non-printing characters

It sounds to me like some of your strings were not decoded properly when you loaded them into Perl. Note that you can still provide an SSCCE. At the very least, inspect your strings (and post them here) using Data::Dumper with $Data::Dumper::Useqq=1; or with Data::Dump. Even better, use hexdump or od to show your input files, and Devel::Peek for the strings; I gave an example here. As for posting on PerlMonks, you can post Unicode as long as you put it in <pre> instead of <code> tags (you'll have to escape <, >, and & manually though).
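To illustrate the kind of inspection I mean, here is a minimal sketch. The sample string is made up for the example (an "a", a U+200B ZERO WIDTH SPACE, and a "b"); it is not taken from your data. It dumps the same data once as undecoded bytes and once as decoded characters, so you can see which of the two your own strings look like.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Data::Dumper;
use Devel::Peek;

$Data::Dumper::Useqq = 1;   # print non-printing and non-ASCII characters as escapes
$Data::Dumper::Terse = 1;   # omit the "$VAR1 =" prefix

# Hypothetical input: the raw UTF-8 bytes of "a", U+200B, "b"
my $bytes = "a\xE2\x80\x8Bb";

# The same data after decoding: three characters instead of five bytes
my $chars = decode('UTF-8', $bytes);

print Dumper($bytes);
print Dumper($chars);
printf "bytes: length %d, chars: length %d\n", length $bytes, length $chars;

# Devel::Peek writes the internal representation (including the UTF8 flag) to STDERR
Dump($chars);
```

If a string that should contain one zero-width space shows up as three separate high bytes in the dump, it was most likely never decoded.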

I do not have a "use utf8;" in place in this script, because if I add it, then it screws up nearly all the UTF-8 characters

That's strange, since the utf8 pragma only affects how your source code is interpreted. If you have any non-ASCII characters in your source, then I'd strongly recommend making sure the file is properly encoded as UTF-8 and then adding use utf8;. To look at the source file and verify its encoding, you might also be interested in my script enctool.
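A small sketch of what the pragma changes, assuming the script file itself is saved as UTF-8 (the euro sign literal is just an illustration): the same literal is read as one character under use utf8; and as its three raw bytes without it.

```perl
use strict;
use warnings;

my ($with, $without);
{
    use utf8;    # source is treated as UTF-8: the literal becomes one character
    $with = "€";
}
{
    no utf8;     # source is treated as bytes: the literal stays three bytes
    $without = "€";
}

printf "with utf8: length %d, without utf8: length %d\n",
    length $with, length $without;
```

With the file saved as UTF-8, this reports a length of 1 with the pragma and 3 without it. Nothing here touches input or output handles, which is why the pragma on its own should not change how data read from files behaves.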

And as kcott said, this may also depend on the Perl version you're using. For example, there's the 'unicode_strings' feature.
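As a rough illustration of why that feature matters (the test string is hypothetical): a character in the range 0x80 to 0xFF that Perl stores internally as a single byte is matched with different rules depending on whether the feature is enabled in the scope where the regex is compiled.

```perl
use strict;
use warnings;

# One character, code point 0xE9 (e with acute), stored internally as one byte
my $s = "\xE9";

my ($plain, $feature);
{
    no feature 'unicode_strings';    # legacy rules for this scope
    $plain = $s =~ /\w/ ? 1 : 0;
}
{
    use feature 'unicode_strings';   # Unicode rules regardless of internal storage
    $feature = $s =~ /\w/ ? 1 : 0;
}

print "matches \\w without feature: $plain, with feature: $feature\n";
```

Without the feature the character is not treated as a word character; with it, it is. The feature is also switched on implicitly by use v5.12; or later, so two scripts on the same Perl can behave differently.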


Re^2: Safely removing Unicode zero-width spaces and other non-printing characters by mldvx4 (Friar) on Dec 04, 2019 at 09:30 UTC
Ah. Only one of the scripts has some non-ASCII in its output. In fact, it is the one script that seems to be the trouble. The bitbucket link leads to a blank page though. Is there another site?
Re^3: Safely removing Unicode zero-width spaces and other non-printing characters by haukex (Archbishop) on Dec 04, 2019 at 19:11 UTC
"The bitbucket link leads to a blank page though. Is there another site?"

It works fine on my end, but try this link instead.

