PerlMonks  

Re: Removing multibyte UTF-8 chars from strings

by Corion (Patriarch)
on Jan 10, 2022 at 18:18 UTC ( [id://11140333] )


in reply to Removing multibyte UTF-8 chars from strings

You don't show us where the string is initialized.

If the string appears verbatim in your source file, save the file as UTF-8 and put use utf8; at the top. Personally, I prefer to use charnames ':full'; and then write the characters using \N{...} named escapes.
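To illustrate the named-escape approach, here is a minimal sketch. The sample string and the name "alice42" are made up for the example, and it assumes a Perl recent enough to know these two character names (they arrived with Unicode 6.3, so roughly Perl 5.20 or later):

```perl
use strict;
use warnings;
use charnames ':full';   # loaded automatically for \N{...} on newer Perls; harmless to state

# Hypothetical sample: a username wrapped in direction isolates (U+2066 ... U+2069)
my $string = "\N{LEFT-TO-RIGHT ISOLATE}alice42\N{POP DIRECTIONAL ISOLATE}";

# Copy the string, then strip both isolate characters from the copy
(my $clean = $string) =~ s/[\N{LEFT-TO-RIGHT ISOLATE}\N{POP DIRECTIONAL ISOLATE}]//g;

printf "%d characters before, %d after\n", length $string, length $clean;
```

Because the escapes are plain ASCII in the source, this works regardless of the encoding the file is saved in.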

As for the replacement target, you also need to show us where you get it from, and you need to tell Perl what encoding the string is in. Most likely the string is already UTF-8 encoded, but Perl doesn't know that and treats it as raw bytes. In that case, tell Perl by decoding it:

    use Encode 'decode';
    ...
    my $string = decode('UTF-8', $input_string);

    # Keep only what we want:
    $string =~ m!([a-zA-Z0-9]+)!
        or warn "Invalid/empty username in '$string'";
    my $real_user = $1;

    # Remove stuff we don't want, especially the writing direction isolates:
    $string =~ s!\x{2066}|\x{2069}!!g;
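To see why the decoding step matters, here is a small self-contained sketch. The byte string below is an invented stand-in for data read from a file without an encoding layer, and "alice42" is a hypothetical username. Before decoding, each isolate is three separate bytes, so a pattern looking for the single characters U+2066 or U+2069 cannot match:

```perl
use strict;
use warnings;
use Encode 'decode';

# Hypothetical raw input: UTF-8 bytes for U+2066, "alice42", U+2069
my $input_string = "\xE2\x81\xA6alice42\xE2\x81\xA9";

# Count matches against the undecoded bytes (each isolate is 3 bytes here)
my $hits_in_bytes = () = $input_string =~ /\x{2066}|\x{2069}/g;

# Decode the bytes into characters, then count again
my $string = decode('UTF-8', $input_string);
my $hits_in_chars = () = $string =~ /\x{2066}|\x{2069}/g;

# Strip the isolates from a copy of the decoded string
(my $clean = $string) =~ s/\x{2066}|\x{2069}//g;

printf "bytes: %d long, %d hits; characters: %d long, %d hits; clean: %s\n",
    length $input_string, $hits_in_bytes,
    length $string,       $hits_in_chars,
    $clean;
```

If the data comes from a file, opening it with an encoding layer such as '<:encoding(UTF-8)' should make the explicit decode() call unnecessary.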

Replies are listed 'Best First'.
Re^2: Removing multibyte UTF-8 chars from strings
by cormanaz (Deacon) on Jan 10, 2022 at 19:27 UTC
    Ya sorry, I was reading from a file and clipped the offending chars from that. The closing regex did the trick. Never heard of a "direction isolate."
