PerlMonks  

Regex to trim non Ascii characters

by Yllar (Novice)
on Sep 26, 2015 at 10:46 UTC ( [id://1143087]=perlquestion )

Yllar has asked for the wisdom of the Perl Monks concerning the following question:

I am working in a Windows environment. I want to strip all non-ASCII characters and keep only ASCII-range characters: letters, numbers and symbols. Please help.

My input was:

This is a simple text just for test purpose only ascii text 12345678910-=[];'#/.,-! " £ $ % ^ & * ( ) _ + { }~@:<>?|–

Now I am using JSON to decode my input data, which decodes it as follows:

This is a simple text just for test purpose only ascii text12345678910-=[];\'#/.,\\-!\"\u00A3$%^&*()_+{}~@:<>?| \u2013

Now I am sending this decoded data to my program to replace the Unicode (UTF-8) and other non-ASCII characters with a space or some other printable character (I mean I want to print only ASCII-range characters). So I tried all of the following in Perl.

use strict;
use warnings;
use JSON;
use LWP::UserAgent;
use utf8;

#Due to some security reasons I am not mentioning the url,hope u understand
my $ResRef = sendHTTPRequest($someurlRequest);
my $string = $ResRef->decoded_content;#I used json decode to decode content
my $string = transalte_replace($string);

sub transalte_replace {
    my $string = shift;
    for($string) {
        s/\\u[0-9]+/1-/g;
        s/\\u[a-zA-Z0-9\+]*/2-/g;
        s/\\x\{[a-zA-Z0-9]*\}/3-/g;
        s/[^\p{ASCII}]/-/g;
        s/[^\u0000-\u007F]+/replace1/g;
        s/[^\x00-\x7F]+/rep/g;
        s/[^\p{ASCII}]/-/g;
        s/[^A-Za-z0-9\.,\?'""!@#\$%\^&\*\(\)-_=\+;:\<\>\/\\\|\}\{\[\]`\~]+/y/g;
        #s/[£]//g;
        s/[^\x20-\x7E]+/replace3/g;
        #s/\\u[0-9]+/2-/g;
        #s/\\x[a-z0-9]+/3-/g;
        #s/[^\x00-\x7F]/4-/g;
    }
}

The output still is:

"This is a simple text just for test purpose only ascii text12345678910-=[];'#/.,\-!\"\x{a3}\$%^&*()_+{}~\@:?|\x{2013}";

Replies are listed 'Best First'.
Re: Regex to trim non Ascii characters
by trippledubs (Deacon) on Sep 26, 2015 at 11:09 UTC
Re: Regex to trim non Ascii characters
by Albannach (Monsignor) on Sep 26, 2015 at 18:02 UTC
    If all you want is a filter, then perhaps consider tr, along the lines of the node "tr/// vs s/// The question."

    --
    I'd like to be able to assign to an luser
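To make the tr suggestion concrete, here is a minimal sketch. The sample string and the choice of '?' as the replacement character are assumptions for illustration, not taken from the original post. It replaces every character outside the ASCII range with a placeholder and separately counts how many such characters there were.

```perl
use strict;
use warnings;

# Hypothetical sample: the two non-ASCII characters from the question,
# written as escapes so this source file itself stays pure ASCII.
my $text = "price \x{a3}5 \x{2013} ok";

# /c complements the search list, so tr acts on everything that is
# NOT in \x00-\x7F. Each such character becomes '?'.
# The (my $copy = $orig) =~ ... idiom works on a copy and leaves $text alone.
(my $masked = $text) =~ tr/\x00-\x7F/?/c;

# With an empty replacement list and no /d, tr changes nothing and
# simply returns the number of characters it matched.
my $count = ($text =~ tr/\x00-\x7F//c);

print "$masked\n";
print "non-ASCII characters found: $count\n";
```

Swapping the '?' for a space gives the "replace with space" behaviour the question asks for.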

Re: Regex to trim non Ascii characters
by Anonymous Monk on Sep 27, 2015 at 18:51 UTC

    You're calling the sub like so: $x = translate($x). But what does your subroutine return?

    Remember that my $x creates a fresh new variable. Operating on a local copy inside the subroutine does not change its arguments. Either you work on the passed value itself

    sub mutate { for (shift) { s/./x/g; } }
    or return the working copy
    sub translate { my $x = shift; $x =~ s/./x/g; return $x; }

    As for the ASCII, all the printable characters fall in a short range, so a simple tr/\040-\176//cd ought to do.
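Putting the two points of this reply together, a sketch could look like the following. The sub names and the sample string are made up for illustration. One sub returns a cleaned copy, the other edits the caller's variable through the `$_[0]` alias.

```perl
use strict;
use warnings;

# Returns a cleaned copy; the caller's variable is left untouched.
sub printable_only {
    my ($s) = @_;
    $s =~ tr/\040-\176//cd;   # delete everything outside space..tilde
    return $s;
}

# Edits the caller's variable directly: $_[0] is an alias to the argument.
sub strip_in_place {
    $_[0] =~ tr/\040-\176//cd;
    return;
}

# Hypothetical input containing e-acute, an en dash and a trailing newline.
my $raw   = "caf\x{e9} \x{2013} 100%\n";
my $clean = printable_only($raw);   # $raw is still unchanged here

strip_in_place($raw);               # now $raw itself is cleaned

print "[$clean]\n";
print "[$raw]\n";
```

Note that the range \040-\176 also drops tabs and newlines, since those are ASCII but not printable. If line breaks should survive, add them to the search list, for example tr/\012\040-\176//cd.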

Re: Regex to trim non Ascii characters
by nikosv (Deacon) on Sep 27, 2015 at 17:45 UTC
    use re 'eval';
    $a='12345678910-=[];\'#/.,\\-!\"\u00A3$%^&*()_+{}~@:<>?| \u2013';
    $a=~ s/((\\u....)|(.))(?{ if (defined $2){'y'} elsif(defined $3) {'x'} else{$1} })/$^R/xg;
    print $a;
    prints :
    xxxxxxxxxxxxxxxxxxxxxxxxxxyxxxxxxxxxxxxxxxxxxxy
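The snippet above works on the literal \uXXXX escape text. If the data is still a JSON document at that point, another option is to let a JSON decoder turn the escapes into real characters first and filter afterwards. This is only a sketch under assumptions: the JSON shape and the "text" key are invented for illustration, and JSON::PP (bundled with Perl since 5.14) stands in for whichever JSON module the original program uses.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical payload; single quotes keep the \u escapes literal,
# exactly as they would arrive over the wire.
my $json = '{"text":"\u00A3 5 \u2013 ok"}';

# After decoding, the escapes are real characters (U+00A3, U+2013).
my $data = decode_json($json);

# Replace each non-ASCII character with '?', working on a copy.
(my $ascii = $data->{text}) =~ s/[^\x00-\x7F]/?/g;

print "$ascii\n";
```

Filtering after decoding avoids having to pattern-match the escape syntax by hand.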

Node Type: perlquestion [id://1143087]
Approved by ww