PerlMonks  

Removing wide characters

by TomG (Initiate)
on Dec 08, 2005 at 22:35 UTC

TomG has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks. I'm a beginner using Perl. I used LWP::UserAgent to fetch some web pages, and I need to remove all the wide characters. From what I understand, regexps don't handle this. (Am I correct?) Is there another way to do this? A quick and dirty solution will also be OK for me. Can someone help me with this? Many thanks, Tom

Replies are listed 'Best First'.
Re: Removing wide characters
by GrandFather (Saint) on Dec 08, 2005 at 23:42 UTC

    This may do what you want:

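    use warnings;
    use strict;

    # Read the DATA section in paragraph mode.
    my $str = do { local $/ = ''; <DATA> };
    print $str . "\n";

    # Delete everything outside the 7-bit ASCII range (0x00-0x7F).
    $str =~ s/[^\x00-\x7f]//g;
    print $str;

    __DATA__
    This € that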

    Prints:

    This € that

    This  that

    DWIM is Perl's answer to Gödel
      Thank you very much, GrandFather and BorgCopyeditor. Both solutions were helpful and worked fine. Tom

      Thanks a lot. This solved my problem. I had been stuck on it for the last 4 days.

      Very helpful... I found such a character in the column names of PayPal transaction CSVs... maddening until I found out just what was going on!
Re: Removing wide characters
by BorgCopyeditor (Friar) on Dec 08, 2005 at 23:07 UTC
    Assuming that by "wide characters," you mean "not ASCII," you could use \P{IsASCII} in your regexes. That said, you might have to do things differently depending on the charset in which the webpage is served.

    BCE
    --Your punctuation skills are insufficient!
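
    To illustrate the point about charsets, here is a minimal sketch of the \P{IsASCII} approach. It assumes the page body arrives as raw UTF-8 bytes; the sample string and the variable names ($bytes, $text, $ascii_only) are made up for the example, and the charset should be whatever the page actually declares.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the raw UTF-8 bytes of "This € that", as they
# might arrive in an undecoded response body.
my $bytes = "This \xE2\x82\xAC that";

# Decode the bytes into a Perl character string first.
# 'UTF-8' is an assumption; substitute the charset the page declares.
my $text = decode('UTF-8', $bytes);

# \P{IsASCII} matches any single character that is not ASCII,
# so this removes the euro sign as one character.
(my $ascii_only = $text) =~ s/\P{IsASCII}//g;

printf "bytes: %d, characters: %d\n", length $bytes, length $text;
print "stripped: '$ascii_only'\n";
```

    On a string that has not been decoded, the same substitution should still leave only ASCII behind, because every byte of a multi-byte UTF-8 sequence is above 0x7F. Decoding first mostly matters if you later want to keep or transliterate some of the non-ASCII characters instead of dropping them. With LWP, calling decoded_content on the response instead of content is one way to get text that is already decoded.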

Re: Removing wide characters
by Happy-the-monk (Canon) on Dec 08, 2005 at 22:43 UTC

    I need to remove all the wide characters. From what I understand regexp doesn’t handle this.

    I don't know what you mean by "wide characters".
    You could illuminate the point by showing us what you have tried with regexen, or why you believe they cannot do what you are trying to do.

    Cheers, Sören
