Re: Regular expressions and accents

by davido (Cardinal)
on Dec 21, 2004 at 17:12 UTC (#416545)


in reply to Regular expressions and accents

If you're using locales (which you probably are, if you're dealing with accented characters) with the locale pragma (use locale) in effect, Perl's regular expression engine will usually include those accented characters in the \w character class. You can use this to your advantage. Here's what you need to match:

Any character that is not a non-word character and is also not a-z, A-Z, a digit, or an underscore. That's a mouthful, but here's how it's written:

print "$character\n" if $character =~ m/[^\Wa-zA-Z\d_]/;

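That looks a little ugly, so here's a version using a POSIX character class that looks cleaner. Since [:alpha:] already excludes digits and the underscore, only a-z and A-Z need to be listed: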

print "$character\n" if $character =~ m/[^[:^alpha:]a-zA-Z]/;

These solutions are not thoroughly tested, as I'm currently sitting at a machine with an older operating system that doesn't have much in the way of locale support.
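For anyone who wants to try both patterns without depending on the locale support of their system, here is a minimal sketch. It assumes Perl 5.14 or later so that the /u modifier (Unicode rules) can stand in for a locale, and the two sub names are hypothetical helpers made up for this example, not part of the original suggestion.

```perl
use strict;
use warnings;

# Hypothetical helper names. The /u modifier (Perl 5.14+) asks for Unicode
# rules, so the result does not depend on the system's locale support.
sub accented_via_w     { return $_[0] =~ /[^\Wa-zA-Z\d_]/u      ? 1 : 0 }
sub accented_via_posix { return $_[0] =~ /[^[:^alpha:]a-zA-Z]/u ? 1 : 0 }

# Plain letter, digit, underscore, then e-acute (U+00E9) and sharp s (U+00DF).
for my $char ("e", "5", "_", "\x{e9}", "\x{df}") {
    printf "U+%04X  \\w form: %d  POSIX form: %d\n",
        ord $char, accented_via_w($char), accented_via_posix($char);
}
```

Both forms should reject the plain letter, the digit, and the underscore, and accept the two accented characters. Input read from a file or terminal would need to be decoded into character strings first for this to apply.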


Dave

Re^2: Regular expressions and accents
by ysth (Canon) on Dec 21, 2004 at 19:01 UTC
    "If you're using locales"
    Or have marked your data as Unicode:
    $ perl -we'$x = "\xff"; print 0 + $x =~ /\w/'
    0
    $ perl -we'$x = "\xff"; utf8::upgrade($x); print 0 + $x =~ /\w/'
    1
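The same comparison can be written as a small script, which makes it easier to see that only the internal representation changes and not the string itself. This is a sketch under the assumption that no locale or unicode_strings feature is in scope; the last line uses the /u modifier, which needs Perl 5.14 or later and is not something the reply above relies on.

```perl
use strict;
use warnings;

my $native   = "\xff";      # one character, stored internally as a single byte
my $upgraded = "\xff";
utf8::upgrade($upgraded);   # same character, now stored internally as UTF-8

# The two strings still compare equal; only the regex semantics differ.
printf "equal as strings:   %d\n", $native eq $upgraded ? 1 : 0;
printf "native   =~ /\\w/:   %d\n", $native   =~ /\w/  ? 1 : 0;
printf "upgraded =~ /\\w/:   %d\n", $upgraded =~ /\w/  ? 1 : 0;

# Assumption: on Perl 5.14+, /u requests Unicode rules without upgrading.
printf "native   =~ /\\w/u:  %d\n", $native   =~ /\w/u ? 1 : 0;
```

The second and third lines should reproduce the 0 and 1 shown in the one-liners.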
