PerlMonks  

Matching alphabetic characters

by Miguel (Friar)
on Mar 11, 2006 at 17:36 UTC ( [id://535976]=perlquestion )

Miguel has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks,

What am I missing here:

    #!/usr/bin/perl -w
    use strict;

    # Also tried with this, but with no better results
    use POSIX qw/locale_h/;
    setlocale(LC_CTYPE, "pt_PT.ISO8859-1");

    my @words = qw/ç q áãç abc/;

    foreach (@words) {
        $_ =~/[[:alpha:]]/ ? print "$_ : OK!\n" : print "$_ : NOT OK!\n";
        $_ =~/\w+/ ? print "$_ : OK!\n" : print "$_ : NOT OK!\n";
    }

Output:
    ç : NOT OK!
    ç : NOT OK!
    q : OK!
    q : OK!
    áãç : NOT OK!
    áãç : NOT OK!
    abc : OK!
    abc : OK!

Why aren't "ç", "ã" and "á" being recognized as valid alphabetic characters? How can I validate a word containing such characters?

Thank you for your attention,
Miguel

Replies are listed 'Best First'.
Re: Matching alphabetic characters
by wazoox (Prior) on Mar 11, 2006 at 17:53 UTC
    It seems you forgot use locale. This works fine for me:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use locale;
    use POSIX qw/locale_h/;
    setlocale(LC_CTYPE, "fr_FR.ISO8859-1");

    my @words = qw/éèç q éàç abc/;

    foreach (@words) {
        $_ =~/[[:alpha:]]/ ? print "$_ : OK!\n" : print "$_ : NOT OK!\n";
        $_ =~/\w+/ ? print "$_ : OK!\n" : print "$_ : NOT OK!\n";
    }

Re: Matching alphabetic characters
by wfsp (Abbot) on Mar 11, 2006 at 17:47 UTC
    This worked ok for me:
    #!/usr/bin/perl -w
    use strict;

    # Also tried with this, but with no better results
    #use POSIX qw/locale_h/;
    #setlocale(LC_CTYPE, "pt_PT.ISO8859-1");
    use locale;

    my @words = qw/ç q áãç abc/;

    foreach (@words) {
        $_ =~/[[:alpha:]]/ ? print "$_ : OK!\n" : print "$_ : NOT OK!\n";
        $_ =~/\w+/ ? print "$_ : OK!\n" : print "$_ : NOT OK!\n";
    }

    __DATA__
    Output:
    ç : OK!
    ç : OK!
    q : OK!
    q : OK!
    áãç : OK!
    áãç : OK!
    abc : OK!
    abc : OK!

Re: Matching alphabetic characters
by clinton (Priest) on Mar 12, 2006 at 02:06 UTC
    In my limited experience, locales are a bit of a nightmare, especially when you want a multilingual website where different characters are considered to be letters in different languages.

    Also, when you use locale, you can't use \w to untaint input, because locale files are considered to be external to your script.

    Anything to stop you making the leap to UTF8?

    There is generally good support available for it now, and it handles multilingual sorting and so on.

    I've found it the easiest standard to apply when developing web apps.
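
For illustration, here is a minimal sketch of what the UTF-8 route suggested above might look like. It is not code from any of the replies. It assumes the script file itself is saved as UTF-8, and the helper name is_alpha_word is made up for this example. Instead of relying on the locale, it matches the Unicode property \p{L} ("letter") against decoded character strings:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # assumption: this source file is saved as UTF-8
use open qw(:std :encoding(UTF-8));    # encode what we print to STDOUT/STDERR as UTF-8

# Hypothetical helper (not from the thread): returns 1 if the word
# consists only of Unicode letters, 0 otherwise. No locale involved.
sub is_alpha_word {
    my ($word) = @_;
    return $word =~ /\A\p{L}+\z/ ? 1 : 0;
}

my @words = ('ç', 'q', 'áãç', 'abc', 'abc1');

for my $word (@words) {
    printf "%s : %s\n", $word, is_alpha_word($word) ? 'OK!' : 'NOT OK!';
}
```

Words that arrive from outside the script (files, sockets, form input) would have to be decoded first, for example with Encode::decode('UTF-8', $bytes), before a match like this sees characters rather than bytes. Also note that \p{L} does not cover combining marks, so text in decomposed form (a base letter followed by a separate accent) would probably need normalizing first or a pattern that also allows \p{M}.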
