Matching alphabetic characters

Miguel has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks,

What am I missing here:

#!/usr/bin/perl -w

use strict;

# Also tried with this, but with no better results
use POSIX qw/locale_h/;
setlocale(LC_CTYPE, "pt_PT.ISO8859-1");

my @words = qw/ç q áãç abc/;

foreach (@words) {
    $_ =~/[[:alpha:]]/
        ? print "$_ : OK!\n"
        : print "$_ : NOT OK!\n";

    $_ =~/\w+/
        ? print "$_ : OK!\n"
        : print "$_ : NOT OK!\n";
}

Output:

ç : NOT OK!
ç : NOT OK!
q : OK!
q : OK!
áãç : NOT OK!
áãç : NOT OK!
abc : OK!
abc : OK!

Why aren't "ç", "ã" and "á" being recognized as valid alphabetic characters? How can I validate a word containing such characters?

Thank you for your attention,
Miguel

Replies are listed 'Best First'.
Re: Matching alphabetic characters by wazoox (Prior) on Mar 11, 2006 at 17:53 UTC
It seems you forgot `use locale`. This works fine for me:

#!/usr/bin/perl

use strict;
use warnings;
use locale;

use POSIX qw/locale_h/;
setlocale(LC_CTYPE, "fr_FR.ISO8859-1");

my @words = qw/éèç q éàç abc/;

foreach (@words) {
    $_ =~/[[:alpha:]]/
        ? print "$_ : OK!\n"
        : print "$_ : NOT OK!\n";

    $_ =~/\w+/
        ? print "$_ : OK!\n"
        : print "$_ : NOT OK!\n";
}
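
One thing neither script checks is whether setlocale() actually succeeded. It returns undef when the requested locale is not installed, and in that case the character classes stay ASCII-only even with `use locale`. A minimal sketch of that check follows. The helpers try_locale and is_alpha_in_locale are hypothetical names made up for this example, and the byte escapes assume ISO-8859-1 input (\xE7 is "ç" in that encoding):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: try to switch LC_CTYPE and report whether it worked.
# setlocale() returns the locale name on success and undef on failure.
sub try_locale {
    my ($name) = @_;
    my $got = setlocale(LC_CTYPE, $name);
    return defined $got ? 1 : 0;
}

# Hypothetical helper: true if the byte string consists only of characters
# that the current LC_CTYPE locale classifies as alphabetic.
sub is_alpha_in_locale {
    my ($bytes) = @_;
    use locale;    # lexically scoped: only regexes in this sub consult LC_CTYPE
    return $bytes =~ /^[[:alpha:]]+$/ ? 1 : 0;
}

# The locale name is taken from the original question; it may not be
# installed on your system, which is exactly what this check is for.
if (try_locale("pt_PT.ISO8859-1")) {
    # Assumed ISO-8859-1 bytes: "\xE7" is c-cedilla, "\xE1\xE3\xE7" is a-acute,
    # a-tilde, c-cedilla.
    for my $word ("\xE7", "q", "\xE1\xE3\xE7", "abc") {
        print is_alpha_in_locale($word) ? "OK!\n" : "NOT OK!\n";
    }
} else {
    warn "pt_PT.ISO8859-1 is not available; non-ASCII letters will not match\n";
}
```

If the warning fires, `locale -a` on most Unix systems lists the locale names that are installed.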
Re: Matching alphabetic characters by wfsp (Abbot) on Mar 11, 2006 at 17:47 UTC
This worked OK for me:

#!/usr/bin/perl -w

use strict;

# Also tried with this, but with no better results
#use POSIX qw/locale_h/;
#setlocale(LC_CTYPE, "pt_PT.ISO8859-1");
use locale;

my @words = qw/ç q áãç abc/;

foreach (@words) {
    $_ =~/[[:alpha:]]/
        ? print "$_ : OK!\n"
        : print "$_ : NOT OK!\n";

    $_ =~/\w+/
        ? print "$_ : OK!\n"
        : print "$_ : NOT OK!\n";
}
__DATA__
Output:

ç : OK!
ç : OK!
q : OK!
q : OK!
áãç : OK!
áãç : OK!
abc : OK!
abc : OK!
Re: Matching alphabetic characters by clinton (Priest) on Mar 12, 2006 at 02:06 UTC
In my limited experience, locales are a bit of a nightmare, especially when you want a multilingual website where different characters are considered to be letters in different languages. Also, when you `use locale`, you can't use \w to untaint input, because locale files are considered to be external to your script. Is anything stopping you from making the leap to UTF-8? There is generally good support available for it now, and it handles multilingual sorting and so on. I've found it the easiest standard to apply when developing web apps.
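
For what the UTF-8 route might look like, here is a minimal sketch that assumes the input arrives as UTF-8 encoded bytes. It decodes them with the core Encode module and then tests against the Unicode letter property \p{L}, so no locale is involved. The helper name is_word_utf8 is hypothetical, and the byte escapes are illustrative UTF-8 encodings of accented letters ("\xC3\xA7" is "ç"):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the UTF-8 byte string decodes cleanly and
# consists only of Unicode letters.
sub is_word_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; eval turns that into undef.
    my $chars = eval { decode('UTF-8', $bytes, Encode::FB_CROAK) };
    return 0 unless defined $chars;
    return $chars =~ /\A\p{L}+\z/ ? 1 : 0;
}

# Illustrative inputs (assumed UTF-8 bytes): c-cedilla, a plain ASCII letter,
# three accented letters, an ASCII word, and digits.
for my $bytes ("\xC3\xA7", "q", "\xC3\xA1\xC3\xA3\xC3\xA7", "abc", "123") {
    print "$bytes : ", (is_word_utf8($bytes) ? "OK!" : "NOT OK!"), "\n";
}
```

Decoding first matters: matching \p{L} against undecoded bytes would test each byte separately rather than each character.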