note
choroba
I ran into the same issue two days ago. My test script is a bit more elaborate, but it confirms your problem: <c>\w</c> does not work under <c>use locale</c>, whatever the locale is. If you need locale, use POSIX classes or Unicode character properties (provided your locale is Unicode-based).
<br>
The details are discussed in [doc://perlunicode] and [doc://perllocale], but the relevant paragraphs differ considerably between Perl versions.
<c>
#!/usr/bin/perl
use warnings;
use strict;

# Locales passed to the child process through the environment.
my @p_locale = qw/C cs_CZ.UTF8 en_US.UTF8/;
# Pragma line placed at the top of the generated script.
my @locale = ('no locale', 'use locale');
# Optional explicit setlocale call inside the generated script.
my @posix = (q(),
             'use POSIX qw/locale_h/; setlocale LC_ALL,"C"',
             'use POSIX qw/locale_h/; setlocale LC_ALL,"cs_CZ.UTF8"',
            );

for my $p_locale (@p_locale) {
    for my $locale (@locale) {
        for my $posix (@posix) {
            print "$p_locale $locale $posix\n";
            open my $OUT, '>', 'l.perl' or die "$!";
            # Generate the test script. \$, \@ and \\ are escaped so that
            # they reach l.perl literally; the terminator must match exactly.
            print {$OUT} <<"OUT";
$locale;
$posix;
my \@chars = qw/283 269 345 225
32
98 99 48 49 50 51 52
32
353 253 382 237/;
my \$string = join q[], map chr, \@chars;
binmode STDOUT, ':utf8';
my \@regex = (qr/(\\w+)/, qr/([[:alnum:]]+)/, qr/(\\p{Word}+)/);
for my \$regex (\@regex) {
print "\t\$1" while \$string =~ /\$regex/g;
print "\n";
}
print sort qw/ci ch/;
print "\n";
OUT
            close $OUT;
            # Run the generated script under each environment locale.
            $ENV{LC_ALL} = $p_locale;
            system 'perl', 'l.perl';
        }
    }
}
unlink 'l.perl';
</c>
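If you only want to see the two suggested alternatives in isolation, something like the following stripped-down sketch might help. It is not part of the test above: it uses no locale pragma at all, reuses the same code points, and assumes the string holds characters rather than raw bytes. The variable names <c>@words_posix</c> and <c>@words_prop</c> are mine, chosen for illustration only.
<c>
#!/usr/bin/perl
use warnings;
use strict;

binmode STDOUT, ':utf8';

# Same code points as in the test script: two accented words around "bc01234".
my $string = join q(), map chr,
    283, 269, 345, 225, 32, 98, 99, 48 .. 52, 32, 353, 253, 382, 237;

# Collect the words with a POSIX class and with a Unicode property.
my @words_posix = $string =~ /([[:alnum:]]+)/g;
my @words_prop  = $string =~ /(\p{Word}+)/g;

print scalar(@words_posix), " word(s) via [[:alnum:]]\n";
print scalar(@words_prop),  " word(s) via \\p{Word}\n";
print join("\t", @words_prop), "\n";
</c>
Both patterns should split the string at the two spaces here. How <c>[[:alnum:]]</c> behaves once <c>use locale</c> is added depends on the Perl version and the locale, which is exactly what the longer script above tries to show.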
935400
935400