http://qs321.pair.com?node_id=376821


in reply to problems matching umlauts in env vars

You need to define a locale that contains ä/ö/ü for \w to include them. You need to do this even for UTF-8. UTF-8 is just a standard way of representing characters, not the set of characters that can make up words in a particular language.

use locale;
use POSIX 'locale_h';

# German locale, for example. Run 'locale -a' to get the exact locale name.
my $loc = 'de_DE.utf8';
setlocale(LC_CTYPE, $loc) or die "Invalid locale $loc";

Either that, or use this little trick off of my home node: [A-Za-zÀ-ÿœŒ] instead of \w :)
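For what it's worth, here is a minimal sketch of how that character class behaves compared with \w. The helper name is_letters_only is made up for illustration, and the class is written with \x{...} escapes (an assumption on my part, so the source stays plain ASCII) rather than literal characters. Note that the range À-ÿ also covers the multiplication and division signs (U+00D7 and U+00F7), and that unlike \w the class excludes digits and the underscore.

```perl
use strict;
use warnings;

# Explicit "letter" class, written with escapes so the source stays ASCII:
#   \x{C0}-\x{FF}     A-grave through y-diaeresis
#   \x{153}, \x{152}  the oe / OE ligatures
my $letter = qr/[A-Za-z\x{C0}-\x{FF}\x{153}\x{152}]/;

# Hypothetical helper: true if the string consists only of such letters.
sub is_letters_only {
    my ($text) = @_;
    return $text =~ /\A$letter+\z/ ? 1 : 0;
}

print is_letters_only("M\x{FC}ller") ? "match\n" : "no match\n";   # contains u-umlaut
print is_letters_only("No\x{EB}l")   ? "match\n" : "no match\n";   # contains e-diaeresis
print is_letters_only("foo_bar")     ? "match\n" : "no match\n";   # contains an underscore
```

Because the class lists code points explicitly, it does not depend on the current locale, which is the point of the trick.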

I probably should add that the German locale will likely not match 'ë', since it does not exist in German. Maybe Dutch or French...

--
Damon Allen Davison
http://www.allolex.net

Re^2: problems matching umlauts in env vars
by december (Pilgrim) on Aug 02, 2004 at 04:37 UTC

    Thanks for your reply. I have set the locale now, and that solves at least this problem.

    The German locale should be using the iso-8859-1 (or rather iso-8859-15) charset, which does contain an e with an umlaut (ë). Standard French doesn't have umlauts, but Dutch (my native language) does. Either way, all Western European countries use the same charset, which should be iso-8859-15 (that's roughly latin1 plus the euro sign).

    The problem now is that I don't know which charset will be given to me in the request... Could be pretty much anything.
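One common heuristic for the unknown-charset case, sketched here under stated assumptions: try a strict UTF-8 decode first and fall back to ISO-8859-15 if the bytes are not valid UTF-8. The helper name decode_guess is hypothetical, and the choice of ISO-8859-15 as the fallback is simply taken from the discussion above. If the request declares a charset (for example in its Content-Type header), that should probably take precedence over guessing.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: turn raw request bytes into a Perl character string.
sub decode_guess {
    my ($bytes) = @_;

    # decode() with a CHECK argument may modify its input, so work on a copy.
    my $copy = $bytes;

    # Strict UTF-8 decode; FB_CROAK makes decode() die on malformed input.
    my $text = eval { decode('UTF-8', $copy, FB_CROAK) };
    return $text if defined $text;

    # Not valid UTF-8: assume ISO-8859-15 (an assumption, not a detection).
    return decode('iso-8859-15', $bytes);
}

my $from_utf8   = decode_guess("M\xC3\xBCller");  # UTF-8 encoded u-umlaut
my $from_latin9 = decode_guess("M\xFCller");      # single-byte u-umlaut
print $from_utf8 eq $from_latin9 ? "same string\n" : "different strings\n";
```

This is only a guess: pure ASCII is valid in both encodings, and a few Latin byte sequences happen to be valid UTF-8 as well, so it can misfire on short inputs.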