http://qs321.pair.com?node_id=570344

ropey has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, I've had an issue that is somewhat foxing me. There are two stages to it.

Stage 1) I take a name as input from a user, and I want to validate it with a regex so that it only accepts what I would consider a name: the characters A-Za-z, a space, a period, etc. However, I also need to accept accented characters like åäöüíìóòúùñÑéèáàêëûãç_öüæøÅÄÖÉØÁÜÆØâîÂÎ... I even put these characters in the regex, but sometimes it still fails. I think this has something to do with how the file is saved (its encoding), and my test scripts don't work properly either. Is there a better way of doing this?
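One way to stop depending on how the script file is saved is to decode the input into Perl characters and match on Unicode properties (`\p{L}` for any letter, `\p{M}` for combining marks) instead of listing accented letters by hand. This is a minimal sketch, assuming the input arrives as UTF-8 bytes; the sub name `is_valid_name` and the allowed separators (space, period, apostrophe, hyphen) are illustrative choices, not a definitive rule for what a name is.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if $raw (UTF-8 encoded bytes) looks like a name.
# Assumption: a name is runs of letters (any script, plus combining marks)
# separated by spaces, periods, apostrophes or hyphens, optionally ending in ".".
sub is_valid_name {
    my ($raw) = @_;
    # Bytes -> characters; FB_CROAK makes decode die on malformed UTF-8,
    # LEAVE_SRC stops it from consuming the source string.
    my $name = eval { decode('UTF-8', $raw, FB_CROAK | LEAVE_SRC) };
    return 0 unless defined $name;
    return $name =~ /\A\p{L}[\p{L}\p{M}]*(?:[ .'-]+\p{L}[\p{L}\p{M}]*)*\.?\z/
        ? 1 : 0;
}
```

Because the pattern itself contains only ASCII, it behaves the same whatever encoding the script file is saved in; only the input needs to be decoded correctly.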

Stage 2) I need to replace the accented characters with non-accented equivalents, as the mainframe they eventually end up on does not accept them. I have used several regexes like this:

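    # Note: without "use utf8", the accented literals below are matched as raw
    # bytes in whatever encoding this script file was saved in.
    my $t = shift;
    $t =~ s/(ä|Ä)/AE/g;          # ä, Ä
    $t =~ s/(ö|Ö)/OE/g;          # ö, Ö
    $t =~ s/(ü|Ü)/UE/g;          # ü, Ü
    $t =~ s/(ß)/SZ/g;            # ß
    $t =~ s/(å|á|æ|â)/a/g;       # å, á, æ, â
    $t =~ s/(ø)/o/g;             # ø
    $t =~ s/(é)/e/g;             # é
    $t =~ s/(î)/i/g;             # î
    $t =~ s/(Î)/I/g;             # Î
    $t =~ s/(Å|Á|Æ|Â)/A/g;       # Å, Á, Æ, Â
    $t =~ s/(Ö|Ø)/O/g;           # Ø (Ö was already turned into OE above)
    $t =~ s/[^a-z0-9\,\.\s\/\-\@\:]//ig;   # drop anything not whitelisted
    return $t;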
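Rather than one substitution per accented letter, a common alternative is Unicode decomposition: `NFD` from the core module Unicode::Normalize splits é into e plus a combining acute accent, and the marks (`\p{Mn}`) can then be deleted. Letters that do not decompose (ø, æ, ß) or that need a multi-letter spelling (ä as AE) still need an explicit table. This is a sketch under stated assumptions: the input is already a decoded character string, the mappings copy the ones in the code above, and the sub name `to_mainframe_ascii` is hypothetical. Code points are written as `\x{..}` escapes so the result does not depend on the file's encoding.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Explicit mappings, copied from the original substitutions.
my %special = (
    "\x{E4}" => 'AE', "\x{C4}" => 'AE',   # ä Ä
    "\x{F6}" => 'OE', "\x{D6}" => 'OE',   # ö Ö
    "\x{FC}" => 'UE', "\x{DC}" => 'UE',   # ü Ü
    "\x{DF}" => 'SZ',                     # ß
    "\x{E6}" => 'a',  "\x{C6}" => 'A',    # æ Æ (no decomposition)
    "\x{F8}" => 'o',  "\x{D8}" => 'O',    # ø Ø (no decomposition)
);
my $special_chars = join '', keys %special;   # all keys are single characters

# Hypothetical helper; $t must be a decoded character string.
sub to_mainframe_ascii {
    my ($t) = @_;
    $t =~ s/([$special_chars])/$special{$1}/g;  # explicit mappings first
    $t = NFD($t);                               # é -> e + U+0301, Å -> A + U+030A
    $t =~ s/\p{Mn}//g;                          # delete the combining marks
    $t =~ s/[^A-Za-z0-9 ,.\/\@:-]//g;           # ASCII-only whitelist
    return $t;
}
```

Anything that neither the table nor decomposition covers (for example Ł) is simply dropped by the final whitelist, as in the original code.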

Any tips on solving this would be greatly appreciated.

Re: Matching Accented Names
by explorer (Chaplain) on Aug 30, 2006 at 09:39 UTC
Re: Matching Accented Names
by planetscape (Chancellor) on Aug 30, 2006 at 13:15 UTC
Re: Matching Accented Names
by cdarke (Prior) on Aug 30, 2006 at 10:19 UTC
    It appears that some characters are not within ISO Latin 1, so the problem might not be so easily solved. You may need to know which character set/locale they are coming from, there could be some overlap. You might like to take a look at perllocale.
    IBM mainframes have a different EBCDIC code page for each European language, so it is not impossible to retain the correct characters, provided you know which charset they come from.
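The approach in this reply (know the source charset, then target the right EBCDIC code page) can be sketched with the core Encode module: decode the incoming bytes from the charset they were sent in, then encode the characters into the mainframe's code page. The sub name `recode` is hypothetical, and both charsets here are assumptions for illustration: ISO-8859-1 for the input and cp1047 (one of the EBCDIC code pages Encode ships) for the mainframe; the real target code page has to come from whoever runs the mainframe.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode $bytes from charset $from to charset $to.
sub recode {
    my ($bytes, $from, $to) = @_;
    my $chars = decode($from, $bytes);   # source bytes -> Perl characters
    return encode($to, $chars);          # Perl characters -> target bytes
}

# Example: a Latin-1 name bound for a mainframe assumed to use cp1047.
my $record = recode("M\x{FC}ller", 'iso-8859-1', 'cp1047');
```

Encode also provides `from_to($octets, $from, $to)`, which does the same conversion in place; the two-step version just makes the decode and encode stages explicit.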