You can speed this up considerably by using tr/// for every character that maps to a single letter, and then substituting only the few characters that expand to more than one letter.
use utf8;    # assumes this file is saved as UTF-8, so the literals below are characters, not bytes
my $string = 'ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿýÆæÞþÐðß';
print deaccent($string);
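sub deaccent {
    my $phrase = shift;
    # Nothing to do unless the string contains a character in the \xC0-\xFF range.
    return $phrase unless ($phrase =~ m/[\xC0-\xFF]/);
    # One-to-one replacements: tr/// handles these in a single pass.
    $phrase =~ tr/ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿý/AAAAAAaaaaaaCcEEEEeeeeIIIIiiiiOOOOOOooooooNnUUUUuuuuYyy/;
    # Characters that expand to two letters still need a substitution.
    my %trans = (
        'Æ' => 'AE',
        'æ' => 'ae',
        'Þ' => 'TH',
        'þ' => 'th',
        'Ð' => 'TH',
        'ð' => 'th',
        'ß' => 'ss'
    );
    $phrase =~ s/([ÆæÞþÐðß])/$trans{$1}/g;
    return $phrase;
}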
Benchmarking puts it at about 6 times faster than the substitution-only version. Moving the hash assignment outside the sub speeds both versions up by about the same amount, so the ratio stays at roughly 6:1.
use utf8;    # assumes this file is saved as UTF-8, so the literals below are characters, not bytes
use Benchmark qw( cmpthese );
my $string = 'ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿýÆæÞþÐðß';
cmpthese( -5, {
    deaccent => sub {
        my $phrase = $string;
        return $phrase unless ($phrase =~ m/[\xC0-\xFF]/);
        $phrase =~ tr/ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿý/AAAAAAaaaaaaCcEEEEeeeeIIIIiiiiOOOOOOooooooNnUUUUuuuuYyy/;
        my %trans = (
            'Æ' => 'AE',
            'æ' => 'ae',
            'Þ' => 'TH',
            'þ' => 'th',
            'Ð' => 'TH',
            'ð' => 'th',
            'ß' => 'ss'
        );
        $phrase =~ s/([ÆæÞþÐðß])/$trans{$1}/g;
        return $phrase;
    },
    deaccent2 => sub {
        my %acc = qw(
            À A Á A Â A Ã A Ä A Å A Æ AE
            Ç C
            È E É E Ê E Ë E
            Ì I Í I Î I Ï I
            Ð TH Ñ N
            Ò O Ó O Ô O Õ O Ö O Ø O
            Ù U Ú U Û U Ü U
            Ý Y Þ TH ß ss
            à a á a â a ã a ä a å a æ ae
            ç c
            è e é e ê e ë e
            ì i í i î i ï i
            ð th ñ n
            ò o ó o ô o õ o ö o ø o
            ù u ú u û u ü u
            ý y þ th ÿ y
        );
        my $text = $string;
        $text =~ s/(.)/$acc{$1}?$acc{$1}:$1/eg;
        return $text;
    },
});
Returns on my system:
             Rate deaccent2  deaccent
deaccent2  4316/s        --      -86%
deaccent  30859/s      615%        --
With data that has fewer accented characters, the disparity should grow much greater, since deaccent returns immediately when the string contains no characters in the \xC0-\xFF range, while deaccent2 still runs its substitution on every character.
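For reference, moving the hash assignment outside the sub (mentioned alongside the benchmark) might look like the sketch below. The name deaccent_hoisted is made up for illustration, and the code assumes the file is saved as UTF-8; it is one way to arrange it, not the exact code that was timed.

```perl
use strict;
use warnings;
use utf8;    # assumption: this source file is saved as UTF-8

# Built once at file scope instead of on every call to the sub.
my %trans = (
    'Æ' => 'AE', 'æ' => 'ae',
    'Þ' => 'TH', 'þ' => 'th',
    'Ð' => 'TH', 'ð' => 'th',
    'ß' => 'ss',
);

# Hypothetical name; same logic as deaccent, minus the per-call hash setup.
sub deaccent_hoisted {
    my $phrase = shift;
    # Short circuit: plain ASCII input is returned untouched.
    return $phrase unless $phrase =~ m/[\xC0-\xFF]/;
    # Single-letter replacements in one pass.
    $phrase =~ tr/ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêëÌÍÎÏìíîïÒÓÔÕÖØòóôõöøÑñÙÚÛÜùúûüÝÿý/AAAAAAaaaaaaCcEEEEeeeeIIIIiiiiOOOOOOooooooNnUUUUuuuuYyy/;
    # Multi-letter replacements via the shared lookup table.
    $phrase =~ s/([ÆæÞþÐðß])/$trans{$1}/g;
    return $phrase;
}

print deaccent_hoisted('Æther naïve Straße'), "\n";    # prints "AEther naive Strasse"
```

Because %trans is a file-scoped lexical, the sub closes over it and the seven-pair list assignment is paid only once, at startup.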