Quick way to convert to ASCII

kettle has asked for the wisdom of the Perl Monks concerning the following question:

Re: Quick way to convert to ASCII by blokhead (Monsignor) on Jul 26, 2006 at 04:14 UTC
Text::Unidecode looks like it does exactly that. It's pure Perl, but since it's essentially a giant lookup table for all of Unicode, it's not small (748k).
blokhead
Re^2: Quick way to convert to ASCII by GrandFather (Saint) on Jul 26, 2006 at 04:47 UTC
It gets the ligature right and has a great motto :)
MOTTO: The Text::Unidecode motto is: It's better than nothing! ...in both meanings: 1) seeing the output of unidecode(...) is better than just having all font-unavailable Unicode characters replaced with "?"'s, or rendered as gibberish; and 2) it's the worst, i.e., there's nothing that Text::Unidecode's algorithm is better than.
DWIM is Perl's answer to Gödel
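A minimal sketch of what calling Text::Unidecode might look like, assuming the module has been installed from CPAN (it is not part of core Perl). `unidecode` is the function the module exports by default; the sample sentence and the variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;               # this source file contains literal non-ASCII characters
use Text::Unidecode;    # CPAN module, assumed to be installed

# Illustrative input with accents and two ligatures
my $str = "Les naïfs ægithales hâtifs et leurs drôles d'œufs abîmés";

# unidecode() returns a new string made of ASCII characters only
my $ascii = unidecode($str);

print "$ascii\n";
```

Because the result is plain ASCII, no output encoding layer is needed on STDOUT.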
Re: Quick way to convert to ASCII by GrandFather (Saint) on Jul 26, 2006 at 03:10 UTC
At the end of the day there has to be a lookup. That can be fairly quick using the transliteration operator (`tr///`):

```perl
use warnings;
use strict;
use utf8;                              # assumes this file is saved as UTF-8
binmode STDOUT, ':encoding(UTF-8)';    # the untranslated ligatures are still non-ASCII

my $str = <<'STR';
Les naïfs ægithales hâtifs pondant à Noël où il gèle sont sûrs d'être déçus et de voir leurs drôles d'œufs abîmés
STR

# ASCII letter => accented characters that collapse to it
my %xlateL = (
    a => 'âà', c => 'ç', e => 'èëéê', i => 'ïî', o => 'ô', u => 'ùû'
    #...
);

# Generate the upper case versions
my %xlateU;
$xlateU{uc $_} = uc ($xlateL{$_}) for keys %xlateL;

# tr/// does not interpolate variables, hence the string eval
eval "\$str =~ tr/$xlateL{$_}/$_/;" for keys %xlateL;
eval "\$str =~ tr/$xlateU{$_}/$_/;" for keys %xlateU;

print $str;
```

Prints:

```
Les naifs ægithales hatifs pondant a Noel ou il gele sont surs d'etre decus et de voir leurs droles d'œufs abimes
```

Note that æ causes a little grief, however: `tr///` maps one character to one character, so it cannot turn a ligature into two letters. Using a regex rather than the transliteration, with a separate set of tables, is probably the fix for that. This would make a good CPAN module when you've got it done. :)
DWIM is Perl's answer to Gödel
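The regex-plus-table idea suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table `%to_ascii` is a deliberately tiny, hypothetical one, `strip_to_ascii` is a made-up name, and the source file is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                          # literal non-ASCII characters below
use feature 'unicode_strings';     # make uc/ucfirst use Unicode rules throughout

# Hypothetical, deliberately small table; a real one would cover far more.
my %to_ascii = (
    'â' => 'a', 'à' => 'a', 'ç' => 'c',
    'è' => 'e', 'ë' => 'e', 'é' => 'e', 'ê' => 'e',
    'ï' => 'i', 'î' => 'i', 'ô' => 'o', 'ù' => 'u', 'û' => 'u',
    'æ' => 'ae', 'œ' => 'oe',      # ligatures expand to two characters
);

# Derive upper-case entries, e.g. 'Æ' => 'Ae'.
# keys() is evaluated once, so adding entries inside the loop is safe.
for my $char (keys %to_ascii) {
    $to_ascii{ uc $char } = ucfirst $to_ascii{$char};
}

# One character class built from the table keys. The keys are assumed to be
# plain letters with no regex metacharacters, so no quoting is applied.
my $class = join '', keys %to_ascii;
my $re    = qr/([$class])/;

# Returns a converted copy; the argument itself is not modified.
sub strip_to_ascii {
    my ($text) = @_;
    $text =~ s/$re/$to_ascii{$1}/g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_to_ascii("Les naïfs ægithales hâtifs et leurs drôles d'œufs abîmés"), "\n";
# prints: Les naifs aegithales hatifs et leurs droles d'oeufs abimes
```

Unlike `tr///`, the substitution can replace one character with several, which is what the ligatures need. Characters missing from the table pass through unchanged.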
Re: Quick way to convert to ASCII by ikegami (Patriarch) on Jul 26, 2006 at 03:04 UTC
A quick CPAN search revealed Text::StripAccents (pure Perl, ISO Latin-1 only) and Text::Unaccent.
Re^2: Quick way to convert to ASCII by GrandFather (Saint) on Jul 26, 2006 at 03:18 UTC
I notice Text::StripAccents at least (I didn't find Text::Unaccent using ppm) suffers from the æ problem. No great surprise that something written to handle accents doesn't handle ligatures, but somewhat disappointing.
DWIM is Perl's answer to Gödel
Re^3: Quick way to convert to ASCII by jdtoronto (Prior) on Jul 26, 2006 at 03:26 UTC
Text::Unaccent is an XS module and won't build cleanly, according to ActiveState.
jdtoronto
Re^4: Quick way to convert to ASCII by xdg (Monsignor) on Jul 26, 2006 at 11:17 UTC
Building Text::Unaccent on Vanilla/Strawberry Perl by xdg (Monsignor) on Jul 26, 2006 at 18:35 UTC
Re^4: Quick way to convert to ASCII by syphilis (Archbishop) on Jul 26, 2006 at 11:30 UTC
Re: Quick way to convert to ASCII by Thelonius (Priest) on Jul 26, 2006 at 12:31 UTC
I happened across a table for this just yesterday (with Greek and Cyrillic transliterations, too), so here's some Perl from that table (the 25 kB table itself sits behind the post's readmore link):

```perl
# in-place: edits the caller's variable through the $_[0] alias;
# any non-ASCII character missing from %asciiize becomes "?"
sub asciiize {
    $_[0] =~ s/([^\0-\x7f])/exists($asciiize{$1})?$asciiize{$1}:"?"/eg;
    return $_[0];
}

# returns new: copies the argument first, so the caller's string is untouched
sub giveascii {
    asciiize(my $x = shift);
}
```

Edited by planetscape - added readmore tags
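To show how the two subs above differ in use, here is a small self-contained sketch. The four-entry `%asciiize` below is a hypothetical stand-in for the elided table, not the table from the post, and the sample strings are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the elided lookup table
my %asciiize = (
    "\x{e9}"  => 'e',     # é
    "\x{e6}"  => 'ae',    # æ
    "\x{3b1}" => 'a',     # Greek alpha
    "\x{434}" => 'd',     # Cyrillic de
);

# In-place: modifies the caller's variable via the $_[0] alias
sub asciiize {
    $_[0] =~ s/([^\0-\x7f])/exists $asciiize{$1} ? $asciiize{$1} : "?"/eg;
    return $_[0];
}

# Copying: works on a private copy and returns the converted string
sub giveascii {
    my $copy = shift;
    return asciiize($copy);
}

my $original  = "caf\x{e9} \x{3b1}\x{2603}";    # U+2603 is not in the table
my $converted = giveascii($original);           # $original is left as it was
print "$converted\n";                           # prints: cafe a?

asciiize($original);                            # now $original itself is changed
print "$original\n";                            # prints: cafe a?
```

Note that `asciiize` has to be handed a variable rather than a string literal, since it writes back through its argument.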

