Weird "soundex" algorithm

giulienk has asked for the wisdom of the Perl Monks concerning the following question:

I need to work out a bunch of data with an algorithm i didn't choose. The algorithm has nothing to do with the "real" soundex algorithm (see Text::Soundex), even though it was called like that where i read it. So from now on i will baptize this algorithm Weirdex; this is described to receive a string in input and then:

Strip chars outside [a-zA-Z] range
Take all consonants except doubles (that should be reduced to one)
Take just the first vowel

Examples:

Original Weirdex

giulienk gilnk

larry wall larywl

etheroskedasticity ethrskdstcty

Original	Weirdex
giulienk	gilnk
larry wall	larywl
etheroskedasticity	ethrskdstcty

My first coding of Weirdex is

sub weirdex {
    local $_ = shift;
    my ($weirdex, $vowel) = ('', 0);
    s/[^a-zA-Z]//g;
    for (split '') {
        if (/[aeiou]/i) {
            $vowel++ or $weirdex .= $_;
        } else {
            $weirdex .= $_ if substr($weirdex, -1, 1) ne $_;
        }
    }
    return $weirdex;
}
[download]

I'm interested in more elegant/efficient solutions, only regexp solutions and even Golf/Obfu ones.
Thank you.

$|=$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g

Comment on Weird "soundex" algorithm Download Code

Replies are listed 'Best First'.
Re: Weird "soundex" algorithm by broquaint (Abbot) on Aug 28, 2003 at 15:17 UTC
`sub weirdex { local $_ = shift; tr/a-zA-Z//dc; tr/a-zA-Z//s; m/[aeiou]/g and substr($_, pos) =~ s/[aieuo]//g; $_; } print "$_ - ", weirdex($_), $/ for "giulienk", "larry wall", "etheroskedasticity"; __output__ giulienk - gilnk larry wall - larywl etheroskedasticity - ethrskdstcty` [download] `tr//`, `m//` and `s///`, `pos` and `substr` are your friends :) HTH `_________ broquaint`	[reply] [d/l]
Re: Re: Weird "soundex" algorithm by giulienk (Curate) on Aug 28, 2003 at 15:32 UTC
Thanks to yours and jmcnamara's reply: i totally forgot about the power of tr/// modifiers, especially `c` and `s`, which i never used before. :) `$\|=$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g`	[reply]
Re: Re: Weird "soundex" algorithm by diotalevi (Canon) on Aug 28, 2003 at 19:10 UTC
Nice. I was unhappy with your vowel stripping though. You use a looping match for offsets into $_ but really, you only have to find the first vowel and then no looping is required. `# m/[aeiou]/g and substr($_, pos) =~ s/[aieuo]//g; /[aeiou]/ and substr( $_, $+[0] ) =~ tr/aeiou//d;` [download]	[reply] [d/l]
Re: Re: Re: Weird "soundex" algorithm by broquaint (Abbot) on Aug 28, 2003 at 23:58 UTC
You use a looping match for offsets into $_ but really ... Er, what looping? The `/g` matches the first vowel then saves the position of the match for `substr` which the replace then operates on. I didn't want to use the `$+` variable because of the overhead it invokes. HTH `_________ broquaint`	[reply]
Re: Re: Re: Re: Weird "soundex" algorithm by diotalevi (Canon) on Aug 29, 2003 at 01:53 UTC
Re: Weird "soundex" algorithm by jmcnamara (Monsignor) on Aug 28, 2003 at 15:20 UTC
`sub weirdex { local $_ = $_[0]; tr/a-zA-Z//s; tr/a-zA-Z//cd; my $i; s/([aeiou])/$1if!$i++/eg; $_; }` [download] -- John.	[reply] [d/l]
Re: Weird "soundex" algorithm by rcaputo (Chaplain) on Aug 28, 2003 at 15:46 UTC
FYI, the algorithm sounds a lot like (heh) metaphone: Text::Metaphone Text::DoubleMetaphone Text::TransMetaphone `-- Rocco Caputo - rcaputo@pobox.com - poe.perl.org`	[reply]
Re: Weird "soundex" algorithm by Aristotle (Chancellor) on Aug 28, 2003 at 16:03 UTC
`use strict; use Test::More 'no_plan'; sub weirdex { local $_ = shift; tr/A-Za-z//cd; tr/A-Za-z//s; /[aeiou]/g; substr($_, pos()) =~ s/[aeiou]+//g; return $_; } my %test = ( 'giulienk' => 'gilnk', 'larry wall' => 'larywl', 'etheroskedasticity' => 'ethrskdstcty', ); my ($input, $result); is(weirdex($input), $result, "$input") while ($input, $result) = each +%test; __END__ ok 1 - etheroskedasticity ok 2 - larry wall ok 3 - giulienk 1..3` [download] Makeshifts last the longest.	[reply] [d/l]
Re: Weird "soundex" algorithm by giulienk (Curate) on Sep 01, 2003 at 06:56 UTC
I add a couple comments for posterity: All the solutions (except the original one) have the same problem: they don't care about upper-case vowels. All examples were in lower cases so nobody noticed it. The original implementation was wrong has it would consider consonant separated by a "non-first-vowel" as contigous ones and squash them if they are equal. The illustrious surname "Schwartz" with its consonant-to-vowel ratio of 7.0 render this algorithm pretty useless :) `$\|=$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g`	[reply]
Re: Weird "soundex" algorithm by Anonymous Monk on Aug 28, 2003 at 15:37 UTC
`/[aeiou]/i` should be `/[aeiouy]/i`. Y is a vowel, isn't it?	[reply] [d/l] [select]
Re: Re: Weird "soundex" algorithm by broquaint (Abbot) on Aug 28, 2003 at 15:51 UTC
Sometimes, so it may not be necessary for this particular algorithm. It would also appear that two Welsh words in the dictionary use 'w' as a vowel. HTH `_________ broquaint`	[reply]
Re: Re: Weird "soundex" algorithm by PhilHibbs (Hermit) on Aug 29, 2003 at 12:15 UTC
Y and W are pseudo-vowels. They represent vowel sounds, but are not vowel letters. This is just one of the many contradictions between spoken and written language. I think W is a vowel in Croatian.	[reply]

Back to Seekers of Perl Wisdom