http://qs321.pair.com?node_id=287394

giulienk has asked for the wisdom of the Perl Monks concerning the following question:

I need to work out a bunch of data with an algorithm i didn't choose. The algorithm has nothing to do with the "real" soundex algorithm (see Text::Soundex), even though it was called like that where i read it. So from now on i will baptize this algorithm Weirdex; this is described to receive a string in input and then:
• Strip chars outside [a-zA-Z] range
• Take all consonants except doubles (that should be reduced to one)
• Take just the first vowel

Examples:
OriginalWeirdex
giulienkgilnk
larry walllarywl
etheroskedasticityethrskdstcty

My first coding of Weirdex is

```sub weirdex {
local \$_ = shift;
my (\$weirdex, \$vowel) = ('', 0);
s/[^a-zA-Z]//g;
for (split '') {
if (/[aeiou]/i) {
\$vowel++ or \$weirdex .= \$_;
} else {
\$weirdex .= \$_ if substr(\$weirdex, -1, 1) ne \$_;
}
}
return \$weirdex;
}

I'm interested in more elegant/efficient solutions, only regexp solutions and even Golf/Obfu ones.
Thank you.

\$|=\$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select\$,,\$,,\$,,\$_/7}m{..}g

Replies are listed 'Best First'.
Re: Weird "soundex" algorithm
by broquaint (Abbot) on Aug 28, 2003 at 15:17 UTC
```sub weirdex {
local \$_ = shift;

tr/a-zA-Z//dc;
tr/a-zA-Z//s;
m/[aeiou]/g and substr(\$_, pos) =~ s/[aieuo]//g;

\$_;
}

print "\$_ - ", weirdex(\$_), \$/
for "giulienk", "larry wall", "etheroskedasticity";

__output__

giulienk - gilnk
larry wall - larywl
etheroskedasticity - ethrskdstcty
tr//, m// and s///, pos and substr are your friends :)
HTH

_________
broquaint

Thanks to yours and jmcnamara's reply: i totally forgot about the power of tr/// modifiers, especially c and s, which i never used before. :)

\$|=\$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select\$,,\$,,\$,,\$_/7}m{..}g

Nice. I was unhappy with your vowel stripping though. You use a looping match for offsets into \$_ but really, you only have to find the *first* vowel and then no looping is required.

```#  m/[aeiou]/g and substr(\$_, pos) =~ s/[aieuo]//g;
/[aeiou]/ and substr( \$_, \$+[0] ) =~ tr/aeiou//d;
You use a looping match for offsets into \$_ but really ...
Er, what looping? The /g matches the first vowel then saves the position of the match for substr which the replace then operates on. I didn't want to use the \$+ variable because of the overhead it invokes.
HTH

_________
broquaint

Re: Weird "soundex" algorithm
by jmcnamara (Monsignor) on Aug 28, 2003 at 15:20 UTC

```    sub weirdex {
local \$_ = \$_[0];

tr/a-zA-Z//s;
tr/a-zA-Z//cd;
my \$i; s/([aeiou])/\$1if!\$i++/eg;

\$_;
}

--
John.

Re: Weird "soundex" algorithm
by rcaputo (Chaplain) on Aug 28, 2003 at 15:46 UTC
Re: Weird "soundex" algorithm
by Aristotle (Chancellor) on Aug 28, 2003 at 16:03 UTC
```use strict;
use Test::More 'no_plan';

sub weirdex {
local \$_ = shift;
tr/A-Za-z//cd;
tr/A-Za-z//s;
/[aeiou]/g;
substr(\$_, pos()) =~ s/[aeiou]+//g;
return \$_;
}

my %test = (
'giulienk'           => 'gilnk',
'larry wall'         => 'larywl',
'etheroskedasticity' => 'ethrskdstcty',
);

my (\$input, \$result);

is(weirdex(\$input), \$result, "\$input") while (\$input, \$result) = each
+%test;

__END__
ok 1 - etheroskedasticity
ok 2 - larry wall
ok 3 - giulienk
1..3

Makeshifts last the longest.

Re: Weird "soundex" algorithm
by Anonymous Monk on Aug 28, 2003 at 15:37 UTC
/[aeiou]/i should be  /[aeiouy]/i.

Y is a vowel, isn't it?

Sometimes, so it may not be necessary for this particular algorithm. It would also appear that two Welsh words in the dictionary use 'w' as a vowel.
HTH

_________
broquaint

Y and W are pseudo-vowels. They represent vowel sounds, but are not vowel letters. This is just one of the many contradictions between spoken and written language. I think W is a vowel in Croatian.
Re: Weird "soundex" algorithm
by giulienk (Curate) on Sep 01, 2003 at 06:56 UTC
I add a couple comments for posterity:
• All the solutions (except the original one) have the same problem: they don't care about upper-case vowels. All examples were in lower cases so nobody noticed it.
• The original implementation was wrong has it would consider consonant separated by a "non-first-vowel" as contigous ones and squash them if they are equal.
• The illustrious surname "Schwartz" with its consonant-to-vowel ratio of 7.0 render this algorithm pretty useless :)

\$|=\$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select\$,,\$,,\$,,\$_/7}m{..}g