http://qs321.pair.com?node_id=1161955


in reply to Re^4: Comparing Lines within a Word List
in thread Comparing Lines within a Word List

Note that hippo's addition of the g modifier brings up the issue that I raised earlier about having more than one character difference between two words. Here's what happens when I add a couple more examples to hippo's verson:
#!/usr/bin/perl use strict; use warnings; my @words = <DATA>; chomp @words; while ( @words >= 2 ) { my $model = my $regex = shift @words; if ( $regex =~ s/(.*?)[ab](.*?)/$1\[ab\]$2/g ) { my @hits = grep /^$regex$/, @words; if ( @hits ) { print join( " ", $model, "matches", @hits, "using", $regex +, "\n" ); } } } __DATA__ lama lamb able bale
Output:
lama matches lamb using l[ab]m[ab] able matches bale using [ab][ab]le
The output shows how the g modifier affects the creation of the regex to be used for searching the array; without it, the first regex would be l[ab]ma (which would not match "lamb"), and the next would be l[ab]mb (which would not match "lama" if it were to show up later in the list).

But when using the g modifier, the search pattern for "able" and "bale" come out the same, and they match each other, because the regex [ab][ab]le allows up to two characters to differ.

To solve that, you could to compare the current "model" word against each of the matches from the array, using the tr/// operator as described in previous replies, to see how many characters are different in each paired set of words, and keep only those matches that differ by a single character.

(UPDATE: It's also worth noting that using g this way is effectively equivalent to using "split", "map" and "join" to build the multi-match regex, like I showed in this previous reply - which just goes to show that "there's more than one way to do it."