Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Re^5: Comparing Lines within a Word List

by graff (Chancellor)
on Apr 30, 2016 at 15:27 UTC ( #1161955=note: print w/replies, xml ) Need Help??

in reply to Re^4: Comparing Lines within a Word List
in thread Comparing Lines within a Word List

Note that hippo's addition of the g modifier brings up the issue that I raised earlier about having more than one character difference between two words. Here's what happens when I add a couple more examples to hippo's verson:
#!/usr/bin/perl use strict; use warnings; my @words = <DATA>; chomp @words; while ( @words >= 2 ) { my $model = my $regex = shift @words; if ( $regex =~ s/(.*?)[ab](.*?)/$1\[ab\]$2/g ) { my @hits = grep /^$regex$/, @words; if ( @hits ) { print join( " ", $model, "matches", @hits, "using", $regex +, "\n" ); } } } __DATA__ lama lamb able bale
lama matches lamb using l[ab]m[ab] able matches bale using [ab][ab]le
The output shows how the g modifier affects the creation of the regex to be used for searching the array; without it, the first regex would be l[ab]ma (which would not match "lamb"), and the next would be l[ab]mb (which would not match "lama" if it were to show up later in the list).

But when using the g modifier, the search pattern for "able" and "bale" come out the same, and they match each other, because the regex [ab][ab]le allows up to two characters to differ.

To solve that, you could to compare the current "model" word against each of the matches from the array, using the tr/// operator as described in previous replies, to see how many characters are different in each paired set of words, and keep only those matches that differ by a single character.

(UPDATE: It's also worth noting that using g this way is effectively equivalent to using "split", "map" and "join" to build the multi-match regex, like I showed in this previous reply - which just goes to show that "there's more than one way to do it."

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1161955]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (4)
As of 2021-03-01 09:51 GMT
Find Nodes?
    Voting Booth?

    No recent polls found