http://qs321.pair.com?node_id=1161620


in reply to Re^2: Comparing Lines within a Word List
in thread Comparing Lines within a Word List

Yes, you've described the approach correctly. The shift function removes the word currently at the front of the array; if that word contains "r" or "s", a regex is built from it and passed to the grep function to search all the remaining words in the array for matches.

One thing you didn't specify yet is what to do with sets like "cases / carer / caser / cares": Should the first one match all of the other three? Should the second one match both of the last two? Should the last two match each other? If the answer is "yes" on all points, then you'll want to create a different regex, which can be done using the split and map functions, and (my favorite from C) the "ternary" conditional operator:

my $model = shift @words;
my $regex = join( "", map{ ( /[rs]/ ) ? "[rs]" : $_ } split( /([rs])/, $model ));
next if ( $regex eq $model );  # skip if model has no "r" or "s"
my @hits = grep /^$regex$/, @words;
...
(BTW, maybe you already know, but /$regex/i (adding the "i" modifier at the end) does case-insensitive matches.)

(updated to add a missing paren at the end of the second line in the snippet -- also added the anchors around $regex in the grep call)
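
For anyone who wants to run it, here is a minimal self-contained sketch of the same idea wrapped in a loop. The sub name rs_matches, the %hits_for hash, and the sample word list are my own assumptions for illustration, not part of the snippet above. It relies on the capturing group in split to hand back each "r" or "s" as its own piece, so map can turn every such piece into the class [rs].

```perl
use strict;
use warnings;

# Hypothetical helper (name and return shape are assumptions): for each word,
# build a pattern in which every "r" or "s" becomes the class [rs], then
# collect the later words in the list that match that pattern.
sub rs_matches {
    my @words = @_;
    my %hits_for;
    while ( @words ) {
        my $model = shift @words;
        # The capturing parens make split return the separators too.
        my $regex = join( "", map { ( /[rs]/ ) ? "[rs]" : $_ } split( /([rs])/, $model ) );
        next if ( $regex eq $model );    # skip if model has no "r" or "s"
        my @hits = grep /^$regex$/, @words;
        $hits_for{$model} = \@hits if @hits;
    }
    return %hits_for;
}

my %found = rs_matches( qw( cases carer caser cares table ) );
print "$_: @{ $found{$_} }\n" for sort keys %found;
```

Because each model is shifted off before the grep, a pair is reported only once, under whichever word comes first in the list.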

Re^4: Comparing Lines within a Word List
by dominick_t (Acolyte) on Apr 27, 2016 at 14:58 UTC
    Thank you. This is extremely helpful. To answer your question about the tricky case you mention: 'cases' should NOT match 'carer', as for my purpose these words differ in two positions. (Even though both positions involve an R/S swap, I still need the matches to differ in exactly one position.) 'cases' SHOULD match 'caser' and 'cares'. For the same reason, 'carer' SHOULD match both of the remaining two ('caser' and 'cares'). And finally, 'caser' should NOT match 'cares', again because these words differ at more than one position. Does that clarify things?
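
    One way to read the rule described above, as a rough sketch only: swap one "r" or "s" at a time and look the result up in the word list. The sub name one_swap_matches and the hash lookup are assumptions made for illustration, not code from this thread, and the sketch assumes lowercase words.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not code from the thread): return the
# words in @$list that equal $model except for exactly one r<->s swap.
sub one_swap_matches {
    my ( $model, $list ) = @_;
    my %seen = map { $_ => 1 } @$list;    # fast membership test
    my @hits;
    for my $pos ( 0 .. length( $model ) - 1 ) {
        my $char = substr( $model, $pos, 1 );
        next unless $char eq "r" || $char eq "s";
        # Copy the model and flip only the letter at this one position.
        my $variant = $model;
        substr( $variant, $pos, 1 ) = ( $char eq "r" ) ? "s" : "r";
        push @hits, $variant if $seen{$variant};
    }
    return @hits;
}

my @list = qw( cases carer caser cares );
for my $word ( @list ) {
    my @hits = one_swap_matches( $word, \@list );
    print "$word: @hits\n";
}
```

    With this reading, 'cases' pairs with 'cares' and 'caser' but not with 'carer', since reaching 'carer' would take two swaps. Unlike the shift-based loop, this version reports each pair from both sides.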