Re^3: Comparing Lines within a Word List

by graff (Chancellor)
on Apr 27, 2016 at 10:07 UTC (#1161620=note)

in reply to Re^2: Comparing Lines within a Word List
in thread Comparing Lines within a Word List

Yes, you've described the approach correctly. It uses the shift function to remove the word that is currently at the beginning of the array; then, if that word contains "r" or "s", a regex is built and used with the grep function to search for matches among the remaining words in the array.

One thing you didn't specify yet is what to do with sets like "cases / carer / caser / cares": Should the first one match all of the other three? Should the second one match both of the last two? Should the last two match each other? If the answer is "yes" on all points, then you'll want to create a different regex, which can be done using the split and map functions, and (my favorite from C) the "ternary" conditional operator:

my $model = shift @words;
my $regex = join( "", map { ( /[rs]/ ) ? "[rs]" : $_ } split( /([rs])/, $model ));
next if ( $regex eq $model );  # skip if model has no "r" or "s"
my @hits = grep /^$regex$/, @words;
...
(BTW, maybe you already know, but /$regex/i (adding the "i" modifier at the end) does case-insensitive matches.)

(updated to add a missing paren at the end of the second line in the snippet -- also added the anchors around $regex in the grep call)
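To see the snippet above in context, here is a minimal sketch of how it might sit inside a loop that works through the whole list. The sample words and the %hits_for hash are assumptions made up for illustration; they are not from the original word list.

```perl
use strict;
use warnings;

# Hypothetical sample list; the real word list from the thread is not shown.
my @words = qw(cases carer caser cares cable table);

# Assumed bookkeeping: model word => array ref of later words that matched.
my %hits_for;

while ( @words ) {
    my $model = shift @words;

    # split with a capturing group keeps each "r"/"s" as its own chunk;
    # map turns those chunks into the class [rs] and leaves the rest alone.
    my $regex = join( "", map { /[rs]/ ? "[rs]" : $_ } split( /([rs])/, $model ) );

    next if $regex eq $model;    # skip if model has no "r" or "s"

    # Anchored match, so only whole words of the same shape count.
    my @hits = grep { /^$regex$/ } @words;
    $hits_for{$model} = \@hits if @hits;
}

for my $model ( sort keys %hits_for ) {
    print "$model: @{ $hits_for{$model} }\n";
}
```

With this sample list, "cases" becomes the regex ca[rs]e[rs] and picks up all three of "carer", "caser" and "cares", which is the "yes on all points" behaviour. Each pair is reported only once, because the model word has already been shifted off the array before grep runs.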

Re^4: Comparing Lines within a Word List
by dominick_t (Acolyte) on Apr 27, 2016 at 14:58 UTC
    Thank you. This is extremely helpful. To answer your question about the tricky case you mention: 'cases' should NOT match 'carer', as for my purpose these words differ in two positions. (Even though both positions involve an R/S swap, I still need the matches to differ in exactly one position.) 'cases' SHOULD match 'caser' and 'cares'. For the same reason, 'carer' SHOULD match both of the remaining two ('caser' and 'cares'). And finally, 'caser' should NOT match 'cares', again because these words differ at more than one position. Does that clarify things?
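The rule described in this reply (same length, exactly one differing position, and that position an R/S swap) could be sketched roughly as follows. This is only one possible reading of the requirement, not code from the thread; the helper name one_rs_swap and the four-word sample list are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true when two words have the same length and differ
# in exactly one position, where one word has "r" and the other has "s".
sub one_rs_swap {
    my ( $x, $y ) = @_;
    return 0 if length($x) != length($y);

    my $diffs = 0;
    for my $i ( 0 .. length($x) - 1 ) {
        my $cx = substr( $x, $i, 1 );
        my $cy = substr( $y, $i, 1 );
        next if $cx eq $cy;

        # Any difference other than an r/s swap disqualifies the pair.
        return 0 unless "$cx$cy" =~ /^(?:rs|sr)$/;
        $diffs++;
    }
    return $diffs == 1 ? 1 : 0;
}

# Sample words taken from the example discussed in the thread.
my @words = qw(cases carer caser cares);

# Compare each word with every later word, so each pair is tested once.
my @pairs;
for my $i ( 0 .. $#words ) {
    for my $j ( $i + 1 .. $#words ) {
        push @pairs, "$words[$i]/$words[$j]"
            if one_rs_swap( $words[$i], $words[$j] );
    }
}
print "$_\n" for @pairs;
```

On the four example words this reports cases/caser, cases/cares, carer/caser and carer/cares, and leaves out cases/carer and caser/cares, since those differ in two positions. Comparing every pair is quadratic in the number of words, so a long list may call for grouping words by length first.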
