PerlMonks  

Re^2: Comparing Lines within a Word List

by dominick_t (Acolyte)
on Apr 27, 2016 at 03:57 UTC ( [id://1161606] )


in reply to Re: Comparing Lines within a Word List
in thread Comparing Lines within a Word List

Thanks very much for the reply! At the moment, the problem I'd like to solve first is the second one you described, at least if I'm reading you correctly: I want pairs of words that are exactly the same except that where one has an R, the other has an S. But I'm very much a newbie and am having a tough time parsing your example code, so I can't tell how it works. Would it be possible to walk me through it a bit? In general, are we starting with the first word in the list, testing it against the words below it for the R/S difference and reporting any match, then moving to the next word on the list and testing it against all the ones below it, and so on? Again, many thanks!

Re^3: Comparing Lines within a Word List
by graff (Chancellor) on Apr 27, 2016 at 10:07 UTC
    Yes, you've described the approach correctly: it uses the shift function to remove the word currently at the beginning of the array, and then, if that word contains "r" or "s", it builds a regex and uses it with the grep function to search for matches among all the remaining words in the array.
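    For anyone following along, here is a minimal, self-contained sketch of that walk through the list. It is not the code from the earlier reply (which isn't quoted here); the word list is made up, the words are assumed to be lowercase, and it assumes a "match" means the two words differ in exactly one letter, an r in one and an s in the other. For each model word it builds one literal alternative per r/s position (with just that letter swapped) and greps the anchored alternation against the words still left in the array.

```perl
use strict;
use warnings;

# Hypothetical sample list; the real one would be read from the word file.
my @words = qw(rat sat set ret rest);

my @pairs;
while (@words) {
    # shift removes the first word, so @words now holds only the words
    # after it and every pair is examined exactly once.
    my $model = shift @words;
    next unless $model =~ /[rs]/;    # no r or s, nothing to compare

    # Assumption: a match differs from $model in exactly one letter,
    # r <-> s. Build one literal alternative per r/s position.
    my @alts;
    for my $i (0 .. length($model) - 1) {
        my $c = substr($model, $i, 1);
        next unless $c eq 'r' || $c eq 's';
        my $variant = $model;
        substr($variant, $i, 1, $c eq 'r' ? 's' : 'r');    # swap one letter
        push @alts, quotemeta $variant;
    }
    my $regex = join '|', @alts;

    # Test all of the model's variants against the remaining words at once.
    for my $hit (grep { /^(?:$regex)$/ } @words) {
        push @pairs, "$model/$hit";
    }
}
print "$_\n" for @pairs;    # prints rat/sat, then set/ret
```

    Because each model is only compared with the words after it, a pair like rat/sat is found once (when "rat" is the model) and never again in the reverse direction.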

    One thing you didn't specify yet is what to do with sets like "cases / carer / caser / cares": Should the first one match all of the other three? Should the second one match both of the last two? Should the last two match each other? If the answer is "yes" on all points, then you'll want to create a different regex, which can be done using the split and map functions, and (my favorite from C) the "ternary" conditional operator:

        my $model = shift @words;
        my $regex = join( "", map{ ( /[rs]/ ) ? "[rs]" : $_ } split( /([rs])/, $model ));
        next if ( $regex eq $model );  # skip if model has no "r" or "s"
        my @hits = grep /^$regex$/, @words;
        ...
    (BTW, maybe you already know, but /$regex/i (adding the "i" modifier at the end) does case-insensitive matches.)

    (updated to add a missing paren at the end of the second line in the snippet -- also added the anchors around $regex in the grep call)
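    To see what the split/map/ternary line actually produces, here is a small runnable version of it applied to "cases". The remaining words are made up for illustration. Every r or s in the model becomes the class [rs], so the anchored pattern accepts any word that has either letter in those positions, which is the "yes on all points" behaviour described above.

```perl
use strict;
use warnings;

my $model = 'cases';
my @words = qw(carer caser cares bases);    # hypothetical remaining words

# The capturing group makes split also return each r/s it splits on;
# map then replaces every r/s piece with the class "[rs]" (ternary) and
# passes all other pieces through unchanged.
my $regex = join '', map { /[rs]/ ? '[rs]' : $_ } split /([rs])/, $model;
print "regex: $regex\n";    # regex: ca[rs]e[rs]

# Anchored, so the whole word must fit the pattern.
my @hits = grep { /^$regex$/ } @words;
print "hits: @hits\n";      # hits: carer caser cares
```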

      Thank you. This is extremely helpful. To answer your question about the tricky case you mention: 'cases' should NOT match 'carer', because for my purpose these words differ in two positions (even though both positions involve an R/S swap, I still need matches to differ in exactly one position). 'cases' SHOULD match 'caser' and 'cares'. For the same reason, 'carer' SHOULD match both of the remaining two ('caser' and 'cares'). And finally, 'caser' should NOT match 'cares', again because these words differ in more than one position. Does that clarify things?
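      Given that rule (exactly one differing position, with an R in one word and an S in the other), one option that sidesteps regexes entirely is a hash lookup. This is a swapped-in technique, not something proposed earlier in the thread: put every word in a hash, then for each r in each word, change just that letter to s and check whether the result is also a word. Only the r-to-s direction is tried, so each qualifying pair is reported once. A minimal sketch, assuming lowercase words (lc them first if the list has capitals):

```perl
use strict;
use warnings;

# Sample words from the thread; in practice, read them from the word file.
my @words = qw(cases carer caser cares);
my %is_word = map { $_ => 1 } @words;

my @pairs;
for my $word (@words) {
    for my $i (0 .. length($word) - 1) {
        # Only try r -> s; trying s -> r as well would report each pair twice.
        next unless substr($word, $i, 1) eq 'r';
        my $variant = $word;
        substr($variant, $i, 1, 's');    # change exactly one letter
        push @pairs, "$word/$variant" if $is_word{$variant};
    }
}
print "$_\n" for @pairs;
```

      On the four example words this prints carer/caser, carer/cares, caser/cases and cares/cases, and never pairs cases/carer or caser/cares, which matches the rules above. Each word is checked against one candidate per r it contains, instead of being grepped against every remaining word in the list.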
