PerlMonks  

Re^5: Comparing Lines within a Word List

by hippo (Bishop)
on Apr 30, 2016 at 14:26 UTC ( [id://1161952]=note )


in reply to Re^4: Comparing Lines within a Word List
in thread Comparing Lines within a Word List

If you cannot work out in your head what the substitution actually does (and it's not an easy thing if you are new to all this) then give it a try in some code. The lack of boilerplate in Perl really helps when coding up trivial scripts for testing, e.g.:

#!/usr/bin/env perl
use strict;
use warnings;

for my $word ('lama', 'aaron') {
    print "Word is $word\n";
    print "without /g the regex becomes: ";
    my $r = $word;
    $r =~ s/(.*?)[ab](.*?)/$1\[ab\]$2/;
    print "$r\n";
    print "with /g the regex becomes: ";
    $r = $word;
    $r =~ s/(.*?)[ab](.*?)/$1\[ab\]$2/g;
    print "$r\n";
}

Hopefully running this code will illustrate to you how the substitutions differ because of the /g modifier.
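
To see what the modifier means for matching, here is a rough sketch that turns the substituted string into an anchored pattern and tries it against a few candidate words. The helper name pattern_for and the candidate list are made up for illustration; they are not taken from the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build a pattern string from a word by replacing
# 'a' or 'b' with the class [ab], either once or everywhere (/g).
sub pattern_for {
    my ($word, $global) = @_;
    my $r = $word;
    if ($global) {
        $r =~ s/(.*?)[ab](.*?)/$1\[ab\]$2/g;
    }
    else {
        $r =~ s/(.*?)[ab](.*?)/$1\[ab\]$2/;
    }
    return $r;
}

for my $global (0, 1) {
    my $pat  = pattern_for('baa', $global);
    # Anchor the pattern so the whole candidate word has to match
    my @hits = grep { /^$pat$/ } qw(baa aaa bbb bab);
    print "pattern $pat matches: @hits\n";
}
```

Without /g only the first a-or-b position becomes a class, so '[ab]aa' accepts 'baa' and 'aaa'. With /g every such position is opened up, so '[ab][ab][ab]' accepts all four candidates.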

Re^6: Comparing Lines within a Word List
by dominick_t (Acolyte) on Apr 30, 2016 at 14:49 UTC
    Right, I'm still too new to regular expressions to make heads or tails of this, but with enough time I'll be able to parse it. At the moment I'm just hoping to get something that works so that I can meet my deadline; after that I'll be able to focus on the actual learning.

    I'm noticing now that this new code is reporting pairs as matches when it shouldn't, such as 'baa' and 'bbb'. These shouldn't match because there are two positions where the words differ. I think I have enough correct matches to finish the puzzle I'm working on, so I wouldn't call it urgent, but soon it will be an important thing for me to look at and figure out.

      All of the approaches in this subthread are based upon graff's original statement about his interpretation of your problem:

      OTOH, if you're looking for words that contain a particular pair of characters, and differ only in terms of using one vs. the other of those two (e.g. you really just want "bare/base", etc., but not "foot/fool"), you would probably want to use a regex like this ...

      Having computed a set of "matches" for each term, you can choose to refine it so that the words differ by only one character, using the other approaches (bitwise OR, Levenshtein, etc.) mentioned elsewhere in the thread.
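
One possible shape for that refinement step is sketched below. It uses a plain character-by-character comparison rather than the bitwise or Levenshtein approaches mentioned elsewhere in the thread, and it assumes the pair of interest is 'a'/'b'. The helper name differ_by_one_ab and the sample pairs are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true if two words have equal length and differ in
# exactly one position, where one word has 'a' and the other has 'b'.
sub differ_by_one_ab {
    my ($x, $y) = @_;
    return 0 unless length($x) == length($y);
    my $diffs = 0;
    for my $i (0 .. length($x) - 1) {
        my ($cx, $cy) = (substr($x, $i, 1), substr($y, $i, 1));
        next if $cx eq $cy;
        # A differing position must be an a/b swap, nothing else
        return 0 unless "$cx$cy" eq 'ab' || "$cx$cy" eq 'ba';
        $diffs++;
    }
    return $diffs == 1 ? 1 : 0;
}

for my $pair (['baa', 'bbb'], ['lama', 'lamb'], ['foot', 'fool']) {
    my ($x, $y) = @$pair;
    printf "%s/%s: %s\n", $x, $y,
        differ_by_one_ab($x, $y) ? 'keep' : 'reject';
}
```

Here 'baa'/'bbb' is rejected because two positions differ, 'lama'/'lamb' is kept, and 'foot'/'fool' is rejected because the differing characters are not the chosen pair.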
