Re: Comparing Lines within a Word List

Regular expressions are not actually the best tool for what you want. That is not because your problem is impossible, or even difficult, to solve with regular expressions, but because there is a much better option. Applied to two strings, the bitwise xor operator "^" works bit by bit: it yields a 0 bit wherever the two strings agree and a 1 bit wherever they differ, so every character position where the strings are equal comes out as a NUL byte.

my $first = "Fool";
my $second = "Foot";
my $diff = ($first ^ $second);
print unpack "B*", $diff; # Print the binary representation of the difference
my @diff_char = split //, $diff; # get a char by char difference.

With that, and perhaps ord (you don't strictly need it, but it may help make things clearer), you should be able to do what you want.
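As a rough illustration of how ord could fit in, here is a minimal sketch that reports the positions at which two strings differ. The helper name `diff_positions` is made up for this example, and it assumes two plain byte strings of equal length. Note that under `use v5.28` or later the `bitwise` feature is enabled and the string form of the operator is spelled `^.`, so the sketch deliberately does not load that feature bundle.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the 0-based
# positions at which two equal-length strings differ.
sub diff_positions {
    my ($first, $second) = @_;
    my $diff = $first ^ $second;    # string xor: NUL byte where chars match
    my @positions;
    my $i = 0;
    for my $byte (split //, $diff) {
        # ord turns each byte of the difference into a number; 0 means "same"
        push @positions, $i if ord($byte) != 0;
        $i++;
    }
    return @positions;
}

my @pos = diff_positions("Fool", "Foot");
print "Fool and Foot differ at index: @pos\n";
```

For "Fool" and "Foot" only the last character differs, so the only index reported is 3.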


Re^2: Comparing Lines within a Word List by AnomalousMonk (Archbishop) on Apr 26, 2016 at 23:02 UTC
Actually, bitwise-xor on strings and `tr///` (update: see Quote-Like Operators in perlop) go together quite nicely for something like this:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Data::Dump qw(pp);
     ;;
     for my $word (qw(Fool Foot Tool Toot Foal)) {
       my $diff = 'Fool' ^ $word;
       print qq{'$word': }, pp $diff;
       print qq{'Fool' and '$word' differ by 1 char}
           if 1 == $diff =~ tr/\x00//c;
       }
    "
    'Fool': "\0\0\0\0"
    'Foot': "\0\0\0\30"
    'Fool' and 'Foot' differ by 1 char
    'Tool': "\22\0\0\0"
    'Fool' and 'Tool' differ by 1 char
    'Toot': "\22\0\0\30"
    'Foal': "\0\0\16\0"
    'Fool' and 'Foal' differ by 1 char

Update: Changed example code to use `tr/\x00//c` (`/c` modifier: complement the search list).

Give a man a fish: `<%-{-{-{-<`
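One caveat worth sketching: when the two strings have different lengths, string xor behaves as if the shorter one were padded with NUL bytes, so 'Fool' ^ 'Fools' has exactly one non-NUL byte and would pass the `tr` count test on its own. A minimal wrapper with a length check might look like the following. The name `differ_by_one` is hypothetical, and the sketch assumes plain byte strings and no `bitwise` feature in effect.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two words have the same length and differ
# in exactly one character position.
sub differ_by_one {
    my ($x, $y) = @_;
    # Without this check the extra characters of the longer word would
    # show up in the xor result and be counted as differences.
    return 0 unless length($x) == length($y);
    my $diff = $x ^ $y;
    # tr with /c counts every byte that is NOT \x00; $diff is not modified
    return 1 == ($diff =~ tr/\x00//c) ? 1 : 0;
}

for my $word (qw(Foot Toot Fools Fool)) {
    printf "Fool vs %-5s: %s\n", $word,
        differ_by_one('Fool', $word) ? 'one char apart' : 'no';
}
```

Only 'Foot' is reported as one character apart here; 'Toot' differs in two positions, 'Fools' fails the length check, and 'Fool' is identical.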
Re^3: Comparing Lines within a Word List by dominick_t (Acolyte) on Apr 27, 2016 at 03:42 UTC
Thank you both for the replies! I hope everyone in the thread can see this, and not just the author of the note on which I hit the reply button.

Okay, so if I'm getting this right, it looks like in this example, you're taking the word 'fool' and comparing its characters to each of the five words in the array, and since 'fool' matches itself exactly, the return on that one is all zeros. Any place there is not a zero is a place where the words differ. (I'm not immediately sure why the "difference" between the character 'l' and 't' would be 30 but I'm sure it's easily explained.)

So I see how this works in principle, to compare two given words and look for word pairings that yield a one-character difference. But then how might I use this to solve the problem that I have, which is to find -- from let's say a massive dictionary of English language words -- all pairs of words that are the same except for one letter, and in particular, for that character difference to be that one has an R while the other has an S?

Again, many thanks.
Re^4: Comparing Lines within a Word List by AnomalousMonk (Archbishop) on Apr 27, 2016 at 16:22 UTC
> I hope everyone in the thread can see this, and not just the author of the note on which I hit the reply button.

They can.

> I'm not immediately sure why the "difference" between the character 'l' and 't' would be 30 ...

You're seeing the octal values resulting from the character-by-character bitwise-xor of two strings. So

    c:\@Work\Perl\monks>perl -wMstrict -le
    "printf qq{%#02o \n}, ord 'l';
     printf qq{%#02o \n}, ord 't';
     printf qq{%#02o \n}, 0154 ^ 0164;
    "
    0154
    0164
    030

> ... the problem ... [find] from ... a massive dictionary of English language words -- all pairs of words that are the same except for one letter, and in particular, for that character difference to be that one has an R while the other has an S [in the same character position] ...

[please note the emphasized addition] As to this much larger problem (as restated; please confirm this clarification, or may the differing characters be in any position? (Update: E.g., Is `'aSaa'` a "match" for `'aaRa'`?)): it's an interesting one, but I've no time right now to go into it in detail.

Update: Actually, the `'02'` in the `'%#02o'` format specifier used in the printfs above is unnecessary, although it does no harm. The same result (and the result I wanted) can be had with `'%#o'` instead.

Give a man a fish: `<%-{-{-{-<`
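For the restated problem, one possible approach is sketched below. It is not the xor technique from earlier in the thread but a plain hash lookup: for every 'r' in every word, build the candidate with 's' in that same position and check whether the dictionary contains it. The helper name `rs_pairs` and the sample word list are made up for illustration, and the sketch assumes lowercase words and that the R and S must sit in the same character position (the point still awaiting confirmation above). An xor test alone would not be enough here, since 'r' ^ 's' is "\x01" but so is 'p' ^ 'q'.

```perl
use strict;
use warnings;

# Hypothetical helper: return [r_word, s_word] pairs that are identical
# except that one position holds 'r' in one word and 's' in the other.
# Assumes lowercase words and the same character position in both.
sub rs_pairs {
    my @words = @_;
    my %seen = map { $_ => 1 } @words;    # hash for fast membership checks
    my @pairs;
    for my $word (sort keys %seen) {
        # visit every position where this word has an 'r'
        while ($word =~ /r/g) {
            my $candidate = $word;
            substr($candidate, pos($word) - 1, 1) = 's';
            push @pairs, [$word, $candidate] if $seen{$candidate};
        }
    }
    return @pairs;
}

# Tiny made-up word list; a real run would read a dictionary file instead.
my @sample = qw(care case bear beat rat sat roar soar);
print "$_->[0] / $_->[1]\n" for rs_pairs(@sample);
```

Each word is handled with one hash lookup per 'r' it contains, so the work grows with the size of the dictionary rather than with the number of word pairs, which matters for a massive word list.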
Re^5: Comparing Lines within a Word List by dominick_t (Acolyte) on Apr 27, 2016 at 16:39 UTC
Re^3: Comparing Lines within a Word List by Eily (Monsignor) on Apr 27, 2016 at 18:51 UTC
I always forget about using tr/// for counting, thanks for the reminder :)

