http://qs321.pair.com?node_id=1161605


in reply to Re^2: Comparing Lines within a Word List
in thread Comparing Lines within a Word List

Thank you both for the replies! I hope everyone in the thread can see this, and not just the author of the note on which I hit the reply button.

Okay, so if I'm getting this right, it looks like in this example, you're taking the word 'fool' and comparing its characters to each of the five words in the array, and since 'fool' matches itself exactly, the return on that one is all zeros. Any place there is not a zero is a place where the words differ. (I'm not immediately sure why the "difference" between the character 'l' and 't' would be 30 but I'm sure it's easily explained.)

So I see how this works in principle, to compare two given words and look for word pairings that yield a one-character difference. But then how might I use this to solve the problem that I have, which is to find -- from let's say a massive dictionary of English language words -- all pairs of words that are the same except for one letter, and in particular, for that character difference to be that one has an R while the other has an S? Again, many thanks.

Re^4: Comparing Lines within a Word List
by AnomalousMonk (Archbishop) on Apr 27, 2016 at 16:22 UTC
    I hope everyone in the thread can see this, and not just the author of the note on which I hit the reply button.

    They can.

    I'm not immediately sure why the "difference" between the character 'l' and 't' would be 30 ...

    You're seeing the octal values resulting from the character-by-character bitwise-xor of two strings. So

    c:\@Work\Perl\monks>perl -wMstrict -le "printf qq{%#02o \n}, ord 'l'; printf qq{%#02o \n}, ord 't'; printf qq{%#02o \n}, 0154 ^ 0164; "
    0154
    0164
    030
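As a minimal sketch of the same idea applied to whole words: when both operands are strings, Perl's `^` operator xors them byte by byte, so matching positions come out as zero and differing positions as a non-zero value. The helper name `xor_octal` is hypothetical and used only for illustration; it is not taken from the earlier replies in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: xor two words character by character and
# return the result as a list of octal strings, one per position.
sub xor_octal {
    my ($left, $right) = @_;
    my $diff = $left ^ $right;    # string xor, because both operands are strings
    return map { sprintf('%#o', ord $_) } split //, $diff;
}

print join(' ', xor_octal('fool', 'fool')), "\n";    # identical words
print join(' ', xor_octal('fool', 'foot')), "\n";    # differ in the last position
```

The first line printed is all zeros; the second has 030 in the last position, which is the octal xor of 'l' (0154) and 't' (0164).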

    ... the problem ... [find] from ... a massive dictionary of English language words -- all pairs of words that are the same except for one letter, and in particular, for that character difference to be that one has an R while the other has an S [in the same character position] ...
    [please note the emphasized addition]

    As to this much larger problem as restated: please confirm this clarification, or may the differing characters be in any position? (Update: e.g., is 'aSaa' a "match" for 'aaRa'?) It's an interesting problem, but I've no time right now to go into it in detail.

    Update: Actually, the '02' in the '%#02o' format specifier used in the printfs above is unnecessary, although it does no harm. The same result (and the result I wanted) can be had with '%#o' instead.


    Give a man a fish:  <%-{-{-{-<

      To clarify: 'aSaa' is NOT a match for 'aaRa'.

      To be a match, two words must be identical position-by-position, except in one position, and in that particular position, one word has an R while the other has an S.

      Thanks also for the explanation regarding the octal value of the characters.
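One possible way to approach the problem as clarified above, offered as a sketch and not as anyone's posted solution: skip pairwise comparison altogether. Put every word in a hash, then for each word change one 'r' at a time to 's' and check whether the result is also in the hash. The name `rs_pairs`, the lower-casing of the input, and the sample word list are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return [r_word, s_word] pairs that are identical
# position by position except at one position, where the first word
# has 'r' and the second has 's'. Input is lower-cased (an assumption).
sub rs_pairs {
    my %seen;
    $seen{ lc $_ } = 1 for @_;

    my @pairs;
    for my $word (sort keys %seen) {
        # Visit each 'r' in turn; pos() is the offset just past the match.
        while ($word =~ /r/g) {
            my $candidate = $word;
            substr($candidate, pos($word) - 1, 1) = 's';
            push @pairs, [ $word, $candidate ] if $seen{$candidate};
        }
    }
    return @pairs;
}

my @list = qw(fool foot rat sat bar bas roar soar rare);
print "$_->[0] $_->[1]\n" for rs_pairs(@list);
```

With this sample list the sketch prints "bar bas", "rat sat" and "roar soar". Each word is visited once and each 'r' costs one hash lookup, so the work grows with the total number of letters in the dictionary, not with the number of word pairs.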