http://qs321.pair.com?node_id=636818

Simple Substitution Ciphers

Following is a simple substitution cipher found in the September-October, 2006 issue of the Cryptogram:

UTAK QYK UTYU SGGE RWESIAPU LGIAQ VDGI AJBADOAPLA, YPE AJBADOAPLA, HAX +X UTYU LGIAQ VDGI ZYE RWESIAPU.

As mentioned in Perl uses for Cryptograms - Part 0: Introduction (low Perl content), the American Cryptogram Association calls this an "Aristocrat", and this is cryptogram A-1 from that issue. How are we to solve this cipher?

My first approach is to look for patterns in the words that suggest plaintext. "HAXX" suggests "tell" to me. "UTAK" and "UTYU" suggest "then" and "that". I usually paste the cryptogram into BION's (an ACA nom) Solve a Cipher page and start trying out the possibilities one letter at a time. If you'd like a Perl solution, fellow monk graff's Perl/Tk GUI serves a similar purpose.

Perl is better at this than I am

What really helps, though is Perl's regular expressions. If I'm struck by temporary mild aphasia and am at a loss for what word might work for a pattern, I can use Perl to find one for me. The ACA has on it's site a set of word lists. I've downloaded and used regularly Donald Knuth's list.

The word list is smiply a file with words separated by line-feeds. If I'm looking for other words that could match "HAXX", I would construct a simple on-the-fly regex to filter my word list by and type it into a Perl one-liner like this (on Windows):

perl -ne "print if m/^..(.)\1$/;" Words.knu

The parenthesis tell the regexp to remember this letter, and the '\1' tells it to put that remembered letter back in; thus capturing the double 'X' in "HAXX". This produces a disappointingly large list of 185 words. Dissappointing results like this and the fact that I've done this so often have led me to stick my one-liner in a batch file:

@ECHO OFF perl -ne "print if m/^%1$/;" Words.knu

Some of the words returned above don't actually match what I'm looking for (of course they match what I told Perl to look for). For instance,

Barr
bibb
Dodd
I'll
lull

Punctuation would be preserved in your basic Aristocrat, so "I'll" is out. The ACA marks proper nouns with asterisks ('*'). "HAXX" is not marked with an asterisk above, so "Barr" and "Dodd" are out. Also, 'H' and 'X' cannot stand for the same letter in a simple substitution cipher, so "bibb and "lull" are out

Better Patterns

What's needed is better regexp patterns. Getting rid of the proper nouns and punctuation is a simple matter of replacing '.', which means "any character" with a character class. I filter out uppercase letters and punctuation by using the character class '[a-z]' like this (still using a one liner here, but I also use my batch file):

perl -ne "print if m/^[a-z][a-z]([a-z])\1$/;" Words.knu

This filters out most of the problem words, but what about "bibb", "lull" and the like? The single most commonly used (by me) regexp trick is the "Zero-width negative lookahead assertion". Using this, I can tell the rexep pattern what a character is NOT. Here's the example of "HAXX":

perl -ne "print if m/^([a-z])(?!\1)([a-z])(?!\1)(?!\2)([a-z])\1$/;" Wo +rds.knu

The first thing I've done is put parenthesis around every character so I can refer back to them. Also, in between all the '([a-z])'s, there are the parts that have question marks in them. Those are the negative lookahead assertions.

'([a-z])(?!\1)'
says "give me a character between 'a' and 'z' that is not followed by the thing in the first set of parentheses". So if the first character is 'b' as in "bibb", then the second character can't be 'b'.
'([a-z])(?!\1)(?!\2)'
says "give me a character between 'a' and 'z' that is not followed by the thing in the first set of parentheses or the thing in the second set of parentheses", so the third letter can't be the second letter OR the first letter. If I've found a word starting with "bi", then the third letter can't be 'b' and also can't be 'i'.

Better word selection

This still only whittles the list down to 97 entries. At this point I usually try to find better CipherText (CT) words to search on. At first I thought that non-pattern words (words with no repeating characters) would be the solution for fewer results. Here are some various length non-pattern words and the number of hits I get from words.knu, along with the corresponding regexp*:
Nbr Chars Nbr words Regexp*
2 73
([a-z]) (?!\1)([a-z])
3 661
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z])
4 2194
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z])
5 3729
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)([a-z])
6 4353
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)([a-z])
7 4209
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)([a-z])
8 2932
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)(?!\7)([a-z])
9 1616
([a-z]) (?!\1)([a-z]) (?!\1)(?!\2)([a-z]) (?!\1)(?!\2)(?!\3)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)(?!\7)([a-z]) (?!\1)(?!\2)(?!\3)(?!\4)(?!\5)(?!\6)(?!\7)(?!\8)([a-z])
* I've broken up the regexp for readability and to emphasize the progression for successively longer non-pattern words. They should be in one line if used in the, um, one-liners or batch file above. If used in a large script, they could remain multi-line for readability with the /x suffix.

As you can see, the number of results isn't quite the restricted list you might hope for. I've learned that words with patterns return much shorter lists. The most promising seems to me to be "AJBADOAPLA". Reverting to the simplest pattern,

perl -ne "print if m/^(.)..\1..\1..\1$/;" Words.knu

results in:
effervesce
excellence
expedience
experience

This list is short enough that I can just use my eyeballs see that "effervesce" and "excellence" would be filtered out if I used zero-width negative lookahead assertions.

Working it out

Sticking "expe-ience" into either BION's or graff's tool yields (along with some more intelligent guesses):

Going back to BION's or graff's tools,

ultimately producing:

I hope you have fun with this. If you want more Aristocrats to try, I'd first recommend BION's Solve a Cipher page (select New Puzzle, then click Go!), then look into the ACA. I'd love to hear what kind of word patterns and one-liners you come up with to help solve simple ciphers.

update: 04-Sep-2007 slight readability changes
update: 06-Sep-2007 broke up the regexp in the table for readability (thx, bobf)


I humbly seek wisdom.