Simple Substitution Ciphers
Following is a simple substitution cipher found in the September-October, 2006 issue of the Cryptogram:
UTAK QYK UTYU SGGE RWESIAPU LGIAQ VDGI AJBADOAPLA, YPE AJBADOAPLA, HAX +X UTYU LGIAQ VDGI ZYE RWESIAPU.
As mentioned in Perl uses for Cryptograms - Part 0: Introduction (low Perl content), the American Cryptogram Association calls this an "Aristocrat", and this is cryptogram A-1 from that issue. How are we to solve this cipher?
My first approach is to look for patterns in the words that suggest plaintext. "HAXX" suggests "tell" to me. "UTAK" and "UTYU" suggest "then" and "that". I usually paste the cryptogram into BION's (an ACA nom) Solve a Cipher page and start trying out the possibilities one letter at a time. If you'd like a Perl solution, fellow monk graff's Perl/Tk GUI serves a similar purpose.
Perl is better at this than I am
What really helps, though is Perl's regular expressions. If I'm struck by temporary mild aphasia and am at a loss for what word might work for a pattern, I can use Perl to find one for me. The ACA has on it's site a set of word lists. I've downloaded and used regularly Donald Knuth's list.
The word list is smiply a file with words separated by line-feeds. If I'm looking for other words that could match "HAXX", I would construct a simple on-the-fly regex to filter my word list by and type it into a Perl one-liner like this (on Windows):
perl -ne "print if m/^..(.)\1$/;" Words.knu
The parenthesis tell the regexp to remember this letter, and the '\1' tells it to put that remembered letter back in; thus capturing the double 'X' in "HAXX". This produces a disappointingly large list of 185 words. Dissappointing results like this and the fact that I've done this so often have led me to stick my one-liner in a batch file:
@ECHO OFF perl -ne "print if m/^%1$/;" Words.knu
Some of the words returned above don't actually match what I'm looking for (of course they match what I told Perl to look for). For instance,
Barr
bibb
Dodd
I'll
lull
Punctuation would be preserved in your basic Aristocrat, so "I'll" is out. The ACA marks proper nouns with asterisks ('*'). "HAXX" is not marked with an asterisk above, so "Barr" and "Dodd" are out. Also, 'H' and 'X' cannot stand for the same letter in a simple substitution cipher, so "bibb and "lull" are out
Better Patterns
What's needed is better regexp patterns. Getting rid of the proper nouns and punctuation is a simple matter of replacing '.', which means "any character" with a character class. I filter out uppercase letters and punctuation by using the character class '[a-z]' like this (still using a one liner here, but I also use my batch file):
perl -ne "print if m/^[a-z][a-z]([a-z])\1$/;" Words.knu
This filters out most of the problem words, but what about "bibb", "lull" and the like? The single most commonly used (by me) regexp trick is the "Zero-width negative lookahead assertion". Using this, I can tell the rexep pattern what a character is NOT. Here's the example of "HAXX":
perl -ne "print if m/^([a-z])(?!\1)([a-z])(?!\1)(?!\2)([a-z])\1$/;" Wo +rds.knu
The first thing I've done is put parenthesis around every character so I can refer back to them. Also, in between all the '([a-z])'s, there are the parts that have question marks in them. Those are the negative lookahead assertions.
- '([a-z])(?!\1)'
- says "give me a character between 'a' and 'z' that is not followed by the thing in the first set of parentheses". So if the first character is 'b' as in "bibb", then the second character can't be 'b'.
- '([a-z])(?!\1)(?!\2)'
- says "give me a character between 'a' and 'z' that is not followed by the thing in the first set of parentheses or the thing in the second set of parentheses", so the third letter can't be the second letter OR the first letter. If I've found a word starting with "bi", then the third letter can't be 'b' and also can't be 'i'.
Better word selection
This still only whittles the list down to 97 entries. At this point I usually try to find better CipherText (CT) words to search on. At first I thought that non-pattern words (words with no repeating characters) would be the solution for fewer results. Here are some various length non-pattern words and the number of hits I get from words.knu, along with the corresponding regexp*:
Nbr Chars | Nbr words | Regexp* |
---|---|---|
2 | 73 |
|
3 | 661 |
|
4 | 2194 |
|
5 | 3729 |
|
6 | 4353 |
|
7 | 4209 |
|
8 | 2932 |
|
9 | 1616 |
|
As you can see, the number of results isn't quite the restricted list you might hope for. I've learned that words with patterns return much shorter lists. The most promising seems to me to be "AJBADOAPLA". Reverting to the simplest pattern,
perl -ne "print if m/^(.)..\1..\1..\1$/;" Words.knu
results in:
effervesce
excellence
expedience
experience
This list is short enough that I can just use my eyeballs see that "effervesce" and "excellence" would be filtered out if I used zero-width negative lookahead assertions.
Working it out
Sticking "expe-ience" into either BION's or graff's tool yields (along with some more intelligent guesses):
Going back to BION's or graff's tools,
ultimately producing:
I hope you have fun with this. If you want more Aristocrats to try, I'd first recommend BION's Solve a Cipher page (select New Puzzle, then click Go!), then look into the ACA. I'd love to hear what kind of word patterns and one-liners you come up with to help solve simple ciphers.
update: 04-Sep-2007 slight readability changes
update: 06-Sep-2007 broke up the regexp in the table for readability (thx, bobf)
I humbly seek wisdom.