http://qs321.pair.com?node_id=1214442

I have for a time entertained the idea, that if God is the creator, he would have left his signature in the DNA of human species. If I was the creator, I would have encoded the entire Hebrew Bible in DNA, so to let no one doubt that DNA was created by God and that the Bible is the word of God.

I finally took up the challenge and wrote a perl script to check if the first five verses of Bible are encoded in DNA. Naturally there is an infinite number of ways to encode information in DNA, but I assumed that God would have used something quite obvious in order for us to be able to find information encoded in DNA. I知 assuming that if the Bible is encoded in DNA, the encoding used would be the same as for protein synthesis, namely that triplets of DNA base pairs would encode for one character. There are 64 possible codons so there is plenty of redundancy when they are used for encoding 22 hebrew alphabets (plus sofit forms for five characters).

Like so:

AAA -> Y AAC -> XXX AAG -> B AAT -> XXX ACA -> A ACC -> M ACG -> XXX ACT -> XXX AGA -> R AGC -> R AGG -> W AGT -> W ATA -> H ATC -> XXX ATG -> A ATT -> H CAA -> XXX CAC -> XXX CAG -> I CAT -> XXX CCA -> XXX CCC -> XXX CCG -> XXX CCT -> V CGA -> XXX CGC -> XXX CGG -> O CGT -> XXX CTA -> XXX CTC -> E CTG -> V CTT -> H GAA -> H GAC -> I GAG -> XXX GAT -> T GCA -> XXX GCC -> B GCG -> XXX GCT -> H GGA -> V GGC -> H GGG -> Y GGT -> A GTA -> Y GTC -> V GTG -> A GTT -> I TAA -> E TAC -> XXX TAG -> XXX TAT -> O TCA -> A TCC -> Y TCG -> XXX TCT -> XXX TGA -> R TGC -> H TGG -> A TGT -> XXX TTA -> XXX TTC -> L TTG -> B TTT -> A

My dirty little perl script reads a FASTA file one character at a time and when a triplet is read, it check to see if that codon is already defined. If it is not, the first character of the target sequence is added to a hash containing all the codons. The algorithm then moves to the next triplet in DNA and check to see if that triplet is defined and so on. When a triplet is already defined and the character stored does not equal the target sequence, the script records the maximum length of the sequence found and goes back to the beginning of DNA and moves forward one base pair to continue the search.

I知 not a computer science expert and I知 sure that my script is dirty and messy, but it does work. It takes 33h to search one target sequence against the 3 billion base pairs of human DNA. The FASTA files are in chunks of roughly 150 million base pairs, so several files need to be checked by hand, but this is not much of a problem. My computer crashes when I try to load more than 10million base pairs at a time, so the script reads each FASTA file in chunks of 5 million base pairs at a time.

I could not get hebrew characters to work properly, so I simply translitterated the first five chapters of Genesis to ASCII characters. This is a dirty way of going about it, but it works.

Aleph A Bet B Gimmel G Dalet D Hey H Vav V Zayin Z Chet C Tet T Yod Y Kaf K G Lamed L Mem M O Nun N J Samekh S Ayin X Pey P Tsadi U W Kuf Q Resh R Shin E Tav I Gen 1:1-5 BRAEYI BRA ALHYO AI HEMYO VAT HARW VHARW HYIH IHV VBHV VCEG XL PNY IHV +O VRVC ALHYO MRCPI AL PNY HMYO VYAMR ALHYO YHY AVR VYHY AVR VYRA ALHY +O AI HAVR KY TVB VYBDL ALHYO BYJ HAVR VBYJ HCEG VYQRA ALHYO LAVR YVO +VLCEG QRA LYLH VYHY XRB BYHY BQR YVO ACD

For control sequences I used Lorem ipsum, War and Peace and a random string. For the control sequences I checked the first one million base pairs only.

The results so far:

Lorem ipsum 42 characters found (250 million searched) War and peace 35 characters found (one million searched) Random string 35 characters found (one million searched)

Having checked the hebrew Bible against so far 500 million base pairs, the maximum sequence found was 45 characters. This is more than the control sequences, but only because much more base pairs were compared. To be sure that the sequence was encoded in DNA by God, I would expect to find a sequence of hundreds of characters, preferably all the first five verses of Genesis. I知 not a mathematician, so I have not calculated what the maximum sequence length would be if left to chance alone. But the control sequences do give some estimate.

I知 of course assuming that God used the hebrew Bible, because some say hebrew is the holy language, but I致e also checked the King James English for the first verses of Matthew and John. If God is omnipotent, surely he could have encoded the Bible in DNA in any language. In the future I値l check if New Testament passeges are encoded in Greek, but thus far I知 working with the assumption that the most awesome thing for God to do would have been to encode the biginning of Genesis. Will post results when I find anything.

Let me know what you think of my efforts, I know this is nuts.

My code can be downloaded at kristitty.net/blog/finding-bible-verses-in-dna/