I have received some great tips here and on other forums. Some suggested I should also read the DNA backwards, which I will certainly do. That's easy with Perl:
#!/usr/bin/perl
#
# Print a file backwards
#
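open( my $fh, '<', 'test.txt' )
    or die( "Can't open file test.txt: $!" );
# reverse the order of the lines, then the characters within each line
my @lines = reverse <$fh>;
foreach my $line (@lines) {
    chomp $line;    # otherwise the newline ends up at the front of the line
    $line = reverse $line;
    print "$line\n";
}
close $fh;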
The claim that you can find any text in DNA, be it the Communist Manifesto or the entire works of Shakespeare, is simply not true. When looking for a pattern, you simply can't find any text you want. If you think otherwise, you can prove it by writing a script that finds the entire works of Shakespeare in DNA. If the claim were true, that should be easy to do in Perl. You are not allowed to use a one-time pad; that is simply cheating, because the text would then come from the pad and not from the DNA. No algorithm can find the entire works of Shakespeare in DNA, no matter what encoding is used.
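To put a rough number on this, here is a back-of-the-envelope sketch, not a proof. It assumes the decoded characters behave like independent, uniformly random 7-bit values, which real DNA does not, and it uses a hypothetical round figure of 240 million bases for the input. The name `expected_hits` is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expected number of positions at which one specific $k-character string
# appears in $n random characters drawn uniformly from $alphabet symbols.
# Assumption: characters are independent and uniform, which real DNA is not.
sub expected_hits {
    my ($n, $alphabet, $k) = @_;
    return 0 if $k > $n;
    return ($n - $k + 1) / $alphabet**$k;
}

# Hypothetical round figure: 240 million bases read 7 bits at a time.
my $chars = int(240_000_000 / 7);
for my $k (1 .. 8) {
    printf "%d-character text: about %.3g expected matches\n",
        $k, expected_hits($chars, 128, $k);
}
```

Under these assumptions the expected count shrinks by a factor of 128 for every extra character, so a fixed encoding runs out of matches after a handful of characters, long before anything the size of a play.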
Some have suggested I should convert the DNA to bits and look there. That is a great idea, and I have already written a script that converts DNA to bits. It reads A and G as a zero and C and T as a one (one of six possible mappings), then collects 7 bits at a time and outputs them as ASCII characters. The challenge is then to find information in those characters, which I have not yet done. Here's my script so far. I'm now using strict and warnings, learning all the time.
#!/usr/bin/perl
#
# Convert DNA to bits and see what comes out
#
use strict;
use warnings;
open my $fh, "<:encoding(UTF-8)",
    "Homo_sapiens.GRCh38.dna.chromosome.2.fa" or die "$!\n";
# open my $fh, "<:encoding(UTF-8)", "test.txt" or die "$!\n";
# convert bases to bits, 6 possible ways to do this
my %bit_for = ( A => "0", C => "1", G => "0", T => "1" );
my $bits = "";
my $in_header = 0;
while (read($fh, my $char, 1)) {
    # skip FASTA header lines, which start with ">" and may contain A, C, G or T
    $in_header = 1 if $char eq ">";
    $in_header = 0 if $char eq "\n";
    next if $in_header;
    # ignore all other characters, especially those annoying NNNNNNNN,
    # what are they anyway?
    next unless $char =~ m/[ACGT]/;
    # add one bit at a time until there are 7, without dropping any base
    $bits .= $bit_for{$char};
    next if length($bits) < 7;
    # pad the 7 bits to a full byte and convert it to an ASCII character
    my $print = pack("B8", "0$bits");
    # only print letters
    if ($print =~ /[a-zA-Z]/) {
        print $print;
    }
    $bits = "";
}
close $fh;
print "\n";
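The comment about six possible mappings can be turned into a loop that tries all of them. This is a minimal sketch, assuming 7 bits per character and plain ASCII. The names `bases_to_text` and `@ONE_SETS` are made up for this example, and the sample string is constructed by hand for illustration, it is not real DNA. Note that it uppercases its input, so lowercase bases are counted too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The six ways to pick which two bases count as 1 (the other two count as 0).
my @ONE_SETS = qw(AC AG AT CG CT GT);

# Decode a string of bases: map each base to a bit, then read $width bits
# at a time as a character code. An incomplete final group is dropped.
sub bases_to_text {
    my ($dna, $ones, $width) = @_;
    (my $bits = uc $dna) =~ tr/ACGT//cd;    # keep only A, C, G and T
    $bits =~ s/[$ones]/1/g;                 # chosen bases become 1
    $bits =~ tr/ACGT/0/;                    # remaining bases become 0
    my $text = '';
    for my $chunk (unpack "(A$width)*", $bits) {
        next if length($chunk) < $width;    # skip the incomplete tail
        $text .= chr(oct("0b$chunk"));
    }
    return $text;
}

# Hand-made sample that spells "Hi" when C and T are read as 1.
my $sample = "TAATAAACCGTAGC";
for my $ones (@ONE_SETS) {
    my $text = bases_to_text($sample, $ones, 7);
    $text =~ s/[^a-zA-Z]/./g;               # show non-letters as dots
    print "$ones = 1: $text\n";
}
```

Working on a whole string with `tr` and `unpack` avoids reading the file one character at a time, at the cost of holding the sequence in memory, so for a full chromosome it would make sense to feed it one line at a time and carry leftover bits over to the next line.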