PerlMonks
Re: unique sequences
by Cristoforo (Curate) on Dec 12, 2017 at 21:35 UTC ( [id://1205371]=note )
Hello,
I used BioPerl's Bio::SeqIO to load all the FASTA sequences into one long string. Then I tested for kmers that appeared only once and printed them out. The results agree with your desired output.
This program runs in under 2 minutes on a string of about 145M FASTA characters. From your specs, it looks like you want to combine the sequences into one string and test for kmers that occur only once.
Update: Just realized how odd it was for my results to agree with the ones you posted. My results were printed in the (un)ordered sequence of the hash keys, so it is curious that the order I got matched the order in your post!
Also, I think the reason you were getting 1 million results rather than 250,000 is that you are testing for uniqueness on the whole 21-char window instead of on just the last 12 chars (including the 'GG' ending). There would be more unique 21-char kmers than unique 12-char kmers.
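To make the 21-char versus 12-char point concrete, here is a minimal sketch. It is not the script described above, only an illustration under assumptions: the sub name `unique_kmers`, its parameters, and the toy sequence are all made up for this example, and the real input would be the concatenated FASTA string. It slides a 21-char window over the string, keeps only windows ending in 'GG', and counts either the whole window or just its last 12 chars.

```perl
use strict;
use warnings;

# Hypothetical helper: slide a $window-char window over $seq, keep only
# windows ending in 'GG', and count each one by its last $key_len chars.
# Returns the sorted keys that were seen exactly once.
sub unique_kmers {
    my ($seq, $window, $key_len) = @_;
    my %count;
    for my $i (0 .. length($seq) - $window) {
        my $kmer = substr($seq, $i, $window);
        next unless substr($kmer, -2) eq 'GG';
        $count{ substr($kmer, -$key_len) }++;
    }
    my @unique = sort grep { $count{$_} == 1 } keys %count;
    return @unique;
}

# Toy data (an assumption, not real sequence): three 21-char blocks.
# The first two differ in their first 9 chars but share the last 12.
my $seq = join '',
    'A' x 9,     'CTCTCTCTCTGG',
    'T' x 9,     'CTCTCTCTCTGG',
    'ACACACACA', 'TATATATATAGG';

my @by_window = unique_kmers($seq, 21, 21);   # key = whole 21-char window
my @by_suffix = unique_kmers($seq, 21, 12);   # key = last 12 chars only

printf "unique by whole 21-char window: %d\n", scalar @by_window;
printf "unique by last 12 chars:        %d\n", scalar @by_suffix;
```

On this toy input the whole-window test reports 3 unique kmers and the 12-char test reports 1, because two windows share the same 12-char ending. The same effect on real data would inflate the count when the whole window is used as the hash key.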