shamshersingh has asked for the wisdom of the Perl Monks concerning the following question:
I have a large set (100000+) of short DNA reads 20 characters long.
I need to compare all reads against each other and pull out those that differ at just 1 position. Here's the script I came up with.
The problem is that this loop runs very slowly. It takes on the order of days to process 100000 sequences. Is there a way to make the process faster?

    $| = 1;    # unbuffered output, so the "\r" progress line updates in place
    my $compare_count = 0;
    for (my $i = 0; $i < @kmers; $i++) {
        for (my $j = $i + 1; $j < @kmers; $j++) {
            print "\rComparing sequence $i to $j";
            my @result = PCCompare::dissimilarity($kmers[$i], $kmers[$j], 1);
            if ($result[0] == 1) {
                print "\rMatch found: $kmers[$i], $kmers[$j]\n";
                push @variant_kmers, ($kmers[$i], $kmers[$j]);
            }
            $compare_count++;
        }
    }
    print "\rFinished: $compare_count comparisons made.\n";
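For context on the question, here is a minimal sketch of one common way to avoid comparing every pair. It does not come from this thread. Each read is indexed under one key per position, with that position masked out; two distinct reads of equal length land in the same bucket exactly when they differ only at the masked position. The sub name `one_mismatch_pairs` and the `*` wildcard are hypothetical choices for illustration. The sketch assumes reads never contain `*`, and it treats "differ at one position" as a plain character mismatch, which may not match whatever `PCCompare::dissimilarity` actually computes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return every pair of
# reads that differ at exactly one position.
sub one_mismatch_pairs {
    my @reads = @_;

    # Drop exact duplicates so identical reads are never reported as a pair.
    my %seen;
    my @unique = grep { !$seen{$_}++ } @reads;

    # Index each read under one key per position, with that position
    # replaced by '*' (assumed never to occur in a read).
    my %bucket;
    for my $read (@unique) {
        for my $pos (0 .. length($read) - 1) {
            my $key = $read;
            substr($key, $pos, 1) = '*';
            push @{ $bucket{$key} }, $read;
        }
    }

    # Reads sharing a bucket agree everywhere except the masked position,
    # and must differ there because duplicates were removed above.
    my @pairs;
    for my $key (sort keys %bucket) {
        my @group = @{ $bucket{$key} };
        for my $i (0 .. $#group - 1) {
            for my $j ($i + 1 .. $#group) {
                push @pairs, [ $group[$i], $group[$j] ];
            }
        }
    }
    return @pairs;
}

my @kmers = qw(ACGTACGTACGTACGTACGT ACGTACGTACGTACGTACGA TTTTTTTTTTTTTTTTTTTT);
printf "Match found: %s, %s\n", @$_ for one_mismatch_pairs(@kmers);
```

For 100000 reads of length 20 this builds about 2 million hash keys instead of making roughly 5 billion pairwise comparisons, so it trades memory for time. Each bucket holds at most one read per possible character at the masked position, so the inner loops stay small.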
Replies are listed 'Best First'.
Re: Comparing a large set of DNA sequences
by BrowserUk (Patriarch) on Nov 09, 2011 at 23:09 UTC
by roboticus (Chancellor) on Nov 10, 2011 at 00:24 UTC
by aaron_baugher (Curate) on Nov 10, 2011 at 16:17 UTC
by roboticus (Chancellor) on Nov 10, 2011 at 16:30 UTC
by BrowserUk (Patriarch) on Nov 10, 2011 at 16:26 UTC
by roboticus (Chancellor) on Nov 10, 2011 at 18:39 UTC
Re: Comparing a large set of DNA sequences
by Anonymous Monk on Nov 10, 2011 at 05:15 UTC