If your DNA sequences don't contain any separators, the code looks okay to me.
I would, however, optimize the processing to avoid wasting time and memory:
- You only need to hold two input lines in memory at a time, since each ID/sequence pair has no dependencies on other lines.
- One shuffle should be enough to randomize the sequence.
- I would check whether the input file contains an even number of lines; that is, when reading a pair of lines, first check that the DNA sequence line is actually present.
- $final_seq should be initialized to the empty string to avoid a "Use of uninitialized value" warning.
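The pairwise reading described in the points above can be sketched as a small subroutine that holds only one ID line and one sequence line at a time and checks that the sequence line exists. The name `read_pair` is hypothetical, and dying on a missing sequence line (rather than silently stopping, as the script below does with `last`) is an illustrative choice.

```perl
use strict;
use warnings;

# Hypothetical helper: read one ID line plus the sequence line that follows it.
# Returns an empty list at a clean end of file; dies if the ID has no sequence line
# (i.e. the file has an odd number of lines).
sub read_pair {
    my ($fh) = @_;
    my $id = <$fh>;
    return unless defined $id;    # no more records
    my $seq = <$fh>;
    die "ID line without a sequence line: $id" unless defined $seq;
    chomp($id, $seq);
    return ($id, $seq);
}

# Usage with an in-memory file holding two records.
my $data = ">seq1\nACGT\n>seq2\nGGCC\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
my @ids;
while (my ($id, $seq) = read_pair($fh)) {
    push @ids, $id;
    print "$id has ", length $seq, " bases\n";
}
close $fh;
```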
So in summary I would write it like this (untested!):
use strict;
use warnings;
use List::Util 'shuffle'; # Idea from http://www.perlmonks.org/?node_id=199901
my $input = shift @ARGV;
defined $input or die "Usage: $0 input.fasta\n";
open(IN, '<', $input) or die "Can't read multifasta input genomic DNA file $input : $!\n";
my $destination = $input."_1MBWindow_ListUtilshuffle.fasta";
open(OUT, '>', $destination) or die "Can't write to file $destination: $!\n";
my $window = 1000000; # hard-coded shuffle window of 1 Mb, i.e. 10^6 bases
my ($seq_id, $seq);
# process every other line as an ID (its sequence is on the next line)
while (defined($seq_id = <IN>)) {
    if (!defined($seq = <IN>)) {
        last; # odd number of lines: ID without a sequence line
    }
    chomp $seq_id;
    chomp $seq;
    my $final_seq = '';
    for (my $i = 1; $i <= length $seq; $i += $window) {
        my $s = substr($seq, $i - 1, $window);
        my @temp_seq_array = split //, $s;
        @temp_seq_array = shuffle @temp_seq_array; # List::Util shuffle, applied to EACH window
        my $rand_shuffled_seq = join('', @temp_seq_array);
        $final_seq .= $rand_shuffled_seq; # append the shuffled window to the 3' end of the previous 1 Mb fragment
    }
    print OUT $seq_id, "\n", $final_seq, "\n";
}
close IN;
close OUT;
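To check the per-window shuffle on a small input, the inner loop can be pulled out into a subroutine and run with a tiny window. The names `shuffle_windows` and `composition` are hypothetical. Because `List::Util::shuffle` only reorders elements, each window should keep exactly the same bases; only their order within the window changes, and no base moves into another window.

```perl
use strict;
use warnings;
use List::Util 'shuffle';

# Hypothetical helper: shuffle $seq in consecutive, non-overlapping windows
# of $window characters; the last window may be shorter.
sub shuffle_windows {
    my ($seq, $window) = @_;
    my $out = '';
    for (my $start = 0; $start < length $seq; $start += $window) {
        my $chunk = substr($seq, $start, $window);
        $out .= join '', shuffle(split //, $chunk);
    }
    return $out;
}

# Sorted characters of a string, used to compare base composition.
sub composition {
    my @bases = split //, $_[0];
    return join '', sort @bases;
}

my $seq      = 'AAAACCCCGGGGTTTTAC';   # 18 bases; with a window of 4 the last window has 2
my $shuffled = shuffle_windows($seq, 4);
print "$seq\n$shuffled\n";

# Every window should contain the same bases before and after shuffling.
for (my $start = 0; $start < length $seq; $start += 4) {
    my $before = substr($seq, $start, 4);
    my $after  = substr($shuffled, $start, 4);
    print composition($before) eq composition($after) ? "ok\n" : "MISMATCH\n";
}
```

With the real script the window is 10^6 instead of 4, so the same property holds for each 1 Mb fragment.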