Re: Random shuffling

If your DNA sequences don't contain any separators, the code looks okay to me.

I would optimize the processing in order to avoid wasting time and memory.

You only need to hold two input lines in memory at a time, since there are no dependencies on other lines.
One random shuffle should be enough to make it random.
I would also check that the input file contains an even number of lines: when reading a pair of lines, first check that the DNA sequence line is actually present.
$final_seq should be initialized to the empty string in order to avoid a warning on an 'undefined' value.
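
The even-line check can also be made explicit instead of stopping silently. A minimal sketch, assuming two lines per record (ID, then sequence); the helper name read_pair is hypothetical, and an in-memory filehandle with made-up data stands in for the real input file:

```perl
use strict;
use warnings;

# Hypothetical helper: read one (ID, sequence) pair from a filehandle.
# Returns an empty list at end of file; dies if an ID has no sequence line.
sub read_pair {
    my ($fh) = @_;
    my $id = <$fh>;
    return unless defined $id;
    my $seq = <$fh>;
    die "ID line without a sequence line: $id" unless defined $seq;
    chomp($id, $seq);
    return ($id, $seq);
}

# In-memory filehandle standing in for the real FASTA file (made-up data).
my $data = ">seq1\nACGT\n>seq2\nGGCC\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!\n";

my @records;
while (my ($id, $seq) = read_pair($fh)) {
    push @records, [$id, $seq];
}
close $fh;
print scalar(@records), " records read\n";
```

Whether to die or merely warn on a dangling ID line is a design choice; dying makes a truncated input file hard to miss.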

So in summary I would write it like this (untested!):

use strict;
use warnings;
use List::Util 'shuffle'; # Idea from http://www.perlmonks.org/?node_id=199901

my $input = shift @ARGV;
open(IN, '<', $input) or die "Can't read multifasta input genomic DNA file $input : $!\n";

my $destination = $input."_1MBWindow_ListUtilshuffle.fasta";
open(OUT, '>', $destination) or die "Can't write to file $destination: $!\n";

my $window = 1000000; # hard coded for shuffle window to be 1MB i.e 10^6

my ($seq_id, $seq);

# process every alternate line with ID (and its corresponding sequence in next line)
while (defined($seq_id = <IN>)) {
    if (!defined($seq = <IN>)) {
        last;
    }
    chomp $seq_id;
    chomp $seq;
    my $final_seq = '';

    for (my $i = 1; $i <= length $seq; $i += $window ) {
        my $s = substr ($seq, $i - 1, $window);
        my @temp_seq_array = split //, $s;

        @temp_seq_array = shuffle @temp_seq_array; # using the List::Util module AND Shuffles EACH window!!!
        my $rand_shuffled_seq = join ('', @temp_seq_array,);
        $final_seq .= $rand_shuffled_seq; # concatenates the shuffled DNA seq to the 3' end of the previous 1MB fragment
    }
    print OUT $seq_id, "\n",$final_seq,"\n";
}
close IN;
close OUT;
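
Since the script above is untested, a quick sanity check of the windowed shuffle on a small string may be useful. This is only a sketch: the helper name shuffle_in_windows, the made-up sequence and the window size of 4 are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;
use List::Util 'shuffle';

# Hypothetical helper: shuffle the characters of $seq within consecutive
# windows of $window characters (the last window may be shorter).
sub shuffle_in_windows {
    my ($seq, $window) = @_;
    my $out = '';
    for (my $pos = 0; $pos < length($seq); $pos += $window) {
        $out .= join '', shuffle(split //, substr($seq, $pos, $window));
    }
    return $out;
}

my $seq      = 'AAAACCCCGGGGTT';   # made-up sequence
my $shuffled = shuffle_in_windows($seq, 4);

# Each window keeps its own base composition, so sorting the characters
# of a window must give the same string before and after shuffling.
for (my $pos = 0; $pos < length($seq); $pos += 4) {
    my $before = join '', sort(split //, substr($seq, $pos, 4));
    my $after  = join '', sort(split //, substr($shuffled, $pos, 4));
    die "Window at $pos changed composition\n" unless $before eq $after;
}
print "$seq\n$shuffled\n";
```

A small window makes the per-window behaviour easy to inspect by eye; the real run would use the 1MB window.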

Re^2: Random shuffling by onlyIDleft (Scribe) on Jun 21, 2015 at 00:13 UTC
Please see my UPDATE 1 to the original post, perhaps your answer may change or need modification in light of the additional info I have provided?
Re^2: Random shuffling by onlyIDleft (Scribe) on Jun 20, 2015 at 18:27 UTC
Thank you hexcoder. I appreciate your inputs to write out a modified version of my script to make the runs quicker and more memory efficient. Cheers! PS. There are no separators in the DNA sequences....
