 
PerlMonks  

Re^10: Reduce RAM required

by onlyIDleft (Scribe)
on Jan 09, 2019 at 23:02 UTC ( [id://1228287]=note )


in reply to Re^9: Reduce RAM required
in thread Reduce RAM required

Oh no, sorry! That was a "duh" moment from me! :)

How should the script be modified to use the original IDs?

I added a few lines to your script to handle Ns in the input, and those lines seem to work. I also just discovered that the input contains other characters besides A/C/G/T/N: any of R/Y/S/W/K/M/B/D/H/V. What is your advice for how to account for these? I hard-coded them in a modified version of your script, shown below. Does it look all right?
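One possible alternative to hard-coding a counter per letter is to keep the counts in a hash keyed by symbol. This is only a minimal sketch of that idea, not the approach from the thread: the helper names count_symbols and draw_symbol and the sample sequence are made up for illustration, and it assumes upper-case input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers for illustration; these names are not from the thread.
my $alphabet = 'ACGTRYSWKMBDHVN';    # the symbols reported in the input

# Tally every recognised symbol of one sequence line into %$counts
# and return how many symbols were tallied.
sub count_symbols {
    my ($counts, $line) = @_;
    my $n = 0;
    for my $ch ( $line =~ /[$alphabet]/g ) {
        $counts->{$ch}++;
        $n++;
    }
    return $n;
}

# Draw one symbol without replacement, weighted by the remaining counts.
# $total must equal the sum of the values currently in %$counts.
sub draw_symbol {
    my ($counts, $total) = @_;
    my $n = int rand $total;
    for my $ch ( sort keys %$counts ) {
        my $c = $counts->{$ch};
        if ( $n < $c ) {
            $counts->{$ch}--;
            return $ch;
        }
        $n -= $c;
    }
    die "counts and total disagree";
}

my %counts;
my $total = count_symbols( \%counts, 'ACGTNRYACGT' );   # made-up sample line
my $len   = $total;

my $shuffled = '';
$shuffled .= draw_symbol( \%counts, $total-- ) for 1 .. $len;
print "$shuffled\n";
```

With a hash, a newly discovered symbol only needs to be added to $alphabet; the walk over the keys is slower per letter than the chained comparisons, so whether this trade-off is acceptable for large genomes is untested here.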

Importantly, which part of your script generates the randomness in the sequence? Is it the map(letter(), 1 .. $count)?

I ask for two reasons:

1. The goal is to randomly shuffle DNA, which is the obvious reason.

2. I want to be sure that any two outputs generated from the same input will differ because of this randomness.
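As far as I can tell, the randomness comes from the rand call inside letter(); the map only calls letter() $count times. Perl seeds rand automatically with a different value on each run unless srand is called explicitly, so two runs on the same input should normally differ. The following is a minimal sketch to check that reading; shuffle_seq and the sample input are hypothetical and only mimic the draw-without-replacement idea on a single string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rebuild a string by drawing its letters
# without replacement, weighted by how many of each remain.
sub shuffle_seq {
    my ($seq) = @_;
    my %left;
    $left{$_}++ for split //, $seq;
    my $all = length $seq;
    my $out = '';
    while ( $all > 0 ) {
        my $n = int rand $all--;    # the only source of randomness
        for my $ch ( sort keys %left ) {
            if ( $n < $left{$ch} ) {
                $left{$ch}--;
                $out .= $ch;
                last;
            }
            $n -= $left{$ch};
        }
    }
    return $out;
}

my $input = 'AAAACCCCGGGGTTTTNNRY';    # made-up sample sequence

srand 42;
my $first = shuffle_seq($input);
srand 42;
my $second = shuffle_seq($input);      # same seed, so the same draws
my $third  = shuffle_seq($input);      # continues the random stream

print "$first\n$second\n$third\n";
```

The first two lines printed are identical because the seed was fixed; without an explicit srand, each run gets its own seed. Every output has the same letter composition as the input, which is what makes it a shuffle rather than a fresh random sequence.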

Many thanks!

#!/usr/bin/perl
#tybalt89_DNAfreq_Random_Generator.pl
# https://perlmonks.org/?node_id=1228191

use strict;
use warnings;

my $window = 1e6;
my $A = my $C = my $G = my $T = my $N = my $all = 0;
my $R=0; my $Y=0; my $S=0; my $W=0; my $K=0; my $M=0; my $B=0; my $D=0; my $H=0; my $V=0;
my (@sizes, $tmp, $start);

my $in = shift @ARGV;
my $out = shift @ARGV;
# print $in, "\n";
# print $out, "\n";

open IN, '<', $in or die "$! opening $in";
open OUT, '>', $out or die "$! opening $out";

sub letter {
  my $n = int rand $all--;
  $n < $A ? ($A--, return 'A') :
  $n < $A + $C ? ($C--, return 'C') :
  $n < $A + $C + $G ? ($G--, return 'G') :
  $n < $A + $C + $G + $T ? ($T--, return 'T') :
  $n < $A + $C + $G + $T + $R ? ($R--, return 'R') :
  $n < $A + $C + $G + $T + $R + $Y ? ($Y--, return 'Y') :
  $n < $A + $C + $G + $T + $R + $Y + $S ? ($S--, return 'S') :
  $n < $A + $C + $G + $T + $R + $Y + $S + $W ? ($W--, return 'W') :
  $n < $A + $C + $G + $T + $R + $Y + $S + $W + $K ? ($K--, return 'K') :
  $n < $A + $C + $G + $T + $R + $Y + $S + $W + $K + $M ? ($M--, return 'M') :
  $n < $A + $C + $G + $T + $R + $Y + $S + $W + $K + $M + $B ? ($B--, return 'B') :
  $n < $A + $C + $G + $T + $R + $Y + $S + $W + $K + $M + $B + $D ? ($D--, return 'D') :
  $n < $A + $C + $G + $T + $R + $Y + $S + $W + $K + $M + $B + $D + $H ? ($H--, return 'H') :
  $n < $A + $C + $G + $T + $R + $Y + $S + $W + $K + $M + $B + $D + $H + $V ? ($V--, return 'V') :
  return 'N';
}

sub output {
  for my $count ( @sizes ) {
    # print ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
    print OUT ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
  }
  @sizes = ();
}

while( <IN> ) {
  if( /^>/ ) {
    $start //= s/\D+//gr;
  }
  elsif( /^[ACGTRYSWKMBDHVN]/ ) {
    $A += tr/A//;
    $C += tr/C//;
    $G += tr/G//;
    $T += tr/T//;
    $R += tr/R//;
    $Y += tr/Y//;
    $S += tr/S//;
    $W += tr/W//;
    $K += tr/K//;
    $M += tr/M//;
    $B += tr/B//;
    $D += tr/D//;
    $H += tr/H//;
    $V += tr/V//;
    $N += tr/N//;
    $all += $tmp = tr/ACGTRYSWKMBDHVN//;
    push @sizes, $tmp;
    $all >= $window and output();
  }
}
$all and output();

close IN;
close OUT;
