PerlMonks  

Re^2: Reduce RAM required

by onlyIDleft (Scribe)
on Jan 09, 2019 at 16:57 UTC ( [id://1228264]=note )


in reply to Re: Reduce RAM required
in thread Reduce RAM required

Thank you, tybalt89. This should work. As I mentioned in my original post, my scripting is very rusty after a gap of nearly four years. I understand only bits and pieces of your code.

One guess is that your script requires IDs to end in numbers, which you rely on to process and split the input data? Yes, no, maybe? I'd prefer that the distinction between ID lines and sequence lines be based on whether the line starts with the ">" symbol (ID) or not (sequence), please. How should I modify the script for that?

It appears to me that the script is counting A/T/G/C for each sequence individually, correct? If not, please skip this question. But if yes, how much more RAM-hungry would a modification be in which the A/T/G/C frequencies across ALL sequences are calculated first, BEFORE generating sequences that match those frequencies?

Finally, how do I accept input through a file handle (FH) and write output to a new FH? I tried several mods to your script, but only a few worked out, hence this request for your additional assistance. Thanks a ton!

Re^3: Reduce RAM required
by tybalt89 (Monsignor) on Jan 09, 2019 at 17:50 UTC

    Changed to use '>'.

    It counts A/T/G/C for each "window", not each sequence. That seemed to be what your program was doing. Its random generation over a "window" is equivalent to the shuffle over a "window" you were doing.

    Uses file handles.

    #!/usr/bin/perl

    # https://perlmonks.org/?node_id=1228191

    use strict;
    use warnings;

    my $window = 1e6;                      # bases to accumulate before emitting a batch
    my $A = my $C = my $G = my $all = 0;   # remaining counts; t is implied by $all
    my (@sizes, $tmp, $start);
    my $inputfile = 'd.1228191';
    my $outputfile = 'd.out.1228191';

    open my $in, '<', $inputfile or die "$! opening $inputfile";
    open my $out, '>', $outputfile or die "$! opening $outputfile";

    # Draw one base at random, without replacement, from the remaining counts.
    sub letter
      {
      my $n = int rand $all--;
      $n < $A ? ($A--, return 'a') :
        $n < $A + $C ? ($C--, return 'c') :
        $n < $A + $C + $G ? ($G--, return 'g') :
        return 't';
      }

    # Write one generated record per stored sequence length, then reset the list.
    sub output
      {
      for my $count ( @sizes )
        {
        print $out ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
        }
      @sizes = ();
      }

    while( <$in> )
      {
      if( /^>/ )
        {
        $start //= s/\D+//gr;   # digits of the first header seed the ID counter
        }
      elsif( /^[acgt]/ )
        {
        $A += tr/a//;
        $C += tr/c//;
        $G += tr/g//;
        $all += $tmp = tr/acgt//;
        push @sizes, $tmp;
        $all >= $window and output();
        }
      }
    $all and output();   # flush the final partial window
    close $in;
    close $out;
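    On the earlier question about computing the A/T/G/C frequencies across ALL sequences first: a minimal sketch, assuming lower-case input as in the script above. The name shuffle_whole_file is hypothetical and is not part of the posted script. Only four counters and one length per sequence line stay in memory, so the RAM cost grows with the number of sequence lines rather than with the number of bases. In the posted script, raising $window above the total base count should have much the same effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): whole-file variant.
# Step 1 tallies a/c/g/t over every sequence line and records each length.
# Step 2 draws letters without replacement from those global tallies.
sub shuffle_whole_file {
    my ($in, $out) = @_;
    my %count = (a => 0, c => 0, g => 0, t => 0);
    my (@sizes, $start);

    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            # Digits of the first header seed the ID counter.
            $start //= $line =~ s/\D+//gr;
        }
        elsif ($line =~ /^[acgt]/) {
            $count{a} += ($line =~ tr/a//);
            $count{c} += ($line =~ tr/c//);
            $count{g} += ($line =~ tr/g//);
            $count{t} += ($line =~ tr/t//);
            push @sizes, ($line =~ tr/acgt//);
        }
    }
    $start = 1 unless defined $start && length $start;

    my $all = $count{a} + $count{c} + $count{g} + $count{t};
    for my $size (@sizes) {
        my $seq = '';
        for (1 .. $size) {
            my $n = int rand $all;          # uniform over the remaining bases
            for my $base (qw(a c g t)) {
                if ($n < $count{$base}) {
                    $seq .= $base;
                    $count{$base}--;
                    last;
                }
                $n -= $count{$base};
            }
            $all--;
        }
        print {$out} ">ID", $start++, "\n", $seq, "\n";
    }
    return;
}

# Toy run on in-memory file handles; real use would open files instead.
my $fasta = ">ID7\naacg\n>ID8\nttga\n";
open my $in, '<', \$fasta or die "$! opening in-memory input";
my $result = '';
open my $out, '>', \$result or die "$! opening in-memory output";
shuffle_whole_file($in, $out);
close $in;
close $out;
print $result;
```

    The generated sequences keep the original line lengths and the overall base counts, but individual sequences no longer reflect their own local composition.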

      In your modified script, I hard coded the input and output file names as follows:

      my $inputfile = 'Ath_orig.fa';
      my $outputfile = 'Ath_tybalt_shuffle.fa';

      The specified output file was created, BUT it was empty.

      So I tried to modify your script very slightly to use the file handle syntax I am most familiar with, as follows, but here too the specified output file was empty. Not sure what I am doing wrong...

      #!/usr/bin/perl

      #tybalt89_DNAfreq_Random_Generator.pl
      # https://perlmonks.org/?node_id=1228191

      use strict;
      use warnings;

      my $window = 1e6;
      my $A = my $C = my $G = my $all = 0;
      my (@sizes, $tmp, $start);
      my $in = shift @ARGV;
      my $out = shift @ARGV;
      print $in, "\n";
      print $out, "\n";

      open IN, '<', $in or die "$! opening $in";
      open OUT, '>', $out or die "$! opening $out";

      sub letter
        {
        my $n = int rand $all--;
        $n < $A ? ($A--, return 'a') :
          $n < $A + $C ? ($C--, return 'c') :
          $n < $A + $C + $G ? ($G--, return 'g') :
          return 't';
        }

      sub output
        {
        for my $count ( @sizes )
          {
          print ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
          print OUT ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
          }
        @sizes = ();
        }

      while( <IN> )
        {
        if( /^>/ )
          {
          $start //= s/\D+//gr;
          }
        elsif( /^[acgt]/ )
          {
          $A += tr/a//;
          $C += tr/c//;
          $G += tr/g//;
          $all += $tmp = tr/acgt//;
          push @sizes, $tmp;
          $all >= $window and output();
          }
        }
      $all and output();
      close IN;
      close OUT;

      Execution syntax was simply:

       perl tybalt89_DNAfreq_Random_Generator.pl Ath_orig.fa Ath_tybalt_shuffle.fa

      Thanks a lot!

        Does your input file have leading whitespace?
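        One way to check is to classify every input line the way the script's two regexes do and tally what gets skipped. This is a hedged sketch: classify_lines is a hypothetical helper, and the upper-case bucket is an assumption about one more way lines could fail /^[acgt]/. The thread itself only raises leading whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical diagnostic (not from the thread): classify each line with the
# same tests the script uses, then bucket the lines it would silently skip.
sub classify_lines {
    my ($fh) = @_;
    my %seen = (
        header        => 0,   # matches /^>/
        sequence      => 0,   # matches /^[acgt]/, so the script counts it
        leading_space => 0,   # skipped: whitespace before the first character
        uppercase     => 0,   # skipped: starts with an upper-case base (assumed case)
        other         => 0,   # skipped: blank lines and anything else
    );
    while (my $line = <$fh>) {
        if    ($line =~ /^>/)       { $seen{header}++ }
        elsif ($line =~ /^[acgt]/)  { $seen{sequence}++ }
        elsif ($line =~ /^\s+\S/)   { $seen{leading_space}++ }
        elsif ($line =~ /^[ACGTN]/) { $seen{uppercase}++ }
        else                        { $seen{other}++ }
    }
    return \%seen;
}

# Toy input on an in-memory file handle; point this at the real file instead.
my $sample = ">ID1\nacgt\n  acgt\nACGT\n\n";
open my $fh, '<', \$sample or die "$! opening in-memory sample";
my $seen = classify_lines($fh);
close $fh;
printf "%-13s %d\n", $_, $seen->{$_} for sort keys %$seen;
```

        If the sequence count comes back as zero, the script never accumulates any bases, which would be consistent with an empty output file.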

        Also, each print statement alters program state: every call to letter() decrements the remaining counts, and $start++ advances the ID. You can't have two of them.
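        If the record should appear both on screen and in the file, one option is to build it once and print the same string to each handle. A minimal sketch, assuming toy counts; emit is a hypothetical helper, and letter() is written out with plain if statements but follows the same draw-without-replacement logic as the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy state (assumed values): 2 a, 1 c, 1 g, and 6 - 4 = 2 implied t.
my ($A, $C, $G, $all) = (2, 1, 1, 6);
my $start = 1;

# Same logic as the script's letter(): each call consumes one base.
sub letter {
    my $n = int rand $all;
    $all--;
    if ($n < $A)           { $A--; return 'a' }
    if ($n < $A + $C)      { $C--; return 'c' }
    if ($n < $A + $C + $G) { $G--; return 'g' }
    return 't';
}

# Hypothetical helper: generate the record once, then send that one string
# to every handle, so the counters and the ID advance only once per record.
sub emit {
    my ($count, @handles) = @_;
    my $record = join '', ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
    print {$_} $record for @handles;
    return $record;
}

# Demo: STDOUT plus an in-memory file handle standing in for the output file.
my $file = '';
open my $out, '>', \$file or die "$! opening in-memory output";
my $first  = emit(3, \*STDOUT, $out);
my $second = emit(3, \*STDOUT, $out);
close $out;
```

        With two separate print statements, the second one would draw fresh letters from already-depleted counts and skip an ID, so the file and the screen would disagree.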
