PerlMonks  

Re^3: Reduce RAM required

by tybalt89 (Monsignor)
on Jan 09, 2019 at 17:50 UTC ( [id://1228267]=note )


in reply to Re^2: Reduce RAM required
in thread Reduce RAM required

Changed to use '>'.

It counts A/T/G/C for each "window", not each sequence. That seemed to be what your program was doing. Its random generation over a "window" is equivalent to the shuffle over a "window" you were doing.
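To illustrate why drawing letters from running counts behaves like a shuffle, here is a minimal sketch. It assumes a hypothetical helper `draw_all` (not part of the script below) that takes the letter counts and draws without replacement until the counts are exhausted. The window string is made up. The result always has exactly the composition of the input; only the order is random.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the original script: draw every letter
# from a pool described only by its counts, without replacement.
sub draw_all {
    my %count = @_;                        # e.g. ( a => 3, c => 2, g => 1, t => 4 )
    my $remaining = 0;
    $remaining += $_ for values %count;

    my $result = '';
    while ( $remaining > 0 ) {
        my $n = int rand $remaining;       # uniform index into the remaining pool
        for my $letter ( sort keys %count ) {
            if ( $n < $count{$letter} ) {  # index falls in this letter's share
                $count{$letter}--;
                $remaining--;
                $result .= $letter;
                last;
            }
            $n -= $count{$letter};         # skip past this letter's share
        }
    }
    return $result;
}

my $window = 'aaacctgttt';                 # made-up example window
my %count;
$count{$_}++ for split //, $window;

my $random = draw_all(%count);
print "$random\n";

# Same letters, possibly different order: sorting both gives the same string.
my $sorted_in  = join '', sort split //, $window;
my $sorted_out = join '', sort split //, $random;
print $sorted_in eq $sorted_out ? "composition preserved\n" : "composition changed\n";
```

Because only the counts are kept, memory use does not grow with the size of the window, which is the point of the approach in this thread.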

Uses file handles.

#!/usr/bin/perl

# https://perlmonks.org/?node_id=1228191

use strict;
use warnings;

my $window = 1e6;

my $A = my $C = my $G = my $all = 0;
my (@sizes, $tmp, $start);

my $inputfile = 'd.1228191';
my $outputfile = 'd.out.1228191';

open my $in, '<', $inputfile or die "$! opening $inputfile";
open my $out, '>', $outputfile or die "$! opening $outputfile";

sub letter {
    my $n = int rand $all--;
    $n < $A ? ($A--, return 'a') :
      $n < $A + $C ? ($C--, return 'c') :
      $n < $A + $C + $G ? ($G--, return 'g') :
      return 't';
}

sub output {
    for my $count ( @sizes ) {
        print $out ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
    }
    @sizes = ();
}

while( <$in> ) {
    if( /^>/ ) {
        $start //= s/\D+//gr;
    }
    elsif( /^[acgt]/ ) {
        $A += tr/a//;
        $C += tr/c//;
        $G += tr/g//;
        $all += $tmp = tr/acgt//;
        push @sizes, $tmp;
        $all >= $window and output();
    }
}
$all and output();
close $in;
close $out;

Replies are listed 'Best First'.
Re^4: Reduce RAM required
by onlyIDleft (Scribe) on Jan 09, 2019 at 19:05 UTC

    In your modified script, I hard coded the input and output file names as follows:

    my $inputfile = 'Ath_orig.fa';
    my $outputfile = 'Ath_tybalt_shuffle.fa';

    the specified output file was created, BUT it was empty.

    So I modified your script very slightly to use the file handle syntax I am most familiar with, as follows, but here too the specified output file was empty. Not sure what I am doing wrong...

    #!/usr/bin/perl

    #tybalt89_DNAfreq_Random_Generator.pl
    # https://perlmonks.org/?node_id=1228191

    use strict;
    use warnings;

    my $window = 1e6;

    my $A = my $C = my $G = my $all = 0;
    my (@sizes, $tmp, $start);

    my $in = shift @ARGV;
    my $out = shift @ARGV;

    print $in, "\n";
    print $out, "\n";

    open IN, '<', $in or die "$! opening $in";
    open OUT, '>', $out or die "$! opening $out";

    sub letter {
        my $n = int rand $all--;
        $n < $A ? ($A--, return 'a') :
          $n < $A + $C ? ($C--, return 'c') :
          $n < $A + $C + $G ? ($G--, return 'g') :
          return 't';
    }

    sub output {
        for my $count ( @sizes ) {
            print ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
            print OUT ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
        }
        @sizes = ();
    }

    while( <IN> ) {
        if( /^>/ ) {
            $start //= s/\D+//gr;
        }
        elsif( /^[acgt]/ ) {
            $A += tr/a//;
            $C += tr/c//;
            $G += tr/g//;
            $all += $tmp = tr/acgt//;
            push @sizes, $tmp;
            $all >= $window and output();
        }
    }
    $all and output();
    close IN;
    close OUT;

    Execution syntax was simply:

     perl tybalt89_DNAfreq_Random_Generator.pl Ath_orig.fa Ath_tybalt_shuffle.fa

    Thanks a lot!

      Does your input file have leading whitespace?
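A quick way to check is to see how individual lines fare against the two patterns the script relies on, `/^>/` and `/^[acgt]/`. The sketch below uses a hypothetical helper `classify_line` and made-up sample lines. Note that `/^[acgt]/` matches lowercase letters only, so a sequence line with leading whitespace or in uppercase would be skipped by the script. Whether either applies to the file in question is not established in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report how the script's two patterns treat a line.
sub classify_line {
    my ($line) = @_;
    return 'header'             if $line =~ /^>/;        # counted as an ID line
    return 'sequence'           if $line =~ /^[acgt]/;   # counted as sequence
    return 'leading whitespace' if $line =~ /^\s+\S/;    # skipped by the script
    return 'uppercase sequence' if $line =~ /^[ACGT]/;   # skipped by the script
    return 'other';
}

# Made-up sample lines; to inspect a real file, read its first lines instead.
my @sample = ( ">Chr1\n", "acgtacgt\n", "  acgt\n", "ACGTACGT\n", "\n" );

for my $line (@sample) {
    ( my $shown = $line ) =~ s/\n\z//;     # strip the newline for display
    printf "%-20s %s\n", classify_line($line), "'$shown'";
}
```

Only lines classified as `header` or `sequence` contribute to the output of the script above; if every sequence line falls into another class, the counts stay at zero and nothing is written.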

      Also, the print statement alters program state (every call to letter() decrements the counters, and $start++ advances the ID), so you can't have two of them.
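One way to get the output both on screen and in the file is to build each record once and print the same string to both handles. This is a minimal sketch, not the original script: `next_letter` is a simplified stand-in for `letter()` in which every call consumes one item of state, and the in-memory file handle is used only to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-in for letter(): every call consumes one item of state.
my @pool = split //, 'acgtacgt';
sub next_letter { return shift @pool }

my $start = 1;

# Write to an in-memory "file" so the sketch needs no real output file.
open my $out, '>', \my $buffer or die "$! opening in-memory handle";

# Build the record once, so the state is consumed only once ...
my $record = join '', ">ID", $start++, "\n", map( next_letter(), 1 .. 4 ), "\n";

# ... then print the same string to as many handles as needed.
print $record;
print $out $record;
close $out;

print "identical\n" if $buffer eq $record;
```

With two separate print statements that each call the generator, the second would draw different letters from already depleted counts and use a different ID.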

        head -n1 Ath_orig.fa
        >Chr1

        Doesn't appear to have any leading whitespace

        About the print statement, even if I comment out the first one, the output file is still empty

        Conversely, if I comment out the second one, I do not see the output on screen as STDOUT

        If you would like to replicate the same behavior with my input file, you can download it from here -

        https://www.filehosting.org/file/details/774814/Ath_orig.fa

        Thanks a lot!
