http://qs321.pair.com?node_id=1053102

azheid has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a computational problem to address, and the code I have written for it is very slow. Since I am a novice in programming (relative to you guys anyway) I wanted to get some opinions on how to improve the attached program

I have ran the NYT profiler on the code and I will hopefully be able to attach it sucessfully. The big issue is when I use the -p option to generate a bootstrapping (correct term?) distribution of the data. If there are major improvements you can think of in that section of code, please give your advice.

Also, if this is too hard to read, please let me know and I will just post the code by itself, nd please accept my apologies.

#!/usr/bin/perl use warnings; use strict; use Getopt::Long; use List::Util qw(shuffle); my $k=6; my $window=31; my $bins=2*$window+$k; my %ktc; my $sequence_fname; my $wiggle_fname; my $permute=0; my $normalize_fname; my $out='default_out'; GetOptions ( "o=s" => \$out, "s=s" => \$sequence_fname, "wig=s" => \$wiggle_fname, "n=s" => \$normalize_fname, "p=i" => \$permute); ############initialize %ktc $out=$out.'counts.tab'; my $letters=[ qw(A C G T N) ]; my @words=&combin($letters,$k ); foreach(@words){ @{$ktc{$_}}= ((0)x$bins) ; } ########## ##########Load normalization file data open(NORM,"<$normalize_fname")||die "Cannot open normalization file\n" +; my %normalize; while (<NORM>){ my @a=split(/\t/,$_); $normalize{$a[0]}=$a[1]; } close NORM; ##########Load position counts my %counts; open (CTS,'<',$wiggle_fname)||die "Cannot open wiggle file\n"; my $cur_chr=''; my $pos=''; my $counts=''; while (<CTS>){ if($_=~/variableStep chrom=(.*?)$/){ $cur_chr=$1; $counts{$cur_chr}={}; } else{ ($pos,$counts)=split(/\t/,$_); $counts{$cur_chr}{$pos}=$counts; } } close CTS; ####### #########Map counts to orfs my $test=0; if(!($permute)){ open(SEQ,'<',$sequence_fname)||die "Cannot open sequence file\n"; while (my @line=split (/\t/,<SEQ>)){ my $ct=0; my $start=$line[1]; if ($start < $line[2]){##Watson Strand until ($start > $line[2]-$window){ for(my $i=0;$i<$bins;++$i){ if(defined $normalize{$line[3]} && $normalize{$lin +e[3]} > 0 && $counts{$line[0]}{$start-$window+$i}){ ${$ktc{substr($line[4],$ct,$k)}}[$i]+=$counts{ +$line[0]}{$start-$window+$i}/$normalize{$line[3]}; } else{ ++$test; } } ++$start; ++$ct; } } else{##Crick Strand until($start < $line[2]+$window){ for(my $i=0;$i<$bins;++$i){ if(defined $normalize{$line[3]} && $normalize{$lin +e[3]} > 0 && $counts{$line[0]}{$start-$window+$i}){ ${$ktc{substr($line[4],$ct,$k)}}[$i]+=$counts{ +$line[0]}{$start-$window+$i}/$normalize{$line[3]}; } else{ ++$test; } } --$start; ++$ct; } } } foreach my $key(keys %ktc){ if($key=~/N/){ next; } print $key,"\t", "@{$ktc{$key}}\n"; } } else{ for(my $r=0;$r<$permute;++$r){#for all permutation iterations foreach my $key(keys %counts){#for each chromosome my @values = shuffle(values %{$counts{$key}});#make a arra +y of values $counts{$key}{$_} = shift @values for keys %{$counts{$key} +};#permute the values to a read position in the chromosome } foreach(@words){ @{$ktc{$_}}= ((0)x$bins) ; } open(SEQ,'<',$sequence_fname)||die "Cannot open sequence file\ +n";#open the sequence file while (my @line=split (/\t/,<SEQ>)){#for each sequence record +the nmer information my $ct=0; my $start=$line[1]; if ($start < $line[2]){##Watson Strand until ($start > $line[2]-$window){ for(my $i=0;$i<$bins;++$i){ if(defined $normalize{$line[3]} && $normalize{ +$line[3]} > 0 && $counts{$line[0]}{$start-$window+$i}){ ${$ktc{substr($line[4],$ct,$k)}}[$i]+=$cou +nts{$line[0]}{$start-$window+$i}/$normalize{$line[3]}; } else{ ++$test; } } ++$start; ++$ct; } } else{##Crick Strand until($start < $line[2]+$window){ for(my $i=0;$i<$bins;++$i){ if(defined $normalize{$line[3]} && $normalize{ +$line[3]} > 0 && $counts{$line[0]}{$start-$window+$i}){ ${$ktc{substr($line[4],$ct,$k)}}[$i]+=$cou +nts{$line[0]}{$start-$window+$i}/$normalize{$line[3]}; } else{ ++$test; } } --$start; ++$ct; } } } close SEQ; open(OUT,">>$out")||die "Cannot open $out\n"; foreach my $key(keys %ktc){ if($key=~/N/){#skip if the nmer contains an N # $ktc{$key}=((0)x$bins);#reset the hash next; } print OUT $key,"\t","@{$ktc{$key}}\n";#print result # $ktc{$key}=((0)x$bins);#reset hash } close OUT; } } sub combin{ my ($data, $q) = @_; return if $q < 1; my $results = $data; while (--$q) { my @new; for my $letter (@$data) { push @new, map { $letter . $_ } @$results; } # end for $letter in @$data $results = \@new; } # end while --$q is not 0 return @$results; } # end ordered_combinations
<div class="body_content"><br /> <table class="file_summary"><tr><td class="h">Filename</td><td align=" +left"><a href="file:///home/jgardin/data/profiling_data/hiseq_07-29/S +ample_gap_FT/nmer_mapping_quant.pl">/home/jgardin/data/profiling_data +/hiseq_07-29/Sample_gap_FT/nmer_mapping_quant.pl</a></td></tr> <tr><td class="h">Statements</td><td align="left">Executed 74725779 st +atements in 63.3s</td></tr></table> <table id="subs_table" border="1" cellpadding="0" class="table +sorter"> <caption>Subroutines</caption> <thead> <tr> <th>Calls</th> <th><span title="Number of Places sub is called from">P</span> +</th> <th><span title="Number of Files sub is called from">F</span>< +/th> <th>Exclusive<br />Time</th> <th>Inclusive<br />Time</th> <th>Subroutine</th> </tr> </thead> <tbody> <tr><td class="c0">775783</td><td class="c0">5</td><td class="c3">1</t +d><td class="c0"><span title="0.4%">268ms</span></td><td class="c0">< +span title="0.4%">268ms</span></td><td class="sub_name"><span style=" +display: none;">main::::CORE:readline</span> main::<a href="nmer_ +mapping_quant-pl-1-line.html#main__CORE_readline">CORE:readline</a>&n +bsp;(opcode)</td></tr> <tr><td class="c0">798209</td><td class="c1">2</td><td class="c3">1</t +d><td class="c0"><span title="0.3%">162ms</span></td><td class="c0">< +span title="0.3%">162ms</span></td><td class="sub_name"><span style=" +display: none;">main::::CORE:match</span> main::<a href="nmer_map +ping_quant-pl-1-line.html#main__CORE_match">CORE:match</a>&nbsp;(opco +de)</td></tr> <tr><td class="c0">8192</td><td class="c3">1</td><td class="c3">1</td> +<td class="c0"><span title="0.0%">16.1ms</span></td><td class="c0"><s +pan title="0.0%">16.1ms</span></td><td class="sub_name"><span style=" +display: none;">main::::CORE:print</span> main::<a href="nmer_map +ping_quant-pl-1-line.html#main__CORE_print">CORE:print</a>&nbsp;(opco +de)</td></tr> <tr><td class="c3">1</td><td class="c3">1</td><td class="c3">1</td><td + class="c0"><span title="0.0%">6.68ms</span></td><td class="c0"><span + title="0.0%">13.0ms</span></td><td class="sub_name"><span style="dis +play: none;">main::::BEGIN@3</span> main::<a href="nmer_mapping_q +uant-pl-1-line.html#3">BEGIN@3</a></td></tr> <tr><td class="c3">1</td><td class="c3">1</td><td class="c3">1</td><td + class="c0"><span title="0.0%">6.31ms</span></td><td class="c0"><span + title="0.0%">6.31ms</span></td><td class="sub_name"><span style="dis +play: none;">main::::combin</span> main::<a href="nmer_mapping_qu +ant-pl-1-line.html#156">combin</a></td></tr> <tr><td class="c3">1</td><td class="c3">1</td><td class="c3">1</td><td + class="c0"><span title="0.0%">1.41ms</span></td><td class="c0"><span + title="0.0%">1.44ms</span></td><td class="sub_name"><span style="dis +play: none;">main::::BEGIN@2</span> main::<a href="nmer_mapping_q +uant-pl-1-line.html#2">BEGIN@2</a></td></tr> <tr><td class="c3">1</td><td class="c3">1</td><td class="c3">1</td><td + class="c3"><span title="0.0%">234µs</span></td><td class="c3"><span +title="0.0%">254µs</span></td><td class="sub_name"><span style="displ +ay: none;">main::::BEGIN@2.1</span> main::<a href="nmer_mapping_q +uant-pl-1-line.html#2">BEGIN@2.1</a></td></tr> <tr><td class="c3">1</td><td class="c3">1</td><td class="c3">1</td><td + class="c3"><span title="0.0%">219µs</span></td><td class="c3"><span +title="0.0%">466µs</span></td><td class="sub_name"><span style="displ +ay: none;">main::::BEGIN@4</span> main::<a href="nmer_mapping_qua +nt-pl-1-line.html#4">BEGIN@4</a></td></tr> <tr><td class="c3">6</td><td class="c0">4</td><td class="c3">1</td><td + class="c3"><span title="0.0%">146µs</span></td><td class="c3"><span +title="0.0%">146µs</span></td><td class="sub_name"><span style="displ +ay: none;">main::::CORE:open</span> main::<a href="nmer_mapping_q +uant-pl-1-line.html#main__CORE_open">CORE:open</a>&nbsp;(opcode)</td> +</tr> <tr><td class="c3">6</td><td class="c0">4</td><td class="c3">1</td><td + class="c3"><span title="0.0%">107µs</span></td><td class="c3"><span +title="0.0%">107µs</span></td><td class="sub_name"><span style="displ +ay: none;">main::::CORE:close</span> main::<a href="nmer_mapping_ +quant-pl-1-line.html#main__CORE_close">CORE:close</a>&nbsp;(opcode)</ +td></tr> <tr><td class="c1">13</td><td class="c3">1</td><td class="c3">1</td><t +d class="c3"><span title="0.0%">12µs</span></td><td class="c3"><span +title="0.0%">12µs</span></td><td class="sub_name"><span style="displa +y: none;">mro::::method_changed_in</span> mro::<a href="nmer_map +ping_quant-pl-1-line.html#mro__method_changed_in">method_changed_in</ +a>&nbsp;(xsub)</td></tr> <tr><td class="c1">13</td><td class="c3">1</td><td class="c3">1</td><t +d class="c3"><span title="0.0%">8µs</span></td><td class="c3"><span t +itle="0.0%">8µs</span></td><td class="sub_name"><span style="display: + none;">utf8::::is_utf8</span> utf8::<a href="nmer_mapping_quant- +pl-1-line.html#utf8__is_utf8">is_utf8</a>&nbsp;(xsub)</td></tr> <tr><td class="c1">13</td><td class="c3">1</td><td class="c3">1</td><t +d class="c3"><span title="0.0%">5µs</span></td><td class="c3"><span t +itle="0.0%">5µs</span></td><td class="sub_name"><span style="display: + none;">Internals::::SvREADONLY</span>Internals::<a href="nmer_mappin +g_quant-pl-1-line.html#Internals__SvREADONLY">SvREADONLY</a>&nbsp;(xs +ub)</td></tr> <tr><td class="c3">0</td><td class="c3">0</td><td class="c3">0</td><td + class="c3"><span title="0.0%">0s</span></td><td class="c3"><span tit +le="0.0%">0s</span></td><td class="sub_name"><span style="display: no +ne;">main::::RUNTIME</span> main::<a href="nmer_mapping_quant-pl- +1-line.html#1">RUNTIME</a></td></tr> </tbody></table> Call graph for these subroutines as a <a href="http://en.wikipedia.org/wiki/Graphviz">Graphv +iz</a> <a href="home-jgardin-data-profiling_data-hiseq_07-29- +Sample_gap_FT-nmer_mapping_quant-pl.dot">dot language file</a>. <table border="1" cellpadding="0"> <thead> <tr><th>Line</th> <th><span title="Number of statements executed">State<br />ments +</span></th> <th><span title="Time spend executing statements on the line, excluding time spent executing statements in any called subrou +tines">Time<br />on line</span></th> <th><span title="Number of subroutines calls">Calls</span></th> <th><span title="Time spent in subroutines called (inclusive)">T +ime<br />in subs</span></th> <th class="left_indent_header">Code</th> </tr> </thead> <tbody> <tr><td class="h"><a name="1"></a>1</td><td></td><td></td><td></td +><td></td><td class="s">#!/usr/bin/perl</td></tr> <tr><td class="h"><a name="2"></a>2</td><td class="c0">4</td><td class +="c0"><span title="Avg 386µs">1.55ms</span></td><td class="c0">4</td> +<td class="c0">1.70ms</td><td class="s"><div class="calls"><div class +="calls_in"># spent 254µs (234+19) within main::BEGIN@2.1 which was c +alled: # once (234µs+19µs) by main::RUNTIME at <a href="nmer_mapping_quant +-pl-1-line.html#2">line 2</a> # spent 1.44ms (1.41+25µs) within main::BEGIN@2 which was called: # once (1.41ms+25µs) by main::RUNTIME at <a href="nmer_mapping_quan +t-pl-1-line.html#2">line 2</a></div></div>use warnings; use strict;<d +iv class="calls"><div class="calls_out"># spent 1.44ms making 1 call + to <a href="nmer_mapping_quant-pl-1-line.html#2">main::BEGIN@2</a> # spent 254µs making 1 call to <a href="nmer_mapping_quant-pl-1-line +.html#2">main::BEGIN@2.1</a> # spent 7µs making 1 call to <a href="warnings-pm-2-line.html#237" +>warnings::import</a> # spent 2µs making 1 call to <a href="strict-pm-3-line.html#34">st +rict::import</a></div></div></td></tr> <tr><td class="h"><a name="3"></a>3</td><td class="c3">2</td><td class +="c1"><span title="Avg 40µs">80µs</span></td><td class="c3">2</td><td + class="c0">14.2ms</td><td class="s"><div class="calls"><div class="c +alls_in"># spent 13.0ms (6.68+6.28) within main::BEGIN@3 which was ca +lled: # once (6.68ms+6.28ms) by main::RUNTIME at <a href="nmer_mapping_qu +ant-pl-1-line.html#3">line 3</a></div></div>use Getopt::Long;<div cla +ss="calls"><div class="calls_out"># spent 13.0ms making 1 call to <a + href="nmer_mapping_quant-pl-1-line.html#3">main::BEGIN@3</a> # spent 1.26ms making 1 call to <a href="Getopt-Long-pm-4-line.html#9 +8">Getopt::Long::import</a></div></div></td></tr> <tr><td class="h"><a name="4"></a>4</td><td class="c3">2</td><td class +="c0"><span title="Avg 392µs">784µs</span></td><td class="c3">2</td>< +td class="c0">504µs</td><td class="s"><div class="calls"><div class=" +calls_in"># spent 466µs (219+247) within main::BEGIN@4 which was call +ed: # once (219µs+247µs) by main::RUNTIME at <a href="nmer_mapping_quan +t-pl-1-line.html#4">line 4</a></div></div>use List::Util qw(shuffle); +<div class="calls"><div class="calls_out"># spent 466µs making 1 ca +ll to <a href="nmer_mapping_quant-pl-1-line.html#4">main::BEGIN@4</a> # spent 38µs making 1 call to <a href="Exporter-pm-7-line.html#28"> +Exporter::import</a></div></div></td></tr> <tr><td class="h"><a name="5"></a>5</td><td class="c3">1</td><td class +="c3"><span title="Avg 500ns">500ns</span></td><td></td><td></td><td +class="s">my $k=6;</td></tr> <tr><td class="h"><a name="6"></a>6</td><td class="c3">1</td><td class +="c3"><span title="Avg 100ns">100ns</span></td><td></td><td></td><td +class="s">my $window=31;</td></tr> <tr><td class="h"><a name="7"></a>7</td><td class="c3">1</td><td class +="c3"><span title="Avg 900ns">900ns</span></td><td></td><td></td><td +class="s">my $bins=2*$window+$k;</td></tr> <tr><td class="h"><a name="8"></a>8</td><td class="c3">1</td><td class +="c3"><span title="Avg 100ns">100ns</span></td><td></td><td></td><td +class="s">my %ktc;</td></tr> <tr><td class="h"><a name="9"></a>9</td><td class="c3">1</td><td class +="c3"><span title="Avg 100ns">100ns</span></td><td></td><td></td><td +class="s">my $sequence_fname;</td></tr> <tr><td class="h"><a name="10"></a>10</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 100ns">100ns</span></td><td></td><td></td><t +d class="s">my $wiggle_fname;</td></tr> <tr><td class="h"><a name="11"></a>11</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 200ns">200ns</span></td><td></td><td></td><t +d class="s">my $permute=0;</td></tr> <tr><td class="h"><a name="12"></a>12</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 100ns">100ns</span></td><td></td><td></td><t +d class="s">my $normalize_fname;</td></tr> <tr><td class="h"><a name="13"></a>13</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 500ns">500ns</span></td><td></td><td></td><t +d class="s">my $out='default_out';</td></tr> <tr><td class="h"><a name="14"></a>14</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 3µs">3µs</span></td><td class="c3">1</td><td + class="c3">5µs</td><td class="s">GetOptions (<div class="calls"><div + class="calls_out"># spent 5µs making 1 call to <a href="Getopt-L +ong-pm-4-line.html#249">Getopt::Long::GetOptions</a></div></div></td> +</tr> <tr><td class="h"><a name="15"></a>15</td><td></td><td></td><td></td>< +td></td><td class="s"> &quot;o=s&quot; =&gt; \$out,</td></tr> <tr><td class="h"><a name="16"></a>16</td><td></td><td></td><td></td>< +td></td><td class="s"> &quot;s=s&quot; =&gt; \$sequence_fname, +</td></tr> <tr><td class="h"><a name="17"></a>17</td><td></td><td></td><td></td>< +td></td><td class="s"> &quot;wig=s&quot; =&gt; \$wiggle_fname, +</td></tr> <tr><td class="h"><a name="18"></a>18</td><td></td><td></td><td></td>< +td></td><td class="s"> &quot;n=s&quot; =&gt; \$normalize_fname +,</td></tr> <tr><td class="h"><a name="19"></a>19</td><td></td><td></td><td></td>< +td></td><td class="s"> &quot;p=i&quot; =&gt; \$permute);</td>< +/tr> <tr><td class="h"><a name="20"></a>20</td><td></td><td></td><td></td>< +td></td><td class="s"></td></tr> <tr><td class="h"><a name="21"></a>21</td><td></td><td></td><td></td>< +td></td><td class="s">############initialize %ktc</td></tr> <tr><td class="h"><a name="22"></a>22</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 500ns">500ns</span></td><td></td><td></td><t +d class="s">$out=$out.'counts.tab';</td></tr> <tr><td class="h"><a name="23"></a>23</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 1µs">1µs</span></td><td></td><td></td><td cl +ass="s">my $letters=[ qw(A C G T N) ];</td></tr> <tr><td class="h"><a name="24"></a>24</td><td class="c3">1</td><td cla +ss="c0"><span title="Avg 1.02ms">1.02ms</span></td><td class="c3">1</ +td><td class="c0">6.31ms</td><td class="s">my @words=&amp;combin($let +ters,$k );<div class="calls"><div class="calls_out"># spent 6.31ms m +aking 1 call to <a href="nmer_mapping_quant-pl-1-line.html#156">main: +:combin</a></div></div></td></tr> <tr><td class="h"><a name="25"></a>25</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 5µs">5µs</span></td><td></td><td></td><td cl +ass="s">foreach(@words){</td></tr> <tr><td class="h"><a name="26"></a>26</td><td class="c0">15625</td><td + class="c0"><span title="Avg 4µs">56.3ms</span></td><td></td><td></td +><td class="s"> @{$ktc{$_}}= ((0)x$bins) ;</td></tr> <tr><td class="h"><a name="27"></a>27</td><td></td><td></td><td></td>< +td></td><td class="s">}</td></tr> <tr><td class="h"><a name="28"></a>28</td><td></td><td></td><td></td>< +td></td><td class="s">##########</td></tr> <tr><td class="h"><a name="29"></a>29</td><td></td><td></td><td></td>< +td></td><td class="s">##########Load normalization file data</td></tr +> <tr><td class="h"><a name="30"></a>30</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 42µs">42µs</span></td><td class="c3">1</td>< +td class="c3">29µs</td><td class="s">open(NORM,&quot;&lt;$normalize_f +name&quot;)||die &quot;Cannot open normalization file\n&quot;;<div cl +ass="calls"><div class="calls_out"># spent 29µs making 1 call to < +a href="nmer_mapping_quant-pl-1-line.html#main__CORE_open">main::CORE +:open</a></div></div></td></tr> <tr><td class="h"><a name="31"></a>31</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 400ns">400ns</span></td><td></td><td></td><t +d class="s">my %normalize;</td></tr> <tr><td class="h"><a name="32"></a>32</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 18µs">18µs</span></td><td class="c3">1</td>< +td class="c3">13µs</td><td class="s">while (&lt;NORM&gt;){<div class= +"calls"><div class="calls_out"># spent 13µs making 1 call to <a hr +ef="nmer_mapping_quant-pl-1-line.html#main__CORE_readline">main::CORE +:readline</a></div></div></td></tr> <tr><td class="h"><a name="33"></a>33</td><td class="c0">8066</td><td +class="c0"><span title="Avg 573ns">4.62ms</span></td><td></td><td></t +d><td class="s"> my @a=split(/\t/,$_);</td></tr> <tr><td class="h"><a name="34"></a>34</td><td class="c0">8066</td><td +class="c0"><span title="Avg 3µs">20.3ms</span></td><td class="c0">806 +6</td><td class="c0">2.49ms</td><td class="s"> $normalize{$a[0 +]}=$a[1];<div class="calls"><div class="calls_out"> # spent 2 +.49ms making 8066 calls to <a href="nmer_mapping_quant-pl-1-line.html +#main__CORE_readline">main::CORE:readline</a>, avg 309ns/call</div></ +div></td></tr> <tr><td class="h"><a name="35"></a>35</td><td></td><td></td><td></td>< +td></td><td class="s">}</td></tr> <tr><td class="h"><a name="36"></a>36</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 14µs">14µs</span></td><td class="c3">1</td>< +td class="c3">10µs</td><td class="s">close NORM;<div class="calls"><d +iv class="calls_out"># spent 10µs making 1 call to <a href="nmer_m +apping_quant-pl-1-line.html#main__CORE_close">main::CORE:close</a></d +iv></div></td></tr> <tr><td class="h"><a name="37"></a>37</td><td></td><td></td><td></td>< +td></td><td class="s">##########Load position counts</td></tr> <tr><td class="h"><a name="38"></a>38</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 200ns">200ns</span></td><td></td><td></td><t +d class="s">my %counts;</td></tr> <tr><td class="h"><a name="39"></a>39</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 9µs">9µs</span></td><td class="c3">1</td><td + class="c3">6µs</td><td class="s">open (CTS,'&lt;',$wiggle_fname)||di +e &quot;Cannot open wiggle file\n&quot;;<div class="calls"><div class +="calls_out"># spent 6µs making 1 call to <a href="nmer_mapping_q +uant-pl-1-line.html#main__CORE_open">main::CORE:open</a></div></div>< +/td></tr> <tr><td class="h"><a name="40"></a>40</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 400ns">400ns</span></td><td></td><td></td><t +d class="s">my $cur_chr='';</td></tr> <tr><td class="h"><a name="41"></a>41</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 200ns">200ns</span></td><td></td><td></td><t +d class="s">my $pos='';</td></tr> <tr><td class="h"><a name="42"></a>42</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 300ns">300ns</span></td><td></td><td></td><t +d class="s">my $counts='';</td></tr> <tr><td class="h"><a name="43"></a>43</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 8µs">8µs</span></td><td class="c3">1</td><td + class="c3">6µs</td><td class="s">while (&lt;CTS&gt;){<div class="cal +ls"><div class="calls_out"># spent 6µs making 1 call to <a href=" +nmer_mapping_quant-pl-1-line.html#main__CORE_readline">main::CORE:rea +dline</a></div></div></td></tr> <tr><td class="h"><a name="44"></a>44</td><td class="c0">766959</td><t +d class="c0"><span title="Avg 4µs">2.81s</span></td><td class="c0">15 +33918</td><td class="c0">389ms</td><td class="s"> if($_=~/vari +ableStep chrom=(.*?)$/){<div class="calls"><div class="calls_out"> + # spent 242ms making 766959 calls to <a href="nmer_mapping_qua +nt-pl-1-line.html#main__CORE_readline">main::CORE:readline</a>, avg 3 +16ns/call # spent 147ms making 766959 calls to <a href="nmer_mapping_q +uant-pl-1-line.html#main__CORE_match">main::CORE:match</a>, avg 191ns +/call</div></div></td></tr> <tr><td class="h"><a name="45"></a>45</td><td class="c0">16</td><td cl +ass="c3"><span title="Avg 2µs">39µs</span></td><td></td><td></td><td +class="s"> $cur_chr=$1;</td></tr> <tr><td class="h"><a name="46"></a>46</td><td class="c0">16</td><td cl +ass="c3"><span title="Avg 1µs">24µs</span></td><td></td><td></td><td +class="s"> $counts{$cur_chr}={};</td></tr> <tr><td class="h"><a name="47"></a>47</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="48"></a>48</td><td></td><td></td><td></td>< +td></td><td class="s"> else{</td></tr> <tr><td class="h"><a name="49"></a>49</td><td class="c0">766943</td><t +d class="c0"><span title="Avg 770ns">591ms</span></td><td></td><td></ +td><td class="s"> ($pos,$counts)=split(/\t/,$_);</td>< +/tr> <tr><td class="h"><a name="50"></a>50</td><td class="c0">766943</td><t +d class="c0"><span title="Avg 842ns">646ms</span></td><td></td><td></ +td><td class="s"> $counts{$cur_chr}{$pos}=$counts;</td +></tr> <tr><td class="h"><a name="51"></a>51</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="52"></a>52</td><td></td><td></td><td></td>< +td></td><td class="s">}</td></tr> <tr><td class="h"><a name="53"></a>53</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 12µs">12µs</span></td><td class="c3">1</td>< +td class="c3">6µs</td><td class="s">close CTS;<div class="calls"><div + class="calls_out"># spent 6µs making 1 call to <a href="nmer_map +ping_quant-pl-1-line.html#main__CORE_close">main::CORE:close</a></div +></div></td></tr> <tr><td class="h"><a name="54"></a>54</td><td></td><td></td><td></td>< +td></td><td class="s">#######</td></tr> <tr><td class="h"><a name="55"></a>55</td><td></td><td></td><td></td>< +td></td><td class="s">#########Map counts to orfs</td></tr> <tr><td class="h"><a name="56"></a>56</td><td class="c3">1</td><td cla +ss="c3"><span title="Avg 500ns">500ns</span></td><td></td><td></td><t +d class="s">my $test=0;</td></tr> <tr><td class="h"><a name="57"></a>57</td><td class="c3">1</td><td cla +ss="c0"><span title="Avg 456ms">456ms</span></td><td></td><td></td><t +d class="s">if(!($permute)){</td></tr> <tr><td class="h"><a name="58"></a>58</td><td></td><td></td><td></td>< +td></td><td class="s"> open(SEQ,'&lt;',$sequence_fname)||die & +quot;Cannot open sequence file\n&quot;;</td></tr> <tr><td class="h"><a name="59"></a>59</td><td></td><td></td><td></td>< +td></td><td class="s"> while (my @line=split (/\t/,&lt;SEQ&gt; +)){</td></tr> <tr><td class="h"><a name="60"></a>60</td><td></td><td></td><td></td>< +td></td><td class="s"> my $ct=0;</td></tr> <tr><td class="h"><a name="61"></a>61</td><td></td><td></td><td></td>< +td></td><td class="s"> my $start=$line[1];</td></tr> <tr><td class="h"><a name="62"></a>62</td><td></td><td></td><td></td>< +td></td><td class="s"> if ($start &lt; $line[2]){##Wat +son Strand</td></tr> <tr><td class="h"><a name="63"></a>63</td><td></td><td></td><td></td>< +td></td><td class="s"> until ($start &gt; $lin +e[2]-$window){</td></tr> <tr><td class="h"><a name="64"></a>64</td><td></td><td></td><td></td>< +td></td><td class="s"> for(my $i=0;$i& +lt;$bins;++$i){</td></tr> <tr><td class="h"><a name="65"></a>65</td><td></td><td></td><td></td>< +td></td><td class="s"> if(defi +ned $normalize{$line[3]} &amp;&amp; $normalize{$line[3]} &gt; 0 &amp; +&amp; $counts{$line[0]}{$start-$window+$i}){</td></tr> <tr><td class="h"><a name="66"></a>66</td><td></td><td></td><td></td>< +td></td><td class="s"> + ${$ktc{substr($line[4],$ct,$k)}}[$i]+=$counts{$line[0]}{$start-$wind +ow+$i}/$normalize{$line[3]};</td></tr> <tr><td class="h"><a name="67"></a>67</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td>< +/tr> <tr><td class="h"><a name="68"></a>68</td><td></td><td></td><td></td>< +td></td><td class="s"> else{</ +td></tr> <tr><td class="h"><a name="69"></a>69</td><td></td><td></td><td></td>< +td></td><td class="s"> + ++$test;</td></tr> <tr><td class="h"><a name="70"></a>70</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td>< +/tr> <tr><td class="h"><a name="71"></a>71</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="72"></a>72</td><td></td><td></td><td></td>< +td></td><td class="s"></td></tr> <tr><td class="h"><a name="73"></a>73</td><td></td><td></td><td></td>< +td></td><td class="s"> ++$start;</td>< +/tr> <tr><td class="h"><a name="74"></a>74</td><td></td><td></td><td></td>< +td></td><td class="s"> ++$ct;</td></tr +> <tr><td class="h"><a name="75"></a>75</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="76"></a>76</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="77"></a>77</td><td></td><td></td><td></td>< +td></td><td class="s"> else{##Crick Strand</td></tr> <tr><td class="h"><a name="78"></a>78</td><td></td><td></td><td></td>< +td></td><td class="s"> until($start &lt; $line +[2]+$window){</td></tr> <tr><td class="h"><a name="79"></a>79</td><td></td><td></td><td></td>< +td></td><td class="s"> for(my $i=0;$i& +lt;$bins;++$i){</td></tr> <tr><td class="h"><a name="80"></a>80</td><td></td><td></td><td></td>< +td></td><td class="s"> if(defi +ned $normalize{$line[3]} &amp;&amp; $normalize{$line[3]} &gt; 0 &amp; +&amp; $counts{$line[0]}{$start-$window+$i}){</td></tr> <tr><td class="h"><a name="81"></a>81</td><td></td><td></td><td></td>< +td></td><td class="s"> + ${$ktc{substr($line[4],$ct,$k)}}[$i]+=$counts{$line[0]}{$start-$wind +ow+$i}/$normalize{$line[3]};</td></tr> <tr><td class="h"><a name="82"></a>82</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td>< +/tr> <tr><td class="h"><a name="83"></a>83</td><td></td><td></td><td></td>< +td></td><td class="s"> else{</ +td></tr> <tr><td class="h"><a name="84"></a>84</td><td></td><td></td><td></td>< +td></td><td class="s"> + ++$test;</td></tr> <tr><td class="h"><a name="85"></a>85</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td>< +/tr> <tr><td class="h"><a name="86"></a>86</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="87"></a>87</td><td></td><td></td><td></td>< +td></td><td class="s"> --$start;</td>< +/tr> <tr><td class="h"><a name="88"></a>88</td><td></td><td></td><td></td>< +td></td><td class="s"> ++$ct;</td></tr +> <tr><td class="h"><a name="89"></a>89</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="90"></a>90</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="91"></a>91</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="92"></a>92</td><td></td><td></td><td></td>< +td></td><td class="s"> foreach my $key(keys %ktc){</td></tr> <tr><td class="h"><a name="93"></a>93</td><td></td><td></td><td></td>< +td></td><td class="s"> if($key=~/N/){</td></tr> <tr><td class="h"><a name="94"></a>94</td><td></td><td></td><td></td>< +td></td><td class="s"> next;</td></tr> <tr><td class="h"><a name="95"></a>95</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="96"></a>96</td><td></td><td></td><td></td>< +td></td><td class="s"> print $key,&quot;\t&quot;, &quo +t;@{$ktc{$key}}\n&quot;;</td></tr> <tr><td class="h"><a name="97"></a>97</td><td></td><td></td><td></td>< +td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="98"></a>98</td><td></td><td></td><td></td>< +td></td><td class="s">}</td></tr> <tr><td class="h"><a name="99"></a>99</td><td></td><td></td><td></td>< +td></td><td class="s">else{</td></tr> <tr><td class="h"><a name="100"></a>100</td><td class="c3">1</td><td c +lass="c3"><span title="Avg 7µs">7µs</span></td><td></td><td></td><td +class="s"> for(my $r=0;$r&lt;$permute;++$r){#for all permutati +on iterations</td></tr> <tr><td class="h"><a name="101"></a>101</td><td class="c3">2</td><td c +lass="c3"><span title="Avg 9µs">19µs</span></td><td></td><td></td><td + class="s"> foreach my $key(keys %counts){#for each ch +romosome</td></tr> <tr><td class="h"><a name="102"></a>102</td><td class="c0">32</td><td +class="c0"><span title="Avg 13.8ms">443ms</span></td><td class="c0">3 +2</td><td class="c0">19.7ms</td><td class="s"> + my @values = shuffle(values %{$counts{$key}});#make a array of value +s<div class="calls"><div class="calls_out"> # +spent 19.7ms making 32 calls to <a href="List-Util-pm-11-line.html#L +ist__Util__shuffle">List::Util::shuffle</a>, avg 616µs/call</div></di +v></td></tr> <tr><td class="h"><a name="103"></a>103</td><td class="c0">32</td><td +class="c0"><span title="Avg 38.6ms">1.24s</span></td><td></td><td></t +d><td class="s"> $counts{$key}{$_} = shift @va +lues for keys %{$counts{$key}};#permute the values to a read position + in the chromosome</td></tr> <tr><td class="h"><a name="104"></a>104</td><td></td><td></td><td></td +><td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="105"></a>105</td><td class="c3">2</td><td c +lass="c3"><span title="Avg 3µs">6µs</span></td><td></td><td></td><td +class="s"> foreach(@words){</td></tr> <tr><td class="h"><a name="106"></a>106</td><td class="c0">31250</td>< +td class="c0"><span title="Avg 6µs">197ms</span></td><td></td><td></t +d><td class="s"> @{$ktc{$_}}= ((0)x$bins) ;</t +d></tr> <tr><td class="h"><a name="107"></a>107</td><td></td><td></td><td></td +><td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="108"></a>108</td><td class="c3">2</td><td c +lass="c2"><span title="Avg 34µs">68µs</span></td><td class="c3">2</td +><td class="c3">46µs</td><td class="s"> open(SEQ,'&lt; +',$sequence_fname)||die &quot;Cannot open sequence file\n&quot;;#open + the sequence file<div class="calls"><div class="calls_out"> + # spent 46µs making 2 calls to <a href="nmer_mapping_quant- +pl-1-line.html#main__CORE_open">main::CORE:open</a>, avg 23µs/call</d +iv></div></td></tr> <tr><td class="h"><a name="109"></a>109</td><td class="c3">2</td><td c +lass="c0"><span title="Avg 23.0ms">45.9ms</span></td><td class="c0">7 +56</td><td class="c0">23.0ms</td><td class="s"> while +(my @line=split (/\t/,&lt;SEQ&gt;)){#for each sequence record the nme +r information<div class="calls"><div class="calls_out"> + # spent 23.0ms making 756 calls to <a href="nmer_mapping_quant-pl- +1-line.html#main__CORE_readline">main::CORE:readline</a>, avg 30µs/ca +ll</div></div></td></tr> <tr><td class="h"><a name="110"></a>110</td><td class="c0">754</td><td + class="c0"><span title="Avg 314ns">237µs</span></td><td></td><td></t +d><td class="s"> my $ct=0;</td></tr> <tr><td class="h"><a name="111"></a>111</td><td class="c0">754</td><td + class="c0"><span title="Avg 330ns">249µs</span></td><td></td><td></t +d><td class="s"> my $start=$line[1];</td></tr> <tr><td class="h"><a name="112"></a>112</td><td class="c0">754</td><td + class="c0"><span title="Avg 2µs">1.42ms</span></td><td></td><td></td +><td class="s"> if ($start &lt; $line[2]){##Wa +tson Strand</td></tr> <tr><td class="h"><a name="113"></a>113</td><td class="c0">754</td><td + class="c0"><span title="Avg 523ns">394µs</span></td><td></td><td></t +d><td class="s"> until ($start &gt; $l +ine[2]-$window){</td></tr> <tr><td class="h"><a name="114"></a>114</td><td class="c0">1112444</td +><td class="c0"><span title="Avg 45µs">49.7s</span></td><td></td><td> +</td><td class="s"> for(my $i= +0;$i&lt;$bins;++$i){</td></tr> <tr><td class="h"><a name="115"></a>115</td><td></td><td></td><td></td +><td></td><td class="s"> + if(defined $normalize{$line[3]} &amp;&amp; $normalize{$line[3]} &g +t; 0 &amp;&amp; $counts{$line[0]}{$start-$window+$i}){</td></tr> <tr><td class="h"><a name="116"></a>116</td><td></td><td></td><td></td +><td></td><td class="s"> + ${$ktc{substr($line[4],$ct,$k)}}[$i]+=$counts{$line[0]}{$s +tart-$window+$i}/$normalize{$line[3]};</td></tr> <tr><td class="h"><a name="117"></a>117</td><td></td><td></td><td></td +><td></td><td class="s"> + }</td></tr> <tr><td class="h"><a name="118"></a>118</td><td></td><td></td><td></td +><td></td><td class="s"> + else{</td></tr> <tr><td class="h"><a name="119"></a>119</td><td class="c0">68958886</t +d><td class="c0"><span title="Avg 91ns">6.27s</span></td><td></td><td +></td><td class="s"> + ++$test;</td></tr> <tr><td class="h"><a name="120"></a>120</td><td></td><td></td><td></td +><td></td><td class="s"> + }</td></tr> <tr><td class="h"><a name="121"></a>121</td><td></td><td></td><td></td +><td></td><td class="s"> }</td +></tr> <tr><td class="h"><a name="122"></a>122</td><td class="c0">1112444</td +><td class="c0"><span title="Avg 59ns">66.2ms</span></td><td></td><td +></td><td class="s"> ++$start; +</td></tr> <tr><td class="h"><a name="123"></a>123</td><td class="c0">1112444</td +><td class="c0"><span title="Avg 296ns">330ms</span></td><td></td><td +></td><td class="s"> ++$ct;</t +d></tr> <tr><td class="h"><a name="124"></a>124</td><td></td><td></td><td></td +><td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="125"></a>125</td><td></td><td></td><td></td +><td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="126"></a>126</td><td></td><td></td><td></td +><td></td><td class="s"> else{##Crick Strand</ +td></tr> <tr><td class="h"><a name="127"></a>127</td><td></td><td></td><td></td +><td></td><td class="s"> until($start +&lt; $line[2]+$window){</td></tr> <tr><td class="h"><a name="128"></a>128</td><td></td><td></td><td></td +><td></td><td class="s"> for(m +y $i=0;$i&lt;$bins;++$i){</td></tr> <tr><td class="h"><a name="129"></a>129</td><td></td><td></td><td></td +><td></td><td class="s"> + if(defined $normalize{$line[3]} &amp;&amp; $normalize{$line[3]} &g +t; 0 &amp;&amp; $counts{$line[0]}{$start-$window+$i}){</td></tr> <tr><td class="h"><a name="130"></a>130</td><td></td><td></td><td></td +><td></td><td class="s"> + ${$ktc{substr($line[4],$ct,$k)}}[$i]+=$counts{$line[0]}{$s +tart-$window+$i}/$normalize{$line[3]};</td></tr> <tr><td class="h"><a name="131"></a>131</td><td></td><td></td><td></td +><td></td><td class="s"> + }</td></tr> <tr><td class="h"><a name="132"></a>132</td><td></td><td></td><td></td +><td></td><td class="s"> + else{</td></tr> <tr><td class="h"><a name="133"></a>133</td><td></td><td></td><td></td +><td></td><td class="s"> + ++$test;</td></tr> <tr><td class="h"><a name="134"></a>134</td><td></td><td></td><td></td +><td></td><td class="s"> + }</td></tr> <tr><td class="h"><a name="135"></a>135</td><td></td><td></td><td></td +><td></td><td class="s"> }</td +></tr> <tr><td class="h"><a name="136"></a>136</td><td></td><td></td><td></td +><td></td><td class="s"> --$st +art;</td></tr> <tr><td class="h"><a name="137"></a>137</td><td></td><td></td><td></td +><td></td><td class="s"> ++$ct +;</td></tr> <tr><td class="h"><a name="138"></a>138</td><td></td><td></td><td></td +><td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="139"></a>139</td><td></td><td></td><td></td +><td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="140"></a>140</td><td></td><td></td><td></td +><td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="141"></a>141</td><td class="c3">2</td><td c +lass="c3"><span title="Avg 12µs">25µs</span></td><td class="c3">2</td +><td class="c3">15µs</td><td class="s"> close SEQ;<div + class="calls"><div class="calls_out"> # spent 15µs + making 2 calls to <a href="nmer_mapping_quant-pl-1-line.html#main__C +ORE_close">main::CORE:close</a>, avg 7µs/call</div></div></td></tr> <tr><td class="h"><a name="142"></a>142</td><td class="c3">2</td><td c +lass="c2"><span title="Avg 38µs">76µs</span></td><td class="c3">2</td +><td class="c3">65µs</td><td class="s"> open(OUT,&quot +;&gt;&gt;$out&quot;)||die &quot;Cannot open $out\n&quot;;<div class=" +calls"><div class="calls_out"> # spent 65µs making +2 calls to <a href="nmer_mapping_quant-pl-1-line.html#main__CORE_open +">main::CORE:open</a>, avg 32µs/call</div></div></td></tr> <tr><td class="h"><a name="143"></a>143</td><td class="c3">2</td><td c +lass="c0"><span title="Avg 5.42ms">10.8ms</span></td><td></td><td></t +d><td class="s"> foreach my $key(keys %ktc){</td></tr> <tr><td class="h"><a name="144"></a>144</td><td class="c0">31250</td>< +td class="c0"><span title="Avg 2µs">75.0ms</span></td><td class="c0"> +31250</td><td class="c0">15.8ms</td><td class="s"> + if($key=~/N/){#skip if the nmer contains an N<div class="calls"> +<div class="calls_out"> # spent 15.8ms making + 31250 calls to <a href="nmer_mapping_quant-pl-1-line.html#main__CORE +_match">main::CORE:match</a>, avg 504ns/call</div></div></td></tr> <tr><td class="h"><a name="145"></a>145</td><td></td><td></td><td></td +><td></td><td class="s"> # + $ktc{$key}=((0)x$bins);#reset the hash</td></tr> <tr><td class="h"><a name="146"></a>146</td><td class="c0">23058</td>< +td class="c0"><span title="Avg 166ns">3.82ms</span></td><td></td><td> +</td><td class="s"> next;</td></tr> <tr><td class="h"><a name="147"></a>147</td><td></td><td></td><td></td +><td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="148"></a>148</td><td class="c0">8192</td><t +d class="c0"><span title="Avg 44µs">358ms</span></td><td class="c0">8 +192</td><td class="c0">16.1ms</td><td class="s"> + print OUT $key,&quot;\t&quot;,&quot;@{$ktc{$key}}\n&quot;;#print r +esult<div class="calls"><div class="calls_out"> + # spent 16.1ms making 8192 calls to <a href="nmer_mapping_quant-pl +-1-line.html#main__CORE_print">main::CORE:print</a>, avg 2µs/call</di +v></div></td></tr> <tr><td class="h"><a name="149"></a>149</td><td></td><td></td><td></td +><td></td><td class="s"># $ktc{$key}=((0)x$bin +s);#reset hash </td></tr> <tr><td class="h"><a name="150"></a>150</td><td></td><td></td><td></td +><td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="151"></a>151</td><td class="c3">2</td><td c +lass="c1"><span title="Avg 52µs">103µs</span></td><td class="c3">2</t +d><td class="c3">76µs</td><td class="s"> close OUT;<di +v class="calls"><div class="calls_out"> # spent 76µ +s making 2 calls to <a href="nmer_mapping_quant-pl-1-line.html#main__ +CORE_close">main::CORE:close</a>, avg 38µs/call</div></div></td></tr> <tr><td class="h"><a name="152"></a>152</td><td></td><td></td><td></td +><td></td><td class="s"> }</td></tr> <tr><td class="h"><a name="153"></a>153</td><td></td><td></td><td></td +><td></td><td class="s">}</td></tr> <tr><td class="h"><a name="154"></a>154</td><td></td><td></td><td></td +><td></td><td class="s"></td></tr> <tr><td class="h"><a name="155"></a>155</td><td></td><td></td><td></td +><td></td><td class="s"></td></tr> <tr><td class="h"><a name="156"></a>156</td><td></td><td></td><td></td +><td></td><td class="s"><div class="calls"><div class="calls_in"># sp +ent 6.31ms within main::combin which was called: # once (6.31ms+0s) by main::RUNTIME at <a href="nmer_mapping_quant- +pl-1-line.html#24">line 24</a></div></div>sub combin{</td></tr> <tr><td class="h"><a name="157"></a>157</td><td class="c3">1</td><td c +lass="c3"><span title="Avg 500ns">500ns</span></td><td></td><td></td> +<td class="s"> my ($data, $q) = @_;</td></tr> <tr><td class="h"><a name="158"></a>158</td><td></td><td></td><td></td +><td></td><td class="s"></td></tr> <tr><td class="h"><a name="159"></a>159</td><td class="c3">1</td><td c +lass="c3"><span title="Avg 200ns">200ns</span></td><td></td><td></td> +<td class="s"> return if $q &lt; 1;</td></tr> <tr><td class="h"><a name="160"></a>160</td><td></td><td></td><td></td +><td></td><td class="s"></td></tr> <tr><td class="h"><a name="161"></a>161</td><td class="c3">1</td><td c +lass="c3"><span title="Avg 200ns">200ns</span></td><td></td><td></td> +<td class="s"> my $results = $data;</td></tr> <tr><td class="h"><a name="162"></a>162</td><td></td><td></td><td></td +><td></td><td class="s"></td></tr> <tr><td class="h"><a name="163"></a>163</td><td class="c3">1</td><td c +lass="c3"><span title="Avg 400ns">400ns</span></td><td></td><td></td> +<td class="s"> while (--$q) {</td></tr> <tr><td class="h"><a name="164"></a>164</td><td class="c0">5</td><td c +lass="c3"><span title="Avg 80ns">400ns</span></td><td></td><td></td>< +td class="s"> my @new;</td></tr> <tr><td class="h"><a name="165"></a>165</td><td class="c0">5</td><td c +lass="c3"><span title="Avg 440ns">2µs</span></td><td></td><td></td><t +d class="s"> for my $letter (@$data) {</td></tr> <tr><td class="h"><a name="166"></a>166</td><td class="c0">25</td><td +class="c0"><span title="Avg 180µs">4.50ms</span></td><td></td><td></t +d><td class="s"> push @new, map { $letter . $_ + } @$results;</td></tr> <tr><td class="h"><a name="167"></a>167</td><td></td><td></td><td></td +><td></td><td class="s"> } # end for $letter in @$data +</td></tr> <tr><td class="h"><a name="168"></a>168</td><td></td><td></td><td></td +><td></td><td class="s"></td></tr> <tr><td class="h"><a name="169"></a>169</td><td class="c0">5</td><td c +lass="c1"><span title="Avg 18µs">92µs</span></td><td></td><td></td><t +d class="s"> $results = \@new;</td></tr> <tr><td class="h"><a name="170"></a>170</td><td></td><td></td><td></td +><td></td><td class="s"> } # end while --$q is not 0</td></tr> <tr><td class="h"><a name="171"></a>171</td><td></td><td></td><td></td +><td></td><td class="s"></td></tr> <tr><td class="h"><a name="172"></a>172</td><td class="c3">1</td><td c +lass="c0"><span title="Avg 1.73ms">1.73ms</span></td><td></td><td></t +d><td class="s"> return @$results;</td></tr> <tr><td class="h"><a name="173"></a>173</td><td></td><td></td><td></td +><td></td><td class="s">} # end ordered_combinations</td></tr> <tr><td class="s"><a name=""></a>&nbsp;</td><td></td><td></td><td></td +><td></td><td class="s"></td></tr> <tr><td class="h"><a name="Internals__SvREADONLY"></a></td><td></td><t +d></td><td></td><td></td><td class="s"><div class="calls"><div class= +"calls_in"># spent 5µs within Internals::SvREADONLY which was called +13 times, avg 392ns/call: # 13 times (5µs+0s) by constant::import at <a href="constant-pm-8-line +.html#132">line 132 of constant.pm</a>, avg 392ns/call</div></div>sub + Internals::SvREADONLY; # xsub<br /> </td></tr> <tr><td class="h"><a name="main__CORE_close"></a></td><td></td><td></t +d><td></td><td></td><td class="s"><div class="calls"><div class="call +s_in"># spent 107µs within main::CORE:close which was called 6 times, + avg 18µs/call: # 2 times (76µs+0s) by main::RUNTIME at <a href="nmer_mapping_quant-pl +-1-line.html#151">line 151</a>, avg 38µs/call # 2 times (15µs+0s) by main::RUNTIME at <a href="nmer_mapping_quant-pl +-1-line.html#141">line 141</a>, avg 7µs/call # once (10µs+0s) by main::RUNTIME at <a href="nmer_mapping_quant-pl +-1-line.html#36">line 36</a> # once (6µs+0s) by main::RUNTIME at <a href="nmer_mapping_quant-pl- +1-line.html#53">line 53</a></div></div>sub main::CORE:close; # opcode +<br /> </td></tr> <tr><td class="h"><a name="main__CORE_match"></a></td><td></td><td></t +d><td></td><td></td><td class="s"><div class="calls"><div class="call +s_in"># spent 162ms within main::CORE:match which was called 798209 t +imes, avg 203ns/call: # 766959 times (147ms+0s) by main::RUNTIME at <a href="nmer_mapping_qu +ant-pl-1-line.html#44">line 44</a>, avg 191ns/call # 31250 times (15.8ms+0s) by main::RUNTIME at <a href="nmer_mapping_q +uant-pl-1-line.html#144">line 144</a>, avg 504ns/call</div></div>sub +main::CORE:match; # opcode<br /> </td></tr> <tr><td class="h"><a name="main__CORE_open"></a></td><td></td><td></td +><td></td><td></td><td class="s"><div class="calls"><div class="calls +_in"># spent 146µs within main::CORE:open which was called 6 times, a +vg 24µs/call: # 2 times (65µs+0s) by main::RUNTIME at <a href="nmer_mapping_quant-pl +-1-line.html#142">line 142</a>, avg 32µs/call # 2 times (46µs+0s) by main::RUNTIME at <a href="nmer_mapping_quant-pl +-1-line.html#108">line 108</a>, avg 23µs/call # once (29µs+0s) by main::RUNTIME at <a href="nmer_mapping_quant-pl +-1-line.html#30">line 30</a> # once (6µs+0s) by main::RUNTIME at <a href="nmer_mapping_quant-pl- +1-line.html#39">line 39</a></div></div>sub main::CORE:open; # opcode< +br /> </td></tr> <tr><td class="h"><a name="main__CORE_print"></a></td><td></td><td></t +d><td></td><td></td><td class="s"><div class="calls"><div class="call +s_in"># spent 16.1ms within main::CORE:print which was called 8192 ti +mes, avg 2µs/call: # 8192 times (16.1ms+0s) by main::RUNTIME at <a href="nmer_mapping_qua +nt-pl-1-line.html#148">line 148</a>, avg 2µs/call</div></div>sub main +::CORE:print; # opcode<br /> </td></tr> <tr><td class="h"><a name="main__CORE_readline"></a></td><td></td><td> +</td><td></td><td></td><td class="s"><div class="calls"><div class="c +alls_in"># spent 268ms within main::CORE:readline which was called 77 +5783 times, avg 345ns/call: # 766959 times (242ms+0s) by main::RUNTIME at <a href="nmer_mapping_qu +ant-pl-1-line.html#44">line 44</a>, avg 316ns/call # 8066 times (2.49ms+0s) by main::RUNTIME at <a href="nmer_mapping_q +uant-pl-1-line.html#34">line 34</a>, avg 309ns/call # 756 times (23.0ms+0s) by main::RUNTIME at <a href="nmer_mapping_q +uant-pl-1-line.html#109">line 109</a>, avg 30µs/call # once (13µs+0s) by main::RUNTIME at <a href="nmer_mapping_qua +nt-pl-1-line.html#32">line 32</a> # once (6µs+0s) by main::RUNTIME at <a href="nmer_mapping_quan +t-pl-1-line.html#43">line 43</a></div></div>sub main::CORE:readline; +# opcode<br /> </td></tr> <tr><td class="h"><a name="mro__method_changed_in"></a></td><td></td>< +td></td><td></td><td></td><td class="s"><div class="calls"><div class +="calls_in"># spent 12µs within mro::method_changed_in which was call +ed 13 times, avg 885ns/call: # 13 times (12µs+0s) by constant::import at <a href="constant-pm-8-lin +e.html#147">line 147 of constant.pm</a>, avg 885ns/call</div></div>su +b mro::method_changed_in; # xsub<br /> </td></tr> <tr><td class="h"><a name="utf8__is_utf8"></a></td><td></td><td></td>< +td></td><td></td><td class="s"><div class="calls"><div class="calls_i +n"># spent 8µs within utf8::is_utf8 which was called 13 times, avg 61 +5ns/call: # 13 times (8µs+0s) by constant::import at <a href="constant-pm-8-line +.html#123">line 123 of constant.pm</a>, avg 615ns/call</div></div>sub + utf8::is_utf8; # xsub<br /> </td></tr> </tbody></table></div> <div class="footer">Report produced by the <a href="http://search.cpan.org/dist/Devel-NYTProf/">NYTProf 5 +.05</a> Perl profiler, developed by <a href="http://www.linkedin.com/in/timbunce">Tim Bunce</a> an +d <a href="http://code.nytimes.com">Adam Kaplan</a>. </div> <br /><br /><br /><br /><br /><br /><br /><br /><br /><br />

Replies are listed 'Best First'.
Re: Code Optimization
by kcott (Archbishop) on Sep 10, 2013 at 03:26 UTC

    G'day azheid,

    Cutting your code down to a skeleton where I think your two biggest problems are:

    if(!($permute)){ open(SEQ,'<',$sequence_fname)||die "Cannot open sequence file\n"; while (my @line=split (/\t/,<SEQ>)){ ... } ... { else{ for (...) { ... open(SEQ,'<',$sequence_fname)||die "Cannot open sequence file\ +n";#open the sequence file while (my @line=split (/\t/,<SEQ>)){#for each sequence record +the nmer information ... } close SEQ; open(OUT,">>$out")||die "Cannot open $out\n"; foreach my $key(keys %ktc){ ... print OUT ... ... } close OUT; } }

    You open the SEQ file, read the data from disk, and parse it multiple times: you only need to do this once.

    Also, you open and close the OUT file for appending multiple times: you only need to do this once.

    Without making changes to your coding style, I suspect this, which does only open those files once, would be substantially faster:

    my @seq_data; open(SEQ,'<',$sequence_fname)||die "Cannot open sequence file\n"; while (<SEQ>) { push @seq_data, [split /\t/]; } close SEQ; if(!($permute)){ for (@seq_data) { my @line = @$_; ... } ... { else{ open(OUT,">>$out")||die "Cannot open $out\n"; for (...) { ... for (@seq_data) { my @line = @$_; ... } foreach my $key(keys %ktc){ ... print OUT ... ... } } close OUT; }

    There may be other areas where substantial gains could be made, but I haven't looked beyond the two I/O ones at this point.

    -- Ken

      Originally, I had the program open the file in the manner that you have coded (Once only) however the amount of data in question is massive. This caused the program to use up all the memory of the server I am using, and the OS would randomly kill the program in favor of OS processes. I, perhaps incorrectly, favored the line by line approach because on runtime, the program only uses 0.3% of the RAM, instead of 100%. Perhaps it might be worthwhile to break up the files into chunks and read in maybe 100,000 or so lines at a time?

        In that case, I'd suggest using Tie::File:

        use Tie::File; ... tie my @seq_data, 'Tie::File', $sequence_fname or die "Can't open $sequence_fname: $!"; if(!($permute)){ for (@seq_data) { my @line = split /\t/; ... } ... { else{ open(OUT,">>$out")||die "Cannot open $out\n"; for (...) { ... for (@seq_data) { my @line = split /\t/; ... } foreach my $key(keys %ktc){ ... print OUT ... ... } } close OUT; } untie @seq_data;

        That'll be a little slower because you'll be repeating the split /\t/, but at least you won't have memory issues.

        Also, note the change I made to die. This is not for optimising the efficiency of your code; it's to improve feedback if things go wrong. When you terminate the die message with a newline, you prevent file and line information from being output. Also, "$!" provides addition information about why open failed, see "perlvar: Error Variables". This is a good practice to get into the habit of doing; alternatively, consider using autodie.

        -- Ken

Re: Code Optimization
by Anonymous Monk on Sep 10, 2013 at 01:54 UTC
    For one thing, if you have to do "millions" of permutations, you cannot afford to re-read any file within that loop.
Re: Code Optimization
by Laurent_R (Canon) on Sep 10, 2013 at 08:30 UTC

    Hello

    in addition to what has already been said about doing multiple times things that can be done only once (reading files many times, etc.), and although this is not directly what you are asking for, I would strongly recommend some changes to your coding style. Especially, you've got a lot of repeated code. Just to take an example, this big 30-line while loop:

    while (my @line=split (/\t/,<SEQ>)){#for each sequence record +the nmer information my $ct=0; my $start=$line[1]; if ($start < $line[2]){##Watson Strand # ... } else{##Crick Strand # ... } }

    is coming twice, with identical or almost identical code. Put that in a separate subroutine which you call then twice where you need. This will remove about 60 lines from your main code and make it much shorter and far easier to follow. It will also make it easier to figure out possible performance improvements. And, last but not least, if you have a bug in that code or need to re-factor it, you will need to correct it only once.

      Good suggestion, I will change it to a subroutine format.

Re: Code Optimization
by Laurent_R (Canon) on Sep 09, 2013 at 23:03 UTC

    Yes, please post the code, the code profile without the code itself is very difficult to understand, we need to know at least which function is calling what. And, BTW, take a look at the readmore formatting tag, it makes it possible to make your post appear shorter most of the time.

      Is that better? Thanks for the advice, I will look into that resource.

Re: Code Optimization
by fisher (Priest) on Sep 10, 2013 at 13:09 UTC
    please, make use of the <readmore> tag.
Re: Code Optimization
by Anonymous Monk on Sep 09, 2013 at 23:34 UTC

    and the code I have written for it is very slow.

    Doesn't look that way to me :)

      Well for the test I set the permutations to 2, where I actually need millions of permutations.