Dear perlmonkers, could you point out what's wrong with my code? For each gene, I want to print the maximum of its associated frequencies in a 3rd column, but the result is wrong for some genes (e.g. MUC2). The input file, my code, and the output I get are below.
Frequency GENE
20 SEMA4G
80 CTBP2
80 CTBP2
80 CTBP2
100 SPRN
40 SVIL
80 TIMM23
80 MINPP1
80 BTAF1
20 MUC2
60 MUC2
80 MUC2
100 MUC2
60 OR10G9
60 C11orf91
80 OR4S1
40 OR4C3
40 OR4C3
20 OR4C3
40 OR4C45
60 OR4C45
$file1 = $ARGV[0];
%hashfreq = ();
$freq = 0;
$geneold = "NA";
# First pass: try to record the maximum frequency per gene.
open(INPUTR,"<$file1") || die "Can't open $file1 for reading: $!\n";
while($line=<INPUTR>){
    chomp $line;
    @toks = split(/\t/, $line);
    $gene = $toks[1];
    if ($gene =~ /^$geneold$/){
        if ($toks[0] >= $numold){
            $freq = $toks[0];
            print $toks[0]."\t".$numold."\t".$freq."\t".$gene."\n";
        } else {
            $freq = $numold;
        }
        $hashfreq{$toks[0]} = $freq;
    } else {
        $hashfreq{$toks[1]} = $toks[0];
    }
    $numold = $toks[0];
    $geneold = $toks[1];
}
close(INPUTR);
# Second pass: print each input line with the stored value appended.
open(OUTD1,">output.txt") || die "Can't open output.txt for writing: $!\n";
open(INPUTR,"<$file1") || die "Can't open $file1 for reading: $!\n";
while($line=<INPUTR>){
    chomp $line;
    @toks = split(/\t/, $line);
    $idgene = $toks[1];
    if (exists $hashfreq{$idgene}){
        print OUTD1 $line."\t".$hashfreq{$idgene}."\n";
        print $line."\t".$hashfreq{$idgene}."\n";
    }
}
close(INPUTR);
close(OUTD1);
Frequency GENE MAX
20 SEMA4G 20
80 CTBP2 80
80 CTBP2 80
80 CTBP2 80
100 SPRN 100
40 SVIL 40
80 TIMM23 80
80 MINPP1 80
80 BTAF1 80
20 MUC2 20
60 MUC2 20
80 MUC2 20
100 MUC2 20
60 OR10G9 60
60 C11orf91 60
80 OR4S1 80
40 OR4C3 40
40 OR4C3 40
20 OR4C3 40
40 OR4C45 40
60 OR4C45 40
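Two things in the script seem to explain this output. The line `$hashfreq{$toks[0]} = $freq;` stores the value under the frequency instead of the gene name, so a gene keeps whatever its first row set. The comparison is also made against the previous row (`$numold`) rather than the largest value seen so far for that gene. A minimal sketch of one way around both is below; it is not a drop-in replacement for the script above. It assumes tab-separated rows of the form frequency, gene, and the name `max_per_gene` is made up for illustration. It keeps a running maximum keyed by gene, then appends it to each row in a second pass.

```perl
use strict;
use warnings;

# Return a hash ref mapping each gene to the largest frequency seen for it.
# Each row is assumed to look like "frequency<TAB>gene".
# (max_per_gene is a hypothetical helper name, not part of the original script.)
sub max_per_gene {
    my @rows = @_;
    my %max;
    for my $row (@rows) {
        my ($freq, $gene) = split /\t/, $row;
        # Skip the header line and any malformed rows.
        next unless defined $gene && $freq =~ /^\d+$/;
        # Key by gene, and compare against the running maximum for that gene.
        $max{$gene} = $freq if !exists $max{$gene} || $freq > $max{$gene};
    }
    return \%max;
}

# A few sample rows taken from the input above.
my @rows = (
    "20\tMUC2", "60\tMUC2", "80\tMUC2", "100\tMUC2",
    "40\tOR4C3", "40\tOR4C3", "20\tOR4C3",
);

my $max = max_per_gene(@rows);

# Second pass: print each row with its gene's maximum appended.
for my $row (@rows) {
    my (undef, $gene) = split /\t/, $row;
    print "$row\t$max->{$gene}\n";
}
```

Because the maximum is looked up by gene name, this does not depend on the rows for a gene being adjacent or sorted. To run it on a real file, the rows could be read from the file handle into `@rows` (after `chomp`) instead of being hard-coded.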