Re: finding matches in the same array
by polettix (Vicar) on Apr 13, 2005 at 17:02 UTC
|
If you have to evaluate the average for all sequences, you can build the following hash:
my %occurrences;
push @{$occurrences{$sequence[$_]}}, $numbers[$_]
foreach (0 .. $#sequence);
At this point, each entry in the hash has a "sequence" as its key and, as its value, a reference to an array containing the "numbers" for that sequence. Computing the average should then be pretty easy, e.g.:
my $sum;
$sum += $_ foreach (@{$occurrences{'atcg'}});
my $average = $sum / scalar(@{$occurrences{'atcg'}});
I leave the union of the two snippets as an exercise, just in case it's homework :)
Update: fixed a typo in the first snippet, thanks to Postular Postulant.
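For readers who are not doing this as homework, here is a minimal sketch of one way the two snippets could be combined. The sub name average_by_sequence and the choice to pass array references are assumptions made for illustration, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): group the numbers by
# sequence, then average each group.
sub average_by_sequence {
    my ( $sequence, $numbers ) = @_;    # two array references of equal length

    # Step 1: sequence => [ all numbers seen for that sequence ]
    my %occurrences;
    push @{ $occurrences{ $sequence->[$_] } }, $numbers->[$_]
        foreach 0 .. $#{$sequence};

    # Step 2: sum each group and divide by its size
    my %average;
    for my $key ( keys %occurrences ) {
        my $sum = 0;
        $sum += $_ foreach @{ $occurrences{$key} };
        $average{$key} = $sum / @{ $occurrences{$key} };
    }
    return \%average;
}

my $avg = average_by_sequence(
    [ 'acgt', 'actg', 'cggt', 'cggt' ],
    [ 1234,   2345,   3244,   3455 ],
);
print "$_: $avg->{$_}\n" foreach sort keys %$avg;
```

Keeping the full list of numbers per sequence costs more memory than a running total, but it leaves the raw values available for anything else you may want to compute later.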
Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")
Don't fool yourself.
|
Let's reduce the loops by calculating the average as we move along:
my %oc;
for (0 .. $#sequence) {
$oc{$sequence[$_]}[0] =
($oc{$sequence[$_]}[0]+$numbers[$_])/(++$oc{$sequence[$_]}[1]);
}
foreach my $seq (keys %oc) {
print "The average for $seq is ".$oc{$seq}[0].$/;
}
|
This does not seem to be an average :) The first item is divided by 1, the running result plus the second item by 2, then by 3, and so on, so earlier items end up divided several times. Maybe it's better to do the division at the end:
my %oc;
for (0 .. $#sequence) {
$oc{$sequence[$_]}[0] += $numbers[$_];
++$oc{$sequence[$_]}[1];
}
foreach my $seq (keys %oc) {
print "The average for $seq is ".($oc{$seq}[0] / $oc{$seq}[1]).$/;
}
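If the average really has to stay current inside the loop, one common alternative is the incremental mean update, mean += (x - mean) / n. The sketch below swaps that technique in; the sub name running_average_by_sequence and its interface are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps [ running mean, count ] per sequence and
# updates the mean in place with mean += (x - mean) / n.
sub running_average_by_sequence {
    my ( $sequence, $numbers ) = @_;    # two array references of equal length
    my %oc;
    for my $i ( 0 .. $#{$sequence} ) {
        my $slot = $oc{ $sequence->[$i] } ||= [ 0, 0 ];
        $slot->[1]++;                                                  # n
        $slot->[0] += ( $numbers->[$i] - $slot->[0] ) / $slot->[1];    # mean
    }

    # Return only the means, keyed by sequence
    my %mean;
    $mean{$_} = $oc{$_}[0] for keys %oc;
    return \%mean;
}

my $mean = running_average_by_sequence(
    [ 'acgt', 'actg', 'cggt', 'cggt' ],
    [ 1234,   2345,   3244,   3455 ],
);
print "The average for $_ is $mean->{$_}\n" for sort keys %$mean;
```

Because of floating-point rounding, this can differ in the last digits from dividing the total once at the end.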
Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")
Don't fool yourself.
Re: finding matches in the same array
by Random_Walk (Prior) on Apr 13, 2005 at 17:03 UTC
|
I guess this is not homework, as the source data is presented as two synchronised arrays rather than a file to read or an array of arrays. If it is homework, then I reckon the OP already did some work to get from a file of data to two arrays.
My output disagrees with the OP's (is actg == acgt ??), and you would be better off storing the source data in an array of arrays. I have dumped out the hash of hashes I build to collate the data so you can see clearly what is going on. Data::Dumper is fantastic when you are developing any sort of interesting data structure.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
#e.g.
my @sequence = ('acgt','actg','cggt','cggt');
my @numbers = ('1234','2345','3244','3455');
my %collated;
for (0..$#sequence) {
$collated{$sequence[$_]}{total}+=$numbers[$_];
$collated{$sequence[$_]}{number}++;
}
print Dumper(\%collated);
for (sort keys %collated) {
print "sequence: $_ = ";
print( $collated{$_}{total} / $collated{$_}{number}, "\n" );
}
__END__
# my output
$VAR1 = {
'acgt' => {
'number' => 1,
'total' => 1234
},
'cggt' => {
'number' => 2,
'total' => 6699
},
'actg' => {
'number' => 1,
'total' => 2345
}
};
sequence: acgt = 1234
sequence: actg = 2345
sequence: cggt = 3349.5
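As a rough illustration of the array-of-arrays suggestion, the same collation could look like the sketch below. The [ sequence, number ] record layout and the sub name collate are assumptions, not something the OP specified.

```perl
use strict;
use warnings;

# Each record is [ sequence, number ]; this layout is an assumption.
my @records = (
    [ 'acgt', 1234 ],
    [ 'actg', 2345 ],
    [ 'cggt', 3244 ],
    [ 'cggt', 3455 ],
);

# Hypothetical helper: collate the total and the count per sequence.
sub collate {
    my @rows = @_;
    my %collated;
    for my $row (@rows) {
        my ( $seq, $num ) = @$row;
        $collated{$seq}{total} += $num;
        $collated{$seq}{number}++;
    }
    return \%collated;
}

my $collated = collate(@records);
for ( sort keys %$collated ) {
    printf "sequence: %s = %s\n", $_,
        $collated->{$_}{total} / $collated->{$_}{number};
}
```

Keeping each sequence next to its number means the two can no longer drift out of step, which is the main risk with parallel arrays.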
Cheers, R.
Pereant, qui ante nos nostra dixerunt!
Re: finding matches in the same array
by tlm (Prior) on Apr 13, 2005 at 17:07 UTC
|
use List::Util 'sum';
my %collect;
for my $i ( 0 .. $#sequences ) {
push @{ $collect{ $sequences[ $i ] } }, $numbers[ $i ];
}
for my $sequence ( keys %collect ) {
my $avg = ( sum @{$collect{$sequence}} )/@{$collect{$sequence}};
printf "sequence: $sequence = %.1f\n", $avg;
}
Re: finding matches in the same array
by RazorbladeBidet (Friar) on Apr 13, 2005 at 17:06 UTC
|
Here's another idea... a little convoluted...there's probably a nicer way to write it
my %hash;
my $i = 0;    # index into @numbers, advanced in step with @sequence
$hash{$_} = [
( $hash{$_}->[0] || 0 ) + $numbers[$i++],
( $hash{$_}->[1] || 0 ) + 1
] foreach @sequence;
print "$_: ", $hash{$_}->[0] / $hash{$_}->[1], "\n" foreach keys %hash;
--------------
"But what of all those sweet words you spoke in private?"
"Oh that's just what we call pillow talk, baby, that's all."
Re: finding matches in the same array
by sh1tn (Priest) on Apr 13, 2005 at 17:28 UTC
|
@seq = ('acgt','actg','cggt','cggt','actg');
@num = ('1234','2345','3244','3455','5230');
Wrong algorithm
#for( 0..$#seq ){
# $struct{$seq[$_]} ||= $num[$_];
# $struct{$seq[$_]} = ($struct{$seq[$_]}+$num[$_]) / 2
#}
|
This gives more weight to the last number (if there are more than 2):
my @sequence = ('acgt','actg','cggt','cggt', 'actg', 'actg');
my @numbers = ('1234','2345','3244','3455', '5230', '100000' );
gives 51893.75
when it should be 35858.3333333333
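To see where the extra weight comes from, here is a small sketch that runs the repeated-halving update and a plain arithmetic mean side by side on the three actg values. The sub names halving_average and true_average are hypothetical and exist only for this comparison.

```perl
use strict;
use warnings;

my @actg = ( 2345, 5230, 100000 );

# Repeated halving: start from the first value, then average the
# running result with each value in turn (including the first).
sub halving_average {
    my @values = @_;
    my $acc    = $values[0];
    $acc = ( $acc + $_ ) / 2 for @values;
    return $acc;
}

# Plain arithmetic mean: sum everything, divide once by the count.
sub true_average {
    my @values = @_;
    my $sum    = 0;
    $sum += $_ for @values;
    return $sum / @values;
}

printf "halving:   %s\n", halving_average(@actg);
printf "true mean: %s\n", true_average(@actg);
```

With repeated halving the last value always contributes half of the result, the one before it a quarter, and so on, which is why a large final number dominates.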
--------------
"But what of all those sweet words you spoke in private?"
"Oh that's just what we call pillow talk, baby, that's all."
|
You are right, it's an utterly different algorithm.
The correct one is as follows:
@seq = ('acgt','actg','cggt','cggt', 'actg', 'actg');
@num = ('1234','2345','3244','3455', '5230', '100000' );
for( 0..$#seq ){
$struct{$seq[$_]}->[0] += $num[$_];    # running total
$struct{$seq[$_]}->[1]++;              # count, incremented even if the total is 0
}
for( keys %struct ){
print "$_\t", $struct{$_}->[0] / $struct{$_}->[1], $/
}
__END__
STDOUT:
acgt 1234
cggt 3349.5
actg 35858.3333333333
Re: finding matches in the same array
by salva (Canon) on Apr 14, 2005 at 22:19 UTC
|
Use two hashes, one for totals and another for counters, then use two loops: one to populate these hashes and the other to calculate the averages as total/counter:
my @sequence = ('actg','actg','cggt','cggt');
my @numbers = ('1234','2345','3244','3455');
my (%total, %count, $i);
for ($i=0; $i<@sequence; $i++) {
$total{$sequence[$i]}+=$numbers[$i];
$count{$sequence[$i]}++;
}
for (sort keys %total) {
my $avg=$total{$_}/$count{$_};
print "$_ avg: $avg\n";
}