replicant4 has asked for the wisdom of the Perl Monks concerning the following question:
Dear Perl Monks,
I have a serious problem and I would really appreciate and help that you can give me
I have created two arrays (@site1 @site2) each of which contains string variables (called $site1, and $site2).
an example of a $site is: $site = 'AATKKM'
I need a script that will compare $site1 of @site1 with every $site variable in @site2. It should give me for example the following pairs (if $site2 = 'GGGGG')
AG AG TG KG KG MG
these will be my pairwise combinations
for every combination that i find I need to have a counter that will increment by 1
e.g AG > count_AG++;
I don't expect anyone to give me a full working script but any ideas on the overall structure that it should have would be more than helpfull.
Thanks in advance for your help
Re: Comparing two arrays and counting pairwise comparisons
by ikegami (Patriarch) on Oct 18, 2004 at 23:11 UTC

I'm not sure if I understood correctly. Does this do the trick? I made the slight modification of putting the count of "AG"s (for example) in $count{AG}, not $count_AG.
use strict;
use warnings;
use Algorithm::Loops qw( MapCar );
my @site1 = qw(
AATKKM
123456
);
my @site2 = qw(
GGGGGG
!@#$%^
);
my $site1;
my $site2;
my @pairs;
my %counts;
foreach $site1 (@site1) {
my @site1_parts = split(//, $site1);
foreach $site2 (@site2) {
my @site2_parts = split(//, $site2);
MapCar { $counts{$_[0].$_[1]}++ }
\@site1_parts, \@site2_parts;
}
}
print($_, ': ', $counts{$_}, $/) foreach (sort keys %counts);
__END__
output
======
1!: 1
1G: 1
2@: 1
2G: 1
3#: 1
3G: 1
4$: 1
4G: 1
5%: 1
5G: 1
6G: 1
6^: 1
A!: 1
A@: 1
AG: 2
K$: 1
K%: 1
KG: 2
MG: 1
M^: 1
T#: 1
TG: 1
 [reply] [d/l] [select] 
Re: Comparing two arrays and counting pairwise comparisons
by BrowserUk (Patriarch) on Oct 19, 2004 at 00:12 UTC

#! perl slw
use strict;
use List::Util qw[ min ];
use Data::Dumper;
my @site1 = qw[ abcde cdefg efghi ];
my @site2 = qw[ zyxwv xwvut vutrs ];
my %counts;
for my $site1 ( @site1 ){
for my $site2 ( @site2 ) {
$counts{ substr( $site1, $_, 1 ) . substr( $site2, $_, 1 ) }++
+
for 0 .. min( length( $site1 )1, length( $site2 )1 );
}
}
print Dumper \%counts;
__END__
P:\test>400340.pl
$VAR1 = {
'bu' => 1,
'cv' => 2,
'fu' => 2,
'av' => 1,
'hw' => 1,
'ax' => 1,
'du' => 2,
'gv' => 2,
'es' => 1,
'is' => 1,
'ct' => 1,
'iv' => 1,
'gx' => 1,
'fr' => 1,
'cx' => 2,
'dy' => 1,
'az' => 1,
'ev' => 3,
'ez' => 1,
'et' => 2,
'it' => 1,
'dw' => 2,
'hu' => 1,
'dr' => 1,
'ex' => 2,
'fy' => 1,
'cz' => 1,
'by' => 1,
'fw' => 2,
'bw' => 1,
'gs' => 1,
'hr' => 1,
'gt' => 2
};
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." David Dunham
"Think for yourself!"  Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side."  tachyon
 [reply] [d/l] 

 [reply] 

 [reply] [d/l] 
Re: Comparing two arrays and counting pairwise comparisons
by borisz (Canon) on Oct 18, 2004 at 23:13 UTC

I read the description twice and did not understand, what you want to do.
 [reply] 
Re: Comparing two arrays and counting pairwise comparisons
by tachyon (Chancellor) on Oct 19, 2004 at 00:19 UTC

Your description is ambiguous. With the second string all the same char we have to infer either you want
 char1 str1 against char1 str2, char2 str1 against char2 str2 or you want all the permutaions ie
 char1 str1 against char(1..n) str2, char2 str1 against char(1..n) str2....
I am assuming you want all the permutations. There are two general approaches. Brute force and the more clever approach. Brute force won't scale well as it is O(n^2). The clever approach is 0(n) plus an O(n^2) whisker that represents the number of possible tokens. As you can see both approaches yield the same results. Assuming those are the results you wanted ;)
use Data::Dumper;
my $site1 = 'AATKKM' x 20;
my $site2 = 'GGGGGA' x 20;
# brute force
my (%hash, $loops);
for my $base1( split //, $site1 ) {
for my $base2( split //, $site2 ) {
$hash{"${base1}_$base2"}++;
$loops++;
}
}
# precount incidence of tokens
my (%s1, %s2, %hash_e, $loops_e);
do{ $s1{$_}++; $loops_e++ } for split //, $site1;
do{ $s2{$_}++; $loops_e++ } for split //, $site2;
# still loop within loop but the are now far fewer loops to do
# as we only do one per token pair and calculate the total pairs
# from our precount data
for my $base1( keys %s1 ) {
for my $base2( keys %s2 ) {
$hash_e{"${base1}_$base2"} = $s1{$base1}*$s2{$base2};
$loops_e++;
}
}
print "Brute force loops $loops\nEfficeient Loops $loops_e\n\n";
print Data::Dumper>Dump([\%hash, \%s1, \%s2, \%hash_e], [qw( hash s1
+s2 hash_e)] );
__DATA__
Brute force loops 14400
Efficeient Loops 248
$hash = {
'T_G' => '2000',
'A_A' => '800',
'T_A' => '400',
'M_G' => '2000',
'M_A' => '400',
'K_G' => '4000',
'K_A' => '800',
'A_G' => '4000'
};
$s1 = {
'A' => '40',
'K' => '40',
'T' => '20',
'M' => '20'
};
$s2 = {
'G' => '100',
'A' => '20'
};
$hash_e = {
'T_G' => '2000',
'A_A' => '800',
'T_A' => '400',
'M_G' => '2000',
'M_A' => '400',
'K_G' => '4000',
'K_A' => '800',
'A_G' => '4000'
};
 [reply] [d/l] 

(From memory!) Your algorithm is O(N^2) but mine is O(N)
True enough, but which N?
I don't think your code comes close to answering the question. Your treating all the strings in each array as a single concatenated string. I'm not, and I don't believe that's what the OP wants.
Using your algorithm on my test data produces this:
Which is solving a totally different problem to the one I belive the OP wants to solve.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." David Dunham
"Think for yourself!"  Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side."  tachyon
 [reply] [d/l] 

