How I can decide which hash is the least differing from other hashes?
If I understand you correctly, you want to measure the 'fit' of each ordered set of hash values against every other set of hash values and accumulate their scores and sort them via that criteria.
For this purpose a (A,B,C,D) -v- (A,B,D,C) scores 2, and (A,B,C,D) -v- (B,A,D,C) scores 0.
This is an O(n^2) problem. Add to this the cost of comparing two arrays of strings and counting the matches and you have a slow and expensive process.
Those little diagrams above gave me an idea.
If you XOR a number with itself, you get zero. Perl (uniquely?) can do bitwise operations on strings. If you XOR 2 strings and they are the same, you end up with a string of null bytes. A slow method of comparing two string, and no good at all for comparing arrays of strings, even if you concatenated the arrays into a single string unless all the string were the same length. However, you said that the set of values in each hash is the same, its only the ordering that distinguishes them. Therefore it is possible to represent each set of values by a string with each value represented by a single character (provided that there are less than 255 unique values in each hash).
Once you have the sets represented this way, deriving the 'fit' of any pair of hashes comes down to XORing 2 substitute strings and counting the number of null bytes. Doing this for all the pairs and accumulating the scores determines a fit value of each hash relative to every other.
You then need to order these fits whilst retaining the index for which hash they belong to. You can then use the print them out in order of relative fit.
The code below does this for 50 hashes of 9 values each in a few seconds. You don't say how many hashes or how large your sets, but it may complete in a reasonable time if your numbers aren't too extreme.
The whole process only takes 6 (fairly complex) lines:)
#! perl -slw
use strict;
use Inline::Files; #! Not necessary for the algorithm.
#! Just convenience in my test program
#! If anyone else gets the following warnings from the
#! above module, please let me know.
=pod Comment only
Use of uninitialized value in length at e:/Perl/site/lib/Inline/Files/
+Virtual.pm line 89, <FILE> chunk 1.
Use of uninitialized value in transliteration (tr///) at e:/Perl/site/
+lib/Inline/Files/Virtual.pm line 167, <FILE> chunk 1.
=cut
#! Some silly hash keys for testing purposes.
my @keys = qw(one two three four five six seven eight nine);
#! Build an array of hashes from the __DATA__ section.
#! The array above as keys, the words from each line as values
my @AoH;
push @AoH, do{ my $i=0; +{ map{ $keys[$i++] => $_ } split } } while <D
+ATA>;
#! Build a lookup table from one (any) hash's values.
my %lookup = do{ my $i=1; map{ $_ => $i++ } values %{$AoH[0]} };
#! Build an array of substitute strings, mapping each value to a singl
+e char.
#! I've used an offset of 65 to make things readable when testing.
#! Start at chr(1) if your hashes have more than 222 values.
#! You might need to specify [no utf-8] if you use values above 127? I
+'m not sure.
my @subs;
push @subs, join'', map{ chr 65+$lookup{$_} } values %{$_} for @AoH;
#! XOR every substitute with every substitute, count the nuls in the r
+esult
#! Accumulate them in an array in the same order as the subs.
my @fits = ( (0)x@subs );
for my $i (0 .. $#subs) {
$fits[$i] += scalar ($subs[$i] ^ $_) =~ tr/\0// for @subs;
}
#! The sort is a slightly funky (I wonder where I got that word from:)
+ version of the ST sort.
#! It feeds in and sorts on the values of the array and feeds out the
+indices of the sorted values.
my @sorted =
map{ $_->[1] }
sort{ $b->[0] <=> $a->[0] } #! Descending sort, higher is better f
+it.
map{ [ $fits[$_] => $_ ] } 0 .. $#fits;
#! Output the values sets ordered by and prefixed with their relative
+'fit'.
print OUTPUT "$fits[$_] : @{[ values %{$AoH[$_]} ]}\n" for @sorted;
test data
Results
I hope the results mean something to you, cos I can't see the pattern:)
Examine what is said, not who speaks. |