Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Separating hashed data into subsets

by monkini (Initiate)
on Nov 11, 2013 at 14:52 UTC ( [id://1062003]=perlquestion: print w/replies, xml ) Need Help??

monkini has asked for the wisdom of the Perl Monks concerning the following question:

I have a pairwise distances between objects stored as double hash (the keys 1 and 2 are stored in alphabetical order):

$distances{$key1}{$key2} = $value;

for example:

obj1 obj2 => 0.95 obj2 obj3 => 0.76 obj1 obj4 => 0.80 ...

I have initiated an array of hashes, so that the object names are keys and they point to the name of subset I want to group them into (five subsets in total), so that:

@subsets = ( { obj1 => "C1", obj2 => "C2", obj3 => "C3", obj4 => "C4", obj5 => "C5", } );

I also store already used object names in a separate hash:

%used = ( { obj1 => "C1", obj2 => "C2", obj3 => "C3", obj4 => "C4", obj5 => "C5", } );

Now I want to look up the highest pairwise distance for each object in $subsets[0] (value from %distances), and append the 'partner's' key to the $subsets[ 1], with the same value (C1-C5) (For instance, if the obj1, which points to C1 has the highest pairwise distance with obj6, then I want to store obj6 as key and C1 as value). Also, I need to add each new object to the %used, as to make sure if the pair has not been used before. Alternatively, I could remove the used up pairs from %distances.

for ($i = 0; $i < 50; $i++) { #loop over array of hashes (each subset +will have 50 keys) foreach my $key (keys $subsets[$i]) { #5 keys at each level my $max = 0; my @max; for my $key1 (keys %distances) { if $key !exists (keys %used) { $max = max (values %{$distances{$key1}}); push @max, $max; # somehow retrieve the 2nd key } } my $maxmax = max @max; # $subsets[$i+1]{2nd_key_with_max_val} = "e.g. C1" # push %used with key and value @max = []; } }

...but I clearly have problems making it work. Any ideas?

Replies are listed 'Best First'.
Re: Separating hashed data into subsets
by roboticus (Chancellor) on Nov 11, 2013 at 15:34 UTC

    monkini;

    I'm not sure I can tell what you're asking for, but if I'm guessing correctly, your problem is that you're trying to find the subkey associated with the first key and the largest value. If so, I'm thinking this will do what you want:

    #!/usr/bin/perl use strict; use warnings; my %h = ( a=>{ a=>1, b=>2, c=>3, d=>4 }, b=>{ h=>5, i=>6, j=>5 }, c=>{ r=>5, s=>5, t=>5 }, e=>{ r=>5, s=>5, t=>4 }, + ); for my $k (sort keys %h) { my @t = kv_for_largest_v($h{$k}); print "Key $k: $t[0], $t[1]\n"; } sub kv_for_largest_v { my $hr = shift; return undef unless keys %$hr; my ($k,$v) = each %$hr; while (my ($k2,$v2) = each %$hr) { ($k,$v) = ($k2,$v2) if $v2>$v; } return $k,$v; }

    This returns only a single key/value pair which you can use in your code kind of like this:

    my ($maxkey, $maxmax) = kv_for_largest_v($distances{$key1};

    If you want to iterate through multiples in case there are several identical maximum distances, you can return an array reference instead, and push all the "winners" to the list something like so:

    sub kv_for_largest_v { my $hr = shift; return undef unless keys %$hr; my ($k,$v) = each %$hr; my @rv = ([ $k, $v ]); while (my ($k2,$v2) = each %$hr) { if ($v2 > $rv->[0][1]) { # New maximum value, reset the list $rv = ([ $k2, $v2 ]); } elsif ($v2 == $rv->[0][1]) { # Another copy of the max distance, add to the list push @rv, [ $k2, $v2 ]; } } return \@rv; }

    Apologies if I didn't interpret your question correctly. As I see it, the problems I had with your question are primarily that there was no clear question. The title "Separating hashed data into subsets" initially misled me into thinking you were needing some algorithmic help, so I put my thinking into "problem solving" mode. The setup with the three data structures started me thinking you were having a bookkeeping problem where you may have been having difficulty tracking your work through the algorithm, so then I switched thought modes into "analysis" mode so I could try to find where you may be losing track of the data. But about 10 minutes later, the comment "# somehow retrieve the 2nd key", triggered the thought: Oh, he found a hash value he's interested in, but doesn't have the key it came with, which appears to be the desired question.

    If that *is* the desired question, then you should've pruned a little more application-specific detail from your question. I was bored this morning, though, so hopefully I found what you were looking for. If not, update your post to let us know in a bit more detail the *real* question, and I'm sure someone can help out.

    As a first question to PM, it's not bad at all, but could've been a little more direct. For me, the easier questions are when someone states--in as few words as possible--the exact problem/question, and then provides all the supporting details. That way, I can switch my thought processes into the best mode for the job more easily.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Separating hashed data into subsets
by hdb (Monsignor) on Nov 11, 2013 at 15:35 UTC

    I do not really understand what you need %used for, but I would calculate the maximum distance like this:

    for (my $i = 0; $i < 50; $i++) { #loop over array of hashes (each subs +et will have 50 keys) my $max = 0; # assuming distances are positive... foreach my $key1 (keys $subsets[$i]) { #5 keys at each level foreach my $key2 (keys $subsets[$i]) { #5 keys at each level next unless exists $distance{$key1}{$key2}; my $dist = $distance{$key1}{$key2}; $max = $dist if $dist > $max; } } # do something with $max... }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1062003]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (4)
As of 2024-04-25 21:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found