Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Efficient random hash stuff

by japhy (Canon)
on Nov 17, 2000 at 00:57 UTC ( [id://42074]=perlmeditation: print w/replies, xml ) Need Help??

I'm trying to get around the possible slowness of:
# ($k,$v) = rhe_1(%hash); sub rhe_1 (\%) { my $key = (keys %{ $_[0] })[rand keys %{ $_[0] }]; return ($key, delete $_[0]{$key}); }
Now, this is obviously inefficient, because it must rebuild a list of keys each time, which can be bad for large hashes (like a hash representing a dictionary). Now, one possible solution is an array and a hash working together:
# @keys = keys %hash; # done ONCE # ($k,$v) = rhe_2(%hash,@keys); sub rhe_2 (\%\@) { my $key; do { $key = rand @keys } until exists $_[0]{$_[1][$key]}; return ($_[0]{$_[1][$key]}, delete $_[0]{$_[1][$key]}); }
That trades the slowness of splice() for the possible slowness of having to reselect a random index. However, this requires an extra N amount of memory allocated.

What can I do? Is there any work on making such an operation efficient?

Here's a possible solution, which needs extra space, but not much (on average), and uses an array instead of a hash:
# %seen = (); # done ONCE # @array = ([k,v], [k,v], ...); # ($k,$v) = rhe_3(@array,%seen); sub rhe_3 (\@\%) { my $idx; do { $idx = rand @{ $_[0] } } while $_[1]{int $idx}++; return @{ $_[0][$idx] }; }
I'd be glad to hear any ideas.

$monks{japhy}++ while $posting;

Replies are listed 'Best First'.
(tye)Re: Efficient random hash stuff
by tye (Sage) on Nov 17, 2000 at 01:18 UTC

    Ah, no need to splice.

    # @keys = keys %hash; # done ONCE # ($k,$v) = rhe(%hash,@keys); sub rhe (\%\@) { my( $hv, $av )= @_; my $idx= rand @$av; my $key= $av->[$idx]; my $val= delete $hv->{$key}; my $last= pop @$av; $av->[$idx]= $last if $idx < @$av; return( $key, $val ); }

    As for saving space, I don't see you really making use of the hash here so you could do:

    @keys = keys %hash; @vals = values %hash; undef %hash;
    If the keys are long, then you could also probably save a bit of space by storing reference to the keys in the array instead of the key values themself (or maybe not).

            - tye (but my friends call me "Tye")
      Wow, that's damn nice. And we could use my just-an-array idea from above:
      sub rand_element (\@); @array = ([1,2], [3,4], [5,6]); # etc... ($k,$v) = rand_element(@array); sub rand_element (\@) { my $aref = shift; my $idx = rand @$aref; my ($k,$v) = @{ $aref->[$idx] }; my $last = pop @$aref; $aref->[$idx] = $last if $idx < @$aref; return ($k,$v); }
      Thank you muchly, tye. I need to ++ YOU, not your node...

      $monks{japhy}++ while $posting;

        Note that ([1,2],[3,4],[5,6],...) consumes quite a bit more memory (no, I haven't tested this) than ( [1,3,5,...], [2,4,6,...] ). I agree that this is less elegant, but if memory usage is a primary concern, then the elegance loss might be worth it.

                - tye (but my friends call me "Tye")
Re: Efficient random hash stuff
by Dominus (Parson) on Nov 17, 2000 at 01:24 UTC
    You should check out Dan Schmidt's article Building a Better Hash. He is trying to solve exactly the same problem. He comes up with a Perl module that implements a data structure with fast indexed access for search, insert, and delete (like a hash) but also fast random key selection.

      I've seen his article in TPJ -- I don't have it on me, and I'm looking for a random-access-only data structure. tye helped come up with it. It's swimmingly cool.

      $monks{japhy}++ while $posting;
Re: Efficient random hash stuff
by Fastolfe (Vicar) on Nov 17, 2000 at 01:25 UTC
    My first instinct is to use the %hash/@array duality property and to select a key randomly by finding a modulus 2 index out of that array, but I don't know of any way of de-referencing a hashref in such a way that I can use that property. It'd be easy to do something like that if we could turn it into a real hash first, but that makes it less efficient than it is now.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://42074]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (4)
As of 2024-03-28 14:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found