
Re^3: RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)

by RichardK (Parson)
on May 30, 2015 at 13:33 UTC


in reply to Re^2: RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)
in thread RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)

Yes you're right.

So your code does something like this :-

int r = rand();
int n = r % p;
while ( a[n] != 0 ) {
    n = (n + n) % p;
}
a[n] = r;

So you get stuck when n = 0, because (0 + 0) % p is still 0. But if you instead add a value to n that's relatively prime to p, then it should work.

for example,

say (($_ + 5) % 11 ) for (0..10);

5 6 7 8 9 10 0 1 2 3 4
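
To make the difference concrete, here is a minimal sketch that iterates both probe rules from slot 0 with p = 11. The helper name `probe_sequence` is made up for this illustration and is not from the code under discussion.

```perl
use strict;
use warnings;

# Hypothetical helper: apply $next repeatedly, starting from $start,
# and return the $count slots visited.
sub probe_sequence {
    my ( $start, $count, $next ) = @_;
    my @seq;
    my $n = $start;
    for ( 1 .. $count ) {
        $n = $next->($n);
        push @seq, $n;
    }
    return @seq;
}

my $p = 11;

# Doubling rule, n = (n + n) % p: starting at 0 it never leaves 0.
my @doubling = probe_sequence( 0, 5, sub { ( $_[0] + $_[0] ) % $p } );
print "doubling from 0: @doubling\n";

# Constant step of 5 (relatively prime to 11): all 11 slots are
# visited once before the sequence returns to its start.
my @constant = probe_sequence( 0, $p, sub { ( $_[0] + 5 ) % $p } );
print "step 5 from 0:   @constant\n";
```

The second sequence is a permutation of 0..10, which is the property the constant-step suggestion relies on.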

Re^4: RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)
by BrowserUk (Patriarch) on May 30, 2015 at 13:52 UTC
    if you add a value to n that's relatively prime to P then it should work.

    As with my initial thought of adding 1, all that does is move the problem, not fix it, whatever relatively prime constant you use:

    for( my $i = 0; $i < 17; ++$i ) {
        my $j = $i;
        printf "%2u: %s\n", $i, join ' ', map{ sprintf "%2u", $j = ( $j + $i + 13 ) % 17 } 0 .. 16;    ## constant: 13
    }

     0: 13  9  5  1 14 10  6  2 15 11  7  3 16 12  8  4  0
     1: 15 12  9  6  3  0 14 11  8  5  2 16 13 10  7  4  1
     2:  0 15 13 11  9  7  5  3  1 16 14 12 10  8  6  4  2
     3:  2  1  0 16 15 14 13 12 11 10  9  8  7  6  5  4  3
     4:  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4 <<<<<<<<< here
     5:  6  7  8  9 10 11 12 13 14 15 16  0  1  2  3  4  5
     6:  8 10 12 14 16  1  3  5  7  9 11 13 15  0  2  4  6
     7: 10 13 16  2  5  8 11 14  0  3  6  9 12 15  1  4  7
     8: 12 16  3  7 11 15  2  6 10 14  1  5  9 13  0  4  8
     9: 14  2  7 12  0  5 10 15  3  8 13  1  6 11 16  4  9
    10: 16  5 11  0  6 12  1  7 13  2  8 14  3  9 15  4 10
    11:  1  8 15  5 12  2  9 16  6 13  3 10  0  7 14  4 11
    12:  3 11  2 10  1  9  0  8 16  7 15  6 14  5 13  4 12
    13:  5 14  6 15  7 16  8  0  9  1 10  2 11  3 12  4 13
    14:  7  0 10  3 13  6 16  9  2 12  5 15  8  1 11  4 14
    15:  9  3 14  8  2 13  7  1 12  6  0 11  5 16 10  4 15
    16: 11  6  1 13  8  3 15 10  5  0 12  7  2 14  9  4 16

    for( my $i = 0; $i < 17; ++$i ) {
        my $j = $i;
        printf "%2u: %s\n", $i, join ' ', map{ sprintf "%2u", $j = ( $j + $i + 11 ) % 17 } 0 .. 16;    ## constant: 11
    }

     0: 11  5 16 10  4 15  9  3 14  8  2 13  7  1 12  6  0
     1: 13  8  3 15 10  5  0 12  7  2 14  9  4 16 11  6  1
     2: 15 11  7  3 16 12  8  4  0 13  9  5  1 14 10  6  2
     3:  0 14 11  8  5  2 16 13 10  7  4  1 15 12  9  6  3
     4:  2  0 15 13 11  9  7  5  3  1 16 14 12 10  8  6  4
     5:  4  3  2  1  0 16 15 14 13 12 11 10  9  8  7  6  5
     6:  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6 <<<<<<<<< here
     7:  8  9 10 11 12 13 14 15 16  0  1  2  3  4  5  6  7
     8: 10 12 14 16  1  3  5  7  9 11 13 15  0  2  4  6  8
     9: 12 15  1  4  7 10 13 16  2  5  8 11 14  0  3  6  9
    10: 14  1  5  9 13  0  4  8 12 16  3  7 11 15  2  6 10
    11: 16  4  9 14  2  7 12  0  5 10 15  3  8 13  1  6 11
    12:  1  7 13  2  8 14  3  9 15  4 10 16  5 11  0  6 12
    13:  3 10  0  7 14  4 11  1  8 15  5 12  2  9 16  6 13
    14:  5 13  4 12  3 11  2 10  1  9  0  8 16  7 15  6 14
    15:  7 16  8  0  9  1 10  2 11  3 12  4 13  5 14  6 15
    16:  9  2 12  5 15  8  1 11  4 14  7  0 10  3 13  6 16

    for( my $i = 0; $i < 17; ++$i ) {
        my $j = $i;
        printf "%2u: %s\n", $i, join ' ', map{ sprintf "%2u", $j = ( $j + $i + 7 ) % 17 } 0 .. 16;    ## constant: 7
    }

     0:  7 14  4 11  1  8 15  5 12  2  9 16  6 13  3 10  0
     1:  9  0  8 16  7 15  6 14  5 13  4 12  3 11  2 10  1
     2: 11  3 12  4 13  5 14  6 15  7 16  8  0  9  1 10  2
     3: 13  6 16  9  2 12  5 15  8  1 11  4 14  7  0 10  3
     4: 15  9  3 14  8  2 13  7  1 12  6  0 11  5 16 10  4
     5:  0 12  7  2 14  9  4 16 11  6  1 13  8  3 15 10  5
     6:  2 15 11  7  3 16 12  8  4  0 13  9  5  1 14 10  6
     7:  4  1 15 12  9  6  3  0 14 11  8  5  2 16 13 10  7
     8:  6  4  2  0 15 13 11  9  7  5  3  1 16 14 12 10  8
     9:  8  7  6  5  4  3  2  1  0 16 15 14 13 12 11 10  9
    10: 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 <<<<<<<<< here
    11: 12 13 14 15 16  0  1  2  3  4  5  6  7  8  9 10 11
    12: 14 16  1  3  5  7  9 11 13 15  0  2  4  6  8 10 12
    13: 16  2  5  8 11 14  0  3  6  9 12 15  1  4  7 10 13
    14:  1  5  9 13  0  4  8 12 16  3  7 11 15  2  6 10 14
    15:  3  8 13  1  6 11 16  4  9 14  2  7 12  0  5 10 15
    16:  5 11  0  6 12  1  7 13  2  8 14  3  9 15  4 10 16

    for( my $i = 0; $i < 17; ++$i ) {
        my $j = $i;
        printf "%2u: %s\n", $i, join ' ', map{ sprintf "%2u", $j = ( $j + $i + 3 ) % 17 } 0 .. 16;    ## constant: 3
    }

     0:  3  6  9 12 15  1  4  7 10 13 16  2  5  8 11 14  0
     1:  5  9 13  0  4  8 12 16  3  7 11 15  2  6 10 14  1
     2:  7 12  0  5 10 15  3  8 13  1  6 11 16  4  9 14  2
     3:  9 15  4 10 16  5 11  0  6 12  1  7 13  2  8 14  3
     4: 11  1  8 15  5 12  2  9 16  6 13  3 10  0  7 14  4
     5: 13  4 12  3 11  2 10  1  9  0  8 16  7 15  6 14  5
     6: 15  7 16  8  0  9  1 10  2 11  3 12  4 13  5 14  6
     7:  0 10  3 13  6 16  9  2 12  5 15  8  1 11  4 14  7
     8:  2 13  7  1 12  6  0 11  5 16 10  4 15  9  3 14  8
     9:  4 16 11  6  1 13  8  3 15 10  5  0 12  7  2 14  9
    10:  6  2 15 11  7  3 16 12  8  4  0 13  9  5  1 14 10
    11:  8  5  2 16 13 10  7  4  1 15 12  9  6  3  0 14 11
    12: 10  8  6  4  2  0 15 13 11  9  7  5  3  1 16 14 12
    13: 12 11 10  9  8  7  6  5  4  3  2  1  0 16 15 14 13
    14: 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 <<<<<<<<< here
    15: 16  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    16:  1  3  5  7  9 11 13 15  0  2  4  6  8 10 12 14 16
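
The stuck row in each table can be predicted rather than found by inspection. In the probe `$j = ( $j + $i + c ) % 17` the effective step is `( $i + c ) % 17`, which is zero exactly when `$i == ( 17 - c ) % 17`. A small sketch of that arithmetic follows; `stuck_start` is a name invented here for illustration.

```perl
use strict;
use warnings;

my $p = 17;

# For the probe j = (j + i + c) % p the effective step is (i + c) % p.
# It is zero, so the probe never moves, when i == (p - c) % p.
sub stuck_start {
    my ( $c, $p ) = @_;
    return ( $p - $c % $p ) % $p;
}

my @constants = ( 13, 11, 7, 3 );
my %stuck = map { $_ => stuck_start( $_, $p ) } @constants;
printf "constant %2u: stuck at i = %2u\n", $_, $stuck{$_} for @constants;
```

This gives 4, 6, 10 and 14 for the constants 13, 11, 7 and 3, matching the marked rows: changing the constant only changes which input is affected.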

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      What about this instead?

      for( my $i = 0; $i < 17; ++$i ) {
          my $j = $i;
          printf "%2u: %s\n", $i, join ' ', map{ sprintf "%2u", $j = ( $j + 13 ) % 17 } 0 .. 16;
      }

       0: 13  9  5  1 14 10  6  2 15 11  7  3 16 12  8  4  0
       1: 14 10  6  2 15 11  7  3 16 12  8  4  0 13  9  5  1
       2: 15 11  7  3 16 12  8  4  0 13  9  5  1 14 10  6  2
       3: 16 12  8  4  0 13  9  5  1 14 10  6  2 15 11  7  3
       4:  0 13  9  5  1 14 10  6  2 15 11  7  3 16 12  8  4
       5:  1 14 10  6  2 15 11  7  3 16 12  8  4  0 13  9  5
       6:  2 15 11  7  3 16 12  8  4  0 13  9  5  1 14 10  6
       7:  3 16 12  8  4  0 13  9  5  1 14 10  6  2 15 11  7
       8:  4  0 13  9  5  1 14 10  6  2 15 11  7  3 16 12  8
       9:  5  1 14 10  6  2 15 11  7  3 16 12  8  4  0 13  9
      10:  6  2 15 11  7  3 16 12  8  4  0 13  9  5  1 14 10
      11:  7  3 16 12  8  4  0 13  9  5  1 14 10  6  2 15 11
      12:  8  4  0 13  9  5  1 14 10  6  2 15 11  7  3 16 12
      13:  9  5  1 14 10  6  2 15 11  7  3 16 12  8  4  0 13
      14: 10  6  2 15 11  7  3 16 12  8  4  0 13  9  5  1 14
      15: 11  7  3 16 12  8  4  0 13  9  5  1 14 10  6  2 15
      16: 12  8  4  0 13  9  5  1 14 10  6  2 15 11  7  3 16

        The problem with that is you lose the "de-clustering" effect. (Note: the regularity of the columns in your table; and the apparent "randomness" of mine.)

        That is, because the step size has become a constant -- albeit with a different offset for each input value $i -- consecutive inputs tend to cluster rather than getting evenly distributed.
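
One way to see what a constant step costs is to compare the probe paths of two keys whose start slots differ by exactly the step. The sketch below is only an illustration of shared probe paths, not a measurement of retries; it assumes table size 17 and constant 13 as in the tables, and `probes` is a made-up helper name.

```perl
use strict;
use warnings;

my ( $p, $c ) = ( 17, 13 );

# Hypothetical helper: the first $count probes for a key that starts at
# slot $i and advances by $step each time.
sub probes {
    my ( $i, $step, $count ) = @_;
    my $j = $i;
    return map { $j = ( $j + $step ) % $p } 1 .. $count;
}

# Constant step: key 13 retraces key 0's path, one probe behind.
my @const_a = probes( 0,  $c, 5 );
my @const_b = probes( 13, $c, 5 );

# Key-dependent step ($i + $c): the same two keys take different paths.
my @keyed_a = probes( 0,  ( 0 + $c ) % $p,  5 );
my @keyed_b = probes( 13, ( 13 + $c ) % $p, 5 );

print "constant step:  key  0 -> @const_a\n";
print "                key 13 -> @const_b\n";
print "key-based step: key  0 -> @keyed_a\n";
print "                key 13 -> @keyed_b\n";
```

With the constant step, every slot that key 0 finds occupied is one that key 13 will also visit next; with the key-dependent step the two paths diverge after the first shared slot.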

        And research shows that the downside of clustering is an increase in retries. Not only do the retries start earlier (when the fill ratio is lower), but the clusters also tend to mean more retries before you find an empty slot.

        Of course, that only affects applications that tend to store consecutive inputs; and in the normal way of things, a good hashing function can be used to negate it.

        But for my application, as the keys are themselves numbers -- and the priority is lookup performance -- it makes sense to avoid the cost of a hashing function and use the numbers (% table size) directly.

        For my application, the likelihood of consecutive inputs is pretty much indeterminable. It is a function of the statistical distribution of the DNA being processed and of its length -- quite literally a "how long is a (piece of) string" problem. But the possibility of large numbers of consecutive numbers being stored -- although they may be generated out of sequence, their effect is the same -- is sufficiently high that if there is an alternative that retains that de-clustering effect, I'd rather use it.

        I'm currently considering a special case & different code path for when the first probe calculation is 0. For example, use a simple linear probe (+1) for that case only. Or maybe +prime/2 or something.

        It does add a conditional test at the heart of both the insertion and lookup code; so I'd have to run some large scale simulations to see if the cost of the test was offset by the de-clustering effect.
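
A minimal sketch of that special case, under the assumption that "first probe calculation is 0" means the key-dependent step `( $i + c ) % p` from the tables evaluates to 0; `probe_step` is a hypothetical name, and the fallback of 1 is the linear-probe option mentioned above. It does not attempt to measure the cost of the extra test.

```perl
use strict;
use warnings;

# Hypothetical helper: key-dependent step, falling back to a linear
# probe (+1) only when the computed step would be 0.
sub probe_step {
    my ( $i, $c, $p ) = @_;
    my $step = ( $i + $c ) % $p;
    return $step == 0 ? 1 : $step;    # the extra conditional test
}

my ( $p, $c ) = ( 17, 13 );

# For every start slot, count how many distinct slots p probes reach.
my %slots_seen;
for my $i ( 0 .. $p - 1 ) {
    my $step = probe_step( $i, $c, $p );
    my ( $j, %seen ) = ($i);
    for ( 1 .. $p ) {
        $j = ( $j + $step ) % $p;
        $seen{$j} = 1;
    }
    $slots_seen{$i} = keys %seen;
}
printf "%2u: visits %u distinct slots\n", $_, $slots_seen{$_} for 0 .. $p - 1;
```

Because the table size is prime and the step is never 0, every row, including the previously stuck row 4, reaches all 17 slots.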

        I've only read that the latter is beneficial, so I might be chasing a red herring here. There seem to be several "good practices" regarding hashes that you can trace back to one basic source on the web, but for which you can't find any supporting evidence.

        E.g. if you search for the phrase "Item (3) has, allegedly, been shown to yield especially good results in practice.", you'll find many reiterations of the same information -- it's hard to determine which was the original -- but nowhere can I find who alleged it; when; where; and based upon what evidence.

        But that's the 'net in a nutshell. Do a search for "recipe", pick one at random; pick out a fairly unique phrase from that recipe and search for that and you'll often find a couple of hundred or more people claiming the same exact recipe as their own.

