Re^7: RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)

Sure, I think this is always a problem of hashing algorithms , picking the right hash is somewhere between hard and impossible. I total agree that the lack of evidence for many claims is appalling. It seems that many people are incapable or unwilling to perform even simplest of statistical analysis on their results.

But as I said in an earlier thread, I'd use a b-tree for this. You have very specific requirements so will have to implement custom code to do what you need. The code for general purpose b-trees can be quite complex, but you can eliminate everything you don't need. Of course, I don't know the exact details of the problem, so it's only a suggestion of something else to think about ;)

Comment on Re^7: RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)

Replies are listed 'Best First'.

Re^8: RFC: Is there a solution to the flaw in my hash mechanism? (And are there any others?)
by BrowserUk (Patriarch) on May 31, 2015 at 10:25 UTC

But as I said in an earlier thread, I'd use a b-tree for this ... so it's only a suggestion of something else to think about ;)

Despite my earlier somewhat flippant remark about my history with B-trees, I did give them some thought at that time. This is the logic I used to exclude them for this.

The main advantages B-trees have over binary trees are:

Reduction in the number of pointers needed.
But they still require quite a lot of pointers; which at 8-bytes per is a significant cost of memory.
The "blocking" of similar values.
With on-disk DBs, this leads to a significant saving in expensive reads from disk; but not applicable here.
The increased locality of reference would play to the strengths of cpu caching; but it would require very careful design and tuning to get the most out of that. (Think Judy arrays complexity.)

But in the final analysis; a B-tree is O(logN) for lookups; and any advantages B-tree might have over other trees is only titivating at the edges; some reduction in memory requirement; some potential for reduced cache misses.

My (crude & flawed) test implementation of this hash has demonstrated O(1.021) average lookup probes for 175 million in a 200,000,033 array (87% fill); and both the insert & lookup code is trivial; and very fast.

With a b-tree, the average lookup for 175 million requires 8.25 probes; and the structure would require 3 times the space.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this

In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

[reply]


No such thing as a small change
	PerlMonks