Re: Small Hash a Gateway to Large Hash?

by LanX (Saint)
on Feb 18, 2014 at 00:32 UTC


in reply to Small Hash a Gateway to Large Hash?

You shouldn't need to care about collisions if the keys are random and not pathologically crafted to force collisions.

A hash normally just grows (rehashes into more buckets) when there are too many entries, so access stays roughly O(1).
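As a minimal sketch of that growth behavior (assuming Perl 5 and a roughly known final size; the hash name and the 1_000_000 figure are placeholders, not from the original post), the hash can even be pre-sized so the buckets are allocated once instead of through repeated rehashing:

    use strict;
    use warnings;

    # Optional pre-sizing sketch: lookups stay ~O(1) whether the hash
    # grows incrementally or gets its buckets up front.
    my %big;
    keys(%big) = 1_000_000;   # lvalue keys() pre-extends the bucket array (perldoc -f keys)

    $big{"key_$_"} = $_ for 1 .. 1_000_000;
    print exists $big{'key_42'} ? "found\n" : "missing\n";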

IMHO your problem is not really speed but size. I once had the problem of a giant hash that was constantly swapping, so each access was limited by the speed of my hard disk (an ugly bottleneck).

I was able to solve that by transforming it into a HoH and splitting the old key into two halves.

i.e.  $new{long}{_key} = $old{long_key}

This worked because I was able to make the algorithm run through the keys in sorted order, i.e. each "lower" (inner) hash needed to be loaded into RAM only once, while the corresponding upper key ("long") was being processed.

This way Perl only needed to keep two hashes in memory at a time. This is quite scalable... the only remaining limitation is the size of your hard disk.

So if you can manage your accesses (reads and writes) in a way that "long_keys" with the same "start" are bundled together, that is my recommendation (within the limited information you gave).

If order doesn't matter, this should be easy!
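For illustration, a minimal sketch of that HoH split (the prefix length, the bucket directory and the helper names split_key, build_buckets and process_sorted are placeholders of mine, not part of the original setup; Storable is just one convenient way to park each inner hash on disk):

    use strict;
    use warnings;
    use Storable qw(store retrieve);   # core module, serializes hashes to disk

    my $prefix_len = 4;                # assumption: fixed-length key prefix

    # Cut "long_key" into an outer part ("long") and an inner part ("_key").
    sub split_key {
        my ($long_key) = @_;
        return ( substr($long_key, 0, $prefix_len),
                 substr($long_key, $prefix_len) );
    }

    # Build phase: bucket records by prefix, then write each inner hash once.
    sub build_buckets {
        my ($records, $dir) = @_;      # $records: hashref of long_key => value
        my %hoh;
        while ( my ($long_key, $value) = each %$records ) {
            my ($prefix, $suffix) = split_key($long_key);
            $hoh{$prefix}{$suffix} = $value;
        }
        store( $hoh{$_}, "$dir/$_.bucket" ) for keys %hoh;
    }

    # Read phase: walk the keys in sorted order, so each inner hash is
    # pulled into RAM exactly once.
    sub process_sorted {
        my ($long_keys, $dir, $callback) = @_;
        my ($current_prefix, $inner) = ('', undef);
        for my $long_key ( sort @$long_keys ) {
            my ($prefix, $suffix) = split_key($long_key);
            if ( $prefix ne $current_prefix ) {
                $inner          = retrieve("$dir/$prefix.bucket");
                $current_prefix = $prefix;
            }
            $callback->( $long_key, $inner->{$suffix} );
        }
    }

With reads grouped by prefix like that, only the outer hash and the currently active inner hash have to fit in RAM.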

HTH! :)

Cheers Rolf

( addicted to the Perl Programming Language)

Re^2: Small Hash a Gateway to Large Hash?
by lsherwood (Beadle) on Feb 18, 2014 at 21:49 UTC
    Rolf-

    Your solution is intriguing. We don't see evidence of swapping to disk, but it is certainly something to keep in mind.

      • No swapping means your hash is kept in RAM.

      • Noticeable collisions are very unlikely, so access to this hash can't be made meaningfully faster.

      • You said the hash is only built once and start-up time is no problem, so writing and rehashing doesn't bite you.

      • You said you are reading and processing many files... that's the most likely place for optimization.

      But without profiling this is only shooting in the dark.

      Could it be that you are parsing the files and checking if entries match against the hash?

      Then trie-optimized regexes could be very performant... (you might need several runs with separate regexes to stay within the regex engine's pattern-size limits, but 10000 short strings should fit).
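      A hedged sketch of that idea (since 5.10 Perl compiles a plain alternation of literal strings into a trie internally; the key list, chunk size and word-boundary anchors below are assumptions, not taken from the thread):

          use strict;
          use warnings;

          my %lookup     = map { $_ => 1 } qw(foo bar baz quux);   # stand-in for the real 10000-key hash
          my $chunk_size = 5000;                                   # keys per regex, to stay under pattern-size limits

          my @keys = sort keys %lookup;
          my @matchers;
          while ( my @chunk = splice @keys, 0, $chunk_size ) {
              my $alt = join '|', map { quotemeta } @chunk;
              push @matchers, qr/\b(?:$alt)\b/;                    # one trie-compiled alternation per chunk
          }

          while ( my $line = <STDIN> ) {
              for my $re (@matchers) {
                  if ( $line =~ $re ) {
                      print "hit: $line";
                      last;
                  }
              }
          }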

      But w/o much info that's again shooting in the dark.

      Cheers Rolf

      ( addicted to the Perl Programming Language)

        trie-optimized regexes could be very performant

        No, they couldn't. Follow your own advice and profile. The results will correct your misimpression.
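        For anyone who wants to measure rather than argue, a rough sketch with the core Benchmark module could look like this (the key set and test lines are made up; only real data and a real profile settle the question):

            use strict;
            use warnings;
            use Benchmark qw(cmpthese);

            my @keys  = map { sprintf 'key%05d', $_ } 1 .. 10_000;
            my %seen  = map { $_ => 1 } @keys;
            my $alt   = join '|', map { quotemeta } @keys;
            my $trie  = qr/^(?:$alt)$/;                  # 5.10+ compiles this alternation into a trie
            my @tests = ( @keys[ 0 .. 99 ], ('no_such_key') x 100 );

            cmpthese( -2, {
                hash_exists => sub { my $n = 0; $n++ for grep { exists $seen{$_} } @tests },
                trie_regex  => sub { my $n = 0; $n++ for grep { /$trie/ }          @tests },
            });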
