Perl-Sensitive Sunglasses | |
PerlMonks |
comment on |
( [id://3333]=superdoc: print w/replies, xml ) | Need Help?? |
You could lump longer symbols together a la join qq(\0), grep length() > 8, @syms; access via ptr or offset. Keep short ones directly in the 8-byte slots of the symbol table. (union). This should get you down to ~10 bytes per key. To index the tables, use minimal perfect hashing. There's the CMPH library, which appears quite simple to use. Though you'll need to work on the glue — cmph bindings via Perfect::Hash are incomplete, and it's not on CPAN. Anyway, with perfect hashing you need but a few extra bits to do the indexing. Various methods are available. CHM needs ~8 bytes per key, but preserves order. BRZ is about ~8 bits per key, uses external memory while building the tables so you're not mem-constrained. BDZ is just under 3 bits per key. Storing the 8-byte keys as you are already, these few extra bits aren't much. Performance estimates: about 3 sec per million keys when constructing the mph structures; about .3 sec per million lookups. Compared to .15 usec lookups with direct hashes. So expect roughly 3 times slower performance with about 10 times the memory density. If the tradeoff is worth the effort, I'll leave for you to decide. In reply to Re: Bidirectional lookup algorithm? (try perfect hashing)
by oiskuu
|
|