http://qs321.pair.com?node_id=613784


in reply to Re: Perl solution for storage of large number of small files
in thread Perl solution for storage of large number of small files

I forgot to add...

The 'best' way I discovered to store info is to use a base32 encoded digest of the file.

Why? well, most filesystems start to have issues at some point after 1000 items per directory. using a base32 representation, you can hit the sweet spot.

with a base32 formula and 2chars per bin, you'll have 1024 bins per depth ( 32*32 ). using the standard hex based md5 representation, your options are either 256 buckets per depth ( 16*16 ) or 4096 ( 16*16*16 ).

in my personal use, i haven't seen all that much of a difference between 1024 and 4096 buckets -- though i've seen a slight difference. its not as drastic as the performance between either and 10k though.

since i'm lazy and I don't have high performace constraints, i just go with 2 levels deep of 4096 hashing. but if i had more time, I'd definitely go with 3 levels of 1024 hashing.

( the time / lazyness is a factor because every language supports md5 as base16 or base64 by default -- no one has a base32 default , which is a PITA if you're managing an architecture accessed by multiple languages at once ).

  • Comment on Re^2: Perl solution for storage of large number of small files