http://qs321.pair.com?node_id=266957


in reply to Re: Re: Slurping BIG files into Hashes
in thread Slurping BIG files into Hashes

Aha, t'was written...

"I am wondering if you are getting a very high collision rate on they key for some reason?"

I reckon this is the problem: the key values are very similar all the way through, and there's not a lot that can be done about that. Hmmm... I'm trying to think of a better data structure. Thanks for the help, everybody who contributed.

Elgon

PS - The box is an 8-processor Sun server running Solaris with 8GB of RAM. From continuous observation of the stats, neither the IO nor the memory seems to be the problem.

update - Thanks to BrowserUK et al. for their help; unfortunately, the version we are using is 5.004_5 and I am not allowed to change it. Oh well. I'm trying to find a workaround as we speak...

update 2 - Thanks to jsprat, the script now runs in about a minute. Ta to all...

Please, if this node offends you, re-read it. Think for a bit. I am almost certainly not trying to offend you. Remember - Please never take anything I do or say seriously.


Replies are listed 'Best First'.
Re: Re: Re: Re: Slurping BIG files into Hashes
by jsprat (Curate) on Jun 19, 2003 at 02:26 UTC
    Try presizing the hash - keys(%lookup) = 160_000;

    If it is hash collisions, this might solve the problem.

    dominus has an interesting bit at perl.plover.com called When Hashes Go Wrong.
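
    A minimal sketch of the presizing idea (the filename and tab-separated record layout here are assumptions for illustration, not from the thread): assign to keys() as an lvalue before loading, so perl allocates the buckets up front instead of rehashing as the hash grows.

        use strict;

        my %lookup;
        keys(%lookup) = 160_000;    # perl rounds this up to the next power of two

        # Hypothetical input file and record layout, for illustration only.
        open(LOOKUP, 'lookup.dat') or die "Can't open lookup.dat: $!";
        while (<LOOKUP>) {
            chomp;
            my ($key, $value) = split /\t/, $_, 2;
            $lookup{$key} = $value;
        }
        close(LOOKUP);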

    Update: Meant to ask you to "print scalar %lookup;" after everything is loaded. A hash in scalar context gives you the number of used buckets / number of allocated buckets. If the number of used buckets is low (e.g. it prints 1/16), all your hash items have landed in the same bucket!
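
    For completeness, a quick way to do that check (assuming the same %lookup hash as in the sketch above; on perls of that vintage a hash in scalar context reports bucket usage, while recent perls simply return the key count):

        # Prints something like "9842/16384"; a result like "1/16384" would
        # mean nearly everything hashed into a single bucket.
        print scalar(%lookup), "\n";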