http://qs321.pair.com?node_id=266957


in reply to Re: Re: Slurping BIG files into Hashes
in thread Slurping BIG files into Hashes

Aha, t'was written...

"I am wondering if you are getting a very high collision rate on they key for some reason?"

I reckon this is the problem: the key values are very similar all the way through, and there's not a lot that can be done about that. Hmmm... I'm trying to think of a better data structure. Thanks for the help, everybody who contributed.

Elgon

PS - The box is an 8-processor Sun server running Solaris with 8GB of RAM. From continuous observation of the stats, neither the IO nor the memory seems to be the problem.

update - Thanks to BrowserUK et al. for their help; unfortunately, the version we are using is 5.004_5 and I am not allowed to change it. Oh well. I'm trying to find a workaround as we speak...

update 2 - Thanks to jsprat, the script now runs in about a minute. Ta to all...

Please, if this node offends you, re-read it. Think for a bit. I am almost certainly not trying to offend you. Remember - Please never take anything I do or say seriously.


Replies are listed 'Best First'.
Re: Re: Re: Re: Slurping BIG files into Hashes
by jsprat (Curate) on Jun 19, 2003 at 02:26 UTC
    Try presizing the hash - keys(%lookup) = 160_000;

    If it is hash collisions, this might solve the problem.

    dominus has an interesting bit at perl.plover.com called When Hashes Go Wrong.
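
    A minimal sketch of the presizing idea (the filename and tab-separated record layout here are assumptions for illustration, not from the thread): assign to keys() as an lvalue before loading, so perl allocates the buckets up front instead of rehashing as the hash grows.

        use strict;

        my %lookup;
        keys(%lookup) = 160_000;    # perl rounds this up to the next power of two

        # Hypothetical input file and record layout, for illustration only.
        open(LOOKUP, 'lookup.dat') or die "Can't open lookup.dat: $!";
        while (<LOOKUP>) {
            chomp;
            my ($key, $value) = split /\t/, $_, 2;
            $lookup{$key} = $value;
        }
        close(LOOKUP);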

    Update: Meant to ask you to "print scalar %lookup;" after everything is loaded. A hash in scalar context gives you the number of used buckets / number of allocated buckets. If the number of used buckets is low (e.g. it prints 1/16), all your hash items have landed in the same bucket!
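
    For completeness, a quick way to do that check (assuming the same %lookup hash as in the sketch above; on perls of that vintage a hash in scalar context reports bucket usage, while recent perls simply return the key count):

        # Prints something like "9842/16384"; a result like "1/16384" would
        # mean nearly everything hashed into a single bucket.
        print scalar(%lookup), "\n";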