Looking up a hash key is an O(1) operation, so it should be just as fast for a hash with 10, 1,000, or 1,000,000 elements. That's the beauty of hashes: Perl does not have to "walk" the hash; the hashing function "jumps" directly to the right location and checks whether the key exists.
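A minimal sketch of that constant-time check (the hash size and the keys used here are arbitrary):

```perl
# Build a hash with many keys; lookup cost does not grow with the count.
my %seen = map { $_ => 1 } 1 .. 100_000;

# exists() hashes the key and jumps straight to its bucket -- no walking.
if ( exists $seen{42} ) {
    print "42 is present\n";
}
print "999_999 is absent\n" unless exists $seen{999_999};
```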
CountZero "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
A call to the exists function takes much less time than fetching data from the DB, so don't worry about it.
If/when you get to a point of having many millions of keys in your database, the start-up time to load all of them into memory for your "fast check", and (eventually) the memory consumption for the hash itself, could put you beyond a point of diminishing returns.
Depending on how big the DB table gets, how many look-ups you actually do in one run of your script, and what else is going on besides look-ups, you might cross a threshold where the script runs faster if you just do queries to check for key values, rather than loading all keys into a hash for that purpose.
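The two approaches above can be sketched as follows. This assumes $dbh is an already-connected DBI handle; the table name "items" and column name "item_key" are invented for illustration:

```perl
# Approach 1: ask the database per key, keeping memory use flat.
sub key_exists_in_db {
    my ($dbh, $key) = @_;
    # SELECT 1 ... LIMIT 1 lets the database stop at the first match.
    my ($found) = $dbh->selectrow_array(
        'SELECT 1 FROM items WHERE item_key = ? LIMIT 1',
        undef, $key,
    );
    return defined $found;
}

# Approach 2: preload every key once, then check in memory with exists().
sub preload_keys {
    my ($dbh) = @_;
    my %seen;
    my $sth = $dbh->prepare('SELECT item_key FROM items');
    $sth->execute;
    while ( my ($key) = $sth->fetchrow_array ) {
        $seen{$key} = 1;
    }
    return \%seen;    # later: exists $seen->{$some_key}
}
```

Which one wins depends on the ratio of look-ups per run to total keys in the table, which is exactly why benchmarking both is worthwhile.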
BTW, if the reason for checking the existence of a key is to decide whether you should do an insert vs. an update (or insert vs. nothing), you might want to check out the "INSERT ... ON DUPLICATE KEY UPDATE ..." syntax in MySQL.
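That syntax lets MySQL handle the insert-or-update decision in a single statement, so no existence check is needed at all. A sketch, again assuming a connected DBI handle and a hypothetical table "counts" with a unique key on "word":

```perl
# Insert a row, or bump the counter if the word is already there.
# VALUES(n) refers to the value that would have been inserted.
sub insert_or_update {
    my ($dbh, $word, $n) = @_;
    $dbh->do(
        'INSERT INTO counts (word, n) VALUES (?, ?)
         ON DUPLICATE KEY UPDATE n = n + VALUES(n)',
        undef, $word, $n,
    );
}
```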
In any case, if speed is really an issue, you'll want to benchmark the alternatives. Use the Benchmark module if you like, or just have two versions of a job that will do a fair test of both approaches. You'll want to be able to compare the timing now, and also make an equivalent comparison at any time in the future, to see whether table size affects one approach differently from the other.
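A minimal Benchmark sketch: since a live DB isn't available here, the second contender is a deliberately naive linear scan as a stand-in; in a real test you would substitute the actual DB query sub.

```perl
use Benchmark qw(cmpthese);

my %keys = map { $_ => 1 } 1 .. 1_000;
my @keys = keys %keys;

# cmpthese runs each sub the given number of times and prints a
# comparison table (rates and relative speed-up).
cmpthese( 10_000, {
    hash_exists => sub { my $hit = exists $keys{500} },
    linear_scan => sub { my $hit = grep { $_ == 500 } @keys },
} );
```

Rerunning the same script as the table grows gives the future comparison point mentioned above.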