Which, if any, is faster?

rvosa has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Which, if any, is faster? by BrowserUk (Patriarch) on Jun 01, 2005 at 09:00 UTC
Array lookup is faster than hash lookup, but the time you save is totally swamped by the time Perl takes to indirect through a blessed reference. If you shift your args rather than using the aliasing of @_ you waste a little more and if you use accessors, you'll slow it down (a lot ) more. Basically, if speed is truely a criteria, don't use OO in Perl. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]
Re^2: Which, if any, is faster? by monarch (Priest) on Jun 01, 2005 at 13:32 UTC
Personally I like to stay clear of hashes of hashes, particularly where the second hash contains common keys (e.g. the column names of a table). At the end of the day arrays are probably not that much faster than hashes.. but hashes sure do take up a lot of extra memory with those verbose keys.. particularly if the verbose keys are containing redundant information.	[reply]
Re^3: Which, if any, is faster? by tphyahoo (Vicar) on Jun 01, 2005 at 13:53 UTC
I like HOHes. At times they have contributed significantly to my WUD (What U Did) when they were the right thing to use. If this isn't always the fastest solution? Well, I code for understandability and maintainability first. Speed... enh. :)	[reply]
Re^3: Which, if any, is faster? by Smylers (Pilgrim) on Jun 01, 2005 at 13:55 UTC
... arrays are probably not that much faster than hashes ... Indeed not. Instead of switching from arrays to hashes you can probably get a similar speed-up by getting some slightly faster hardware — which'll likely be cheaper than what it'd cost in extra development time to have to deal with inappropriate data structures. ... but hashes sure do take up a lot of extra memory with those verbose keys.. particularly if the verbose keys are containing redundant information. That simply isn't true. Wanting to have several separate hashes with the same keys is a common situation. Indeed it's one the Perl 5 Porters considered, so they explicitly coded to optimize the memory usage: all hash keys are only stored in memory once no matter no many hashes they are used in -- see perl5004delta. To summarise: use hashes where each key contains some unique information, and use arrays where key names would be duplicated under a hash arrangement. To summarize: don't try to second-guess Perl, and don't assume that you are cleverer that the Perl 5 Porters. Smylers	[reply]
Re^4: Which, if any, is faster? (20% faster/ 40% smaller) by BrowserUk (Patriarch) on Jun 01, 2005 at 19:52 UTC
Re^5: Which, if any, is faster? (20% faster/ 40% smaller) by adrianh (Chancellor) on Jun 02, 2005 at 08:02 UTC
Some notes below your chosen depth have not been shown here
Re: Which, if any, is faster? by Zaxo (Archbishop) on Jun 01, 2005 at 08:34 UTC
I believe array indexing is a bit faster than hash lookup - certainly where scaling matters. For small structures like you have, you'll never notice the difference. The Benchmark module is handy for testing such questions on real code. Don't worry about it until you know you have a performance problem. After Compline, Zaxo	[reply]
Re^2: Which, if any, is faster? by Gekitsuu (Scribe) on Jun 01, 2005 at 08:41 UTC
I don't know if I'd agree with ignoring potential problems. I think they should at least question (like they are) the possible impact of using an array vs a hash. My recommendation would be to write a simple loop that runs through what ever process you are trying to run using each, and time how long it takes to run through 100, 1000, or 1000000 iterations to get an accurate idea of which is better suited for your needs. Ultimately you will have people arguing for both sides but only through testing are you going to find out which is more efficient.	[reply]
Re^3: Which, if any, is faster? by rvosa (Curate) on Jun 01, 2005 at 08:55 UTC
Here's a little bit of background: I have an object `Node` that is, at its core, now a blessed anonymous hash. The hash contains keys `first_daughter`, `last_daughter`, `next_sister`, `previous_sister` and `parent`, the values of which are references pointing to other such nodes. (The nodes are contained within a `Tree` object, which is a blessed anonymous array.) I've found that with large-ish trees - my testcase has 427 nodes - traversals, for example recursively going `first_daughter->next_sister` from root to tips where on every step the reference is retrieved through an accessor method, becomes somewhat slow. So the question is whether it would help to have the references stored in an anonymous array? As you suggest, I'll probably try it out anyway, but I was wondering if anyone has any experience with this kind of structure & method and whether they can say it is or isn't worth the trouble. A hash is certainly easier to remember than keeping track of the array indices for the various getters & setters.	[reply] [d/l] [select]
Re^4: Which, if any, is faster? by BrowserUk (Patriarch) on Jun 01, 2005 at 13:06 UTC
Re: Which, if any, is faster? by blazar (Canon) on Jun 01, 2005 at 08:28 UTC
Will the data from $b->[0] be returned faster than that from $a->{'High'}? What if I write proper accessors?, i.e. $b->getLeft() versus $a->getHigh Is there a difference? Big? Small? Personally I wouldn't care too much. But that's just me. What I keep hearing around is that micro-optimizations are rarely worth the effort. Then I have a limited experience in this sense, but it all tends to support such claims.	[reply]
Re: Which, if any, is faster? by thcsoft (Monk) on Jun 01, 2005 at 08:53 UTC
in addition to what the others already stated, there is a means of speeding up your own code as well: 1.) always pass data structures by reference, not by value! 2.) if you got a hashref, there's no need for iterating over all its keys, alike for an arrayref. `sub new { my ($class, $params) = @_; my $self = $params; bless $self, $class; }` [download] i rather bet that benchmarking our both codes will show the difference between them. language is a virus from outer space.	[reply] [d/l]
Re^2: Which, if any, is faster? by PetaMem (Priest) on Jun 01, 2005 at 15:03 UTC
And while we are at it, a `sub xx { my $var1 = shift; my $var2 = shift; ...` [download] is faster than `sub xx { my ($var1,$var2) = @_; ...` [download] It also gives you the oportunity for better comments and I personally feel more safe if crunching @_ instead of copying and leaving it untouched. Bye PetaMem All Perl: MT, NLP, NLU	[reply] [d/l] [select]
Re: Which, if any, is faster? by tlm (Prior) on Jun 01, 2005 at 13:56 UTC
I want to second BrowserUk's point about OO vs. speed. One of the central strategies behind OO is indirection, often multiple levels of it. The idea is that by introducing levels of indirection one decouples various parts of a program. For instance, the use of accessors introduces one level of indirection, because instead of accessing the data "directly" (through, say, `$obj->{ field }`), one accesses it "indirectly" by calling `$obj->field` which in turn accesses `$obj->{ field }`. This trick decouples all the parts of the program that access this data from the parts of the program that implement the storage and management of the data, so that the latter can be altered, possibly radically, without this having any effect on the code for the former. (I see the accessor is being something reminiscent of a "hinge" between two parts of the code.) Once you grok this indirection business, you'll see it at work everywhere, and in many guises, even in code that is otherwise not terribly OO. This decoupling greatly eases the task of maintaining code, but it comes at a price performance-wise. I see this tension as being obviously fundamental (i.e. unresolvable), although I have found people who reject this view. tilly sparked a very interesting exchange on this subject recently; see also this. the lowliest monk	[reply]
Re: Which, if any, is faster? by dragonchild (Archbishop) on Jun 01, 2005 at 13:31 UTC
Regardless of what your underlying sturcture is, you should always refer to your data via accessor. If for no other reason than it allows you to change your underlying structure, if you want. :-) As for speed - the array implementation is about 15% faster and smaller. So, you save about 30 picoseconds over the life of a standard script. That and $3.50 will get you a Venti Mocha. In general, if you think something isn't in Perl, try it out, because it usually is. :-) "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"	[reply]
Re: Which, if any, is faster? by schwern (Scribe) on Jun 02, 2005 at 07:58 UTC
Will the data from $b->[0] be returned faster than that from $a->{'High'}? This changes from Perl version to Perl version, compiler to compiler, CPU to CPU, etc... It also varies if $a/$b is lexical or global and if the index/key is a static string/number vs a variable. In any case, the difference is almost neglidgable. Perl's hashes are surprisingly fast (or, if you're a pessimist, Perl's arrays are surprisingly slow). You can benchmark it with Benchmark.pm if you really want but realize your benchmarks are likely to change in the next version of Perl or machine you run the code on. What if I write proper accessors? Then the cost of calling the accessor function will completely swamp any difference in hash or array access. If you do go with the array technique, please be kind to your future maintainers and use constants to access the arrays rather than raw numbers. `use constant LEFT => 0; use constant RIGHT => 1; $self->[LEFT] = $params{Left}; $self->[RIGHT] = $params{Right};` [download] However, the cost of loading constant.pm will probably eat more CPU time than what you might save using array objects. You could define the constants manually using `sub LEFT () { 0 }` but you see how this is rapidly getting complicated. Which is to say, just use hashes. They're simpler and when all the smoke clears your code will be just as fast.	[reply] [d/l] [select]
Re^2: Which, if any, is faster? by BrowserUk (Patriarch) on Jun 02, 2005 at 08:31 UTC
However, the cost of loading constant.pm will probably eat more CPU time than what you might save using array objects. That depends entirely upon how many times you access your objects. Loading constants.pm is a one-time hit at startup. The savings of array-based access acrue. The more your program does with your array-based instances, and the longer it runs, the greater the savings. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]


Syntactic Confectionery Delight
	PerlMonks