in reply to mathematical proof
Most hash insertions are O(1). When the hash has to be expanded, the insertion becomes O(N). Since Perl grows hashes exponentially, inserting N elements is O(N log N). ( Correction, per tilly's reply: inserting N elements into a hash is actually O(N) amortized, because the expensive expansions are geometrically rare. )
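The two regimes can be sketched like this (a minimal illustration, not a benchmark; the presized variant uses the documented lvalue form of `keys` to allocate buckets up front, so no rehashing expansions happen during the inserts):

```perl
use strict;
use warnings;

my $n = 100_000;

# Default growth: Perl doubles the bucket array as the hash fills.
# Each doubling rehashes every existing key (the O(N) step), but the
# doublings are geometric, so the total over N inserts is O(N) amortized.
my %grown;
$grown{$_} = 1 for 1 .. $n;

# Presized: lvalue keys() preallocates the buckets, so the insert
# loop never triggers an expansion.
my %presized;
keys(%presized) = $n;
$presized{$_} = 1 for 1 .. $n;

print scalar(keys %grown), "\n";     # 100000
print scalar(keys %presized), "\n";  # 100000
```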
Most array appends are O(1). When the array has to be expanded, the append becomes O(N). Since Perl grows arrays exponentially, inserting N elements (one at a time) is O(N log N). ( Correction, per tilly's reply: inserting N elements one at a time is also O(N) amortized, for the same reason. ) Inserting all N elements at once is just O(N), though. Once the array is loaded, you still need to sort it, for a total of O(N log N).
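A quick sketch of the array side (the numbers and sizes are arbitrary):

```perl
use strict;
use warnings;

my $n = 10_000;
my @data = map { int rand 100 } 1 .. $n;

# One at a time: occasional O(N) reallocations, O(N) amortized overall.
my @grown;
push @grown, $_ for @data;

# All at once: Perl can size the array in a single allocation, O(N).
my @bulk = @data;

# Either way, the sort dominates: O(N log N) comparisons.
my @sorted = sort { $a <=> $b } @bulk;

print scalar(@sorted), "\n";  # 10000
```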
So you end up with O(N log N) either way. The question that remains is what the constant factor is, and how close to the bound (N log N) the actual performance is for your actual data.
The percentage of duplicates is going to matter. The more duplicates, the better the hash approach will be.
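One way to measure those constant factors on your own data is the core Benchmark module. A sketch comparing hash-based and sort-based dedup at two duplicate ratios (the pool sizes and N are arbitrary choices, and `uniq_hash`/`uniq_sort` are names made up for this example):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hash-based dedup: one pass, O(N) amortized inserts.
sub uniq_hash {
    my ($data) = @_;
    my %seen;
    return [ grep { !$seen{$_}++ } @$data ];
}

# Sort-based dedup: O(N log N) sort, then a linear scan.
sub uniq_sort {
    my ($data) = @_;
    my @sorted = sort { $a <=> $b } @$data;
    my @uniq;
    for my $v (@sorted) {
        push @uniq, $v if !@uniq || $uniq[-1] != $v;
    }
    return \@uniq;
}

my $n = 50_000;

# Few duplicates (pool == N) vs. heavy duplication (pool == N/10).
for my $pool ($n, $n / 10) {
    my $data = [ map { int rand $pool } 1 .. $n ];
    printf "pool size %d, %d distinct values seen:\n",
        $pool, scalar @{ uniq_hash($data) };
    cmpthese(-1, {
        hash => sub { uniq_hash($data) },
        sort => sub { uniq_sort($data) },
    });
}
```

With heavy duplication the hash stays small (fewer expansions, better cache behavior), which is where it should pull ahead.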

