http://qs321.pair.com?node_id=740873


in reply to mathematical proof

Most hash insertions are O(1). When the hash has to be expanded, that particular insertion becomes O(N). Since Perl grows hashes exponentially, though, the occasional O(N) expansions amortize away, so inserting N elements is O(N) overall. ( See tilly's reply for the amortized analysis. )
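For concreteness, here's a minimal sketch of the hash approach, assuming the task under discussion is collecting the unique elements of a list ( @data is just a stand-in for your real input ):

    use strict;
    use warnings;

    my @data = qw( a b a c b a );    # stand-in input

    # Each insert is amortized O(1), so the whole pass is O(N).
    my %seen;
    my @uniq = grep { !$seen{$_}++ } @data;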

Most array appends are O(1). When the array has to be expanded, that particular append becomes O(N). Since Perl grows arrays exponentially, the same amortization applies, so appending N elements (one at a time) is O(N). ( See tilly's reply. ) Inserting all N elements at once is likewise just O(N). Once the array is loaded, you still have to sort it, for a total of O(N log N).
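And the corresponding sketch of the sort approach, under the same assumption about the task:

    use strict;
    use warnings;

    my @data = qw( a b a c b a );    # stand-in input

    # Loading the array is O(N); the sort dominates at O(N log N).
    my @sorted = sort @data;

    # One more O(N) pass drops adjacent duplicates.
    my @uniq;
    for my $item (@sorted) {
        push @uniq, $item if !@uniq || $item ne $uniq[-1];
    }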

So the hash approach is O(N) and the sort approach is O(N log N). The question that remains is what the constant factors are, and how close to those bounds the actual performance is for your actual data.

The percentage of duplicates is also going to matter: the more duplicates, the smaller the hash stays, and the better the hash approach will do.
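If you want to see where the crossover falls for your data, the core Benchmark module makes the comparison easy. This is only a sketch with made-up sizes and duplicate rates (drawing 100_000 integers from pools of different sizes); substitute your real data:

    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    for my $pool (100, 10_000, 100_000) {
        # Smaller pool => higher percentage of duplicates.
        my @data = map { int rand $pool } 1 .. 100_000;
        print "pool size $pool:\n";
        cmpthese( -2, {
            hash => sub {
                my %seen;
                my @uniq = grep { !$seen{$_}++ } @data;
            },
            sort => sub {
                my @sorted = sort { $a <=> $b } @data;
                my @uniq;
                for my $item (@sorted) {
                    push @uniq, $item if !@uniq || $item != $uniq[-1];
                }
            },
        } );
    }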