Hashes and arrays can both be implemented as in-memory data structures or as on-disk data structures. Therefore it is perfectly reasonable to talk about what each looks like in memory and on disk.
In Perl the two options don't even look different for hashes; you just add an appropriate tie. The difference is slightly larger for arrays, because you don't want to use the built-in sort on a large array that lives on disk.
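To make the tie point concrete, here is a minimal sketch. It assumes SDBM_File (which ships with Perl) as a stand-in for whatever disk-backed class you would really use; the helper name count_lines and the sample data are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # core module; stand-in for any tie-able DBM
use File::Temp qw(tempdir);

# Hypothetical helper: record each line in the hash it is given and
# return the number of distinct lines. It neither knows nor cares
# whether the hash lives in memory or on disk.
sub count_lines {
    my ($seen, @lines) = @_;
    $seen->{$_}++ for @lines;
    return scalar keys %$seen;
}

my @lines = qw(apple pear apple fig pear apple);

# Plain in-memory hash.
my %in_memory;
my $n_mem = count_lines(\%in_memory, @lines);

# Same code, but the hash is tied to a file in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
my %on_disk;
tie(%on_disk, 'SDBM_File', "$dir/seen", O_RDWR | O_CREAT, 0666)
    or die "Cannot tie SDBM file: $!";
my $n_disk = count_lines(\%on_disk, @lines);
untie %on_disk;

print "in memory: $n_mem distinct, on disk: $n_disk distinct\n";
```

The only line that differs between the two cases is the tie call. For real data you would probably pick a different DBM class, since SDBM limits the size of each key/value pair.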
As for my duplicates comment, I am not denying that a hash is better than an array whether or not there are duplicates. However, the speed of accessing a hash is basically independent of how many duplicates there are in your incoming data structure. (There are subtle variations depending on, for instance, whether you are just before or after a hash split, but let's ignore that.) The speed of doing a merge sort that eliminates duplicates ASAP, on the other hand, varies greatly depending on the mix of duplicates in your incoming data structure. Therefore the array solution improves relative to the hash as you increase the number of duplicates. This doesn't make the array solution
In fact, in the extreme case where you have a fixed number of distinct lines in your data structure, the array solution improves from O(n log(n)) to O(n). The hash solution does not improve; it is O(n) regardless.
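Here is a rough sketch of what I mean by a merge sort that eliminates duplicates ASAP. It is an in-memory toy, not a tuned or disk-based implementation; the sub names, the comparison counter, and the test data are all invented for the illustration.

```perl
use strict;
use warnings;

my $comparisons = 0;    # instrumentation only, not part of the algorithm

# Merge two sorted, duplicate-free lists. When the same value appears
# on both sides it is kept once, so duplicates vanish at the first
# merge where they meet.
sub merge_unique {
    my ($left, $right) = @_;
    my @out;
    my ($i, $j) = (0, 0);
    while ($i < @$left && $j < @$right) {
        $comparisons++;
        my $cmp = $left->[$i] cmp $right->[$j];
        if    ($cmp < 0) { push @out, $left->[$i++] }
        elsif ($cmp > 0) { push @out, $right->[$j++] }
        else             { push @out, $left->[$i++]; $j++ }
    }
    # At most one side has anything left; append it.
    push @out, @{$left}[$i .. $#{$left}], @{$right}[$j .. $#{$right}];
    return \@out;
}

# Top-down merge sort built on merge_unique.
sub sort_unique {
    my @items = @_;
    return [@items] if @items <= 1;
    my $mid = int(@items / 2);
    return merge_unique(
        sort_unique(@items[0 .. $mid - 1]),
        sort_unique(@items[$mid .. $#items]),
    );
}

# The hash approach, for comparison. The final sort is only there so
# the two results can be compared directly.
sub hash_unique {
    my %seen;
    $seen{$_} = 1 for @_;
    return [sort keys %seen];
}

my @few  = map { "line" . ($_ % 4) } 1 .. 1024;       # only 4 distinct lines
my @many = map { sprintf "line%04d", $_ } 1 .. 1024;  # every line distinct

$comparisons = 0;
my $few_sorted = sort_unique(@few);
my $few_cmp    = $comparisons;

$comparisons = 0;
my $many_sorted = sort_unique(@many);
my $many_cmp    = $comparisons;

print "few distinct:  ", scalar(@$few_sorted),  " lines, $few_cmp comparisons\n";
print "many distinct: ", scalar(@$many_sorted), " lines, $many_cmp comparisons\n";
```

With k distinct lines no intermediate list can ever hold more than k elements, so every merge above the first log(k) levels costs at most a constant amount of work when k is fixed. That is where the drop from O(n log(n)) toward O(n) comes from. The hash version does the same n insertions no matter what the data looks like.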