Re: To Hash or to Array--Uniqueness is the question.

I'm surprised no one has mentioned order yet. If you want unique values but you also want to preserve the order of the values, a simple hash isn't going to work. I would store the items into both structures, personally. Maybe not the most efficient way to do it, but set the value in the hash (keeping count if you like, as mentioned previously) and pushing onto an array if the hashed item didn't already exist. Something like:

#!/usr/bin/perl
use strict;
use warnings;

my (%found, @ordered_uniques);

while (my $line = <DATA>) {
    chomp $line;
    next if $line !~ /\w/; # Skip non-word lines

    if (not exists $found{$line}) {
        push @ordered_uniques, $line;
    }

    $found{$line}++;
}

print "Ordered unique values:\n";
print (join ", ", @ordered_uniques);
print "\n\nUnordered unique values:\n";
print (join ", ", keys %found);

__DATA__
one
foo
bar
one
baz
two
quack
baz
[download]

which outputs the expected:

Ordered unique values:
one, foo, bar, baz, two, quack

Unordered unique values:
bar, baz, one, quack, foo, two
[download]

So "best practice" depends on your application.

Comment on Re: To Hash or to Array--Uniqueness is the question. Select or Download Code

Replies are listed 'Best First'.
Re^2: To Hash or to Array--Uniqueness is the question. by shemp (Deacon) on Dec 02, 2005 at 01:26 UTC
I also use this type of solution under certain circumstances. The Tie::IxHash module (tied hash) takes care of storing it both ways, it keeps track of the insertion order into the hash I use the most powerful debugger available: print!	[reply]
Re^2: To Hash or to Array--Uniqueness is the question. by Moron (Curate) on Dec 02, 2005 at 16:19 UTC
A simple hash can be made to maintain order and since zero extra lines are used in the foregoing example (just a couple of inline additions), the overheads of interfacing to a module seem way too much. One should not reach for the nearest module the minute the going gets the slightest bit non-trivial - rather later in the thought process perhaps - it also seems appropriate to encourage people to think rather than be mentally lazy. `my %h = (); while (<$fh>) { $h{ $_ } \|\|= $.; } close $fh; print sort { $h{a} <=> $h{b} } keys %h;` [download] -M Free your mind	[reply] [d/l]
Re^3: To Hash or to Array--Uniqueness is the question. by shemp (Deacon) on Dec 04, 2005 at 00:08 UTC
While i agree that this solution works, it only works in the very isolated situation where you are reading the keys from a file and that you get $. effectively for free (not really, but Perl is going to manipulate $. anyway) But in general this solution does not work directly. If the hash is being populated in a different way than by reading the keys from a file then $. is lost. Sure you could pick some other variable to be the order count for the keys, something like: `my %h; my $h_order = 0; while ( my $key = get_key_from_somewhere() ) { if ( ! exists $h{$_} ) { $h{$_} = $h_order++; } }` [download] But then $h_order will need to be passed around with %h if any other values are going to be assigned, not to mention always performing the additional logic whenever assignments occur. If %h is going to be used in some function that i don't have control of the internals then the order is lost. I still stand by my Tie::IxHash solution because with it, other code that uses %h doesn't need to know anything special about it, it just operates like it was set up to operate. As for the overhead of tieing, it is rather small. Perhaps a 5-10% decrease in performance over using an untied hash, but the untied hash doesn't give the needed usage, so in implementing the functionality you need, you are losing the performance gain from not using the module. I use the most powerful debugger available: print!	[reply] [d/l]
Re^4: To Hash or to Array--Uniqueness is the question. by Moron (Curate) on Dec 05, 2005 at 09:17 UTC
I agree that if $. isn't available you have to use a separate variable, which is hardly a big issue. But my criticism of tying was more a question of avoiding over-engineering than anything else. In addition, non-core modules are effectively prohibited for all the large corporate production environments I've ever worked in. Why 'effectively' rather than explicitly is another tale. -M Free your mind	[reply]
Re^5: To Hash or to Array--Uniqueness is the question. by Happy-the-monk (Canon) on Dec 05, 2005 at 09:21 UTC
Re^5: To Hash or to Array--Uniqueness is the question. by shemp (Deacon) on Dec 06, 2005 at 00:18 UTC
Re^5: To Hash or to Array--Uniqueness is the question. by Moron (Curate) on Dec 06, 2005 at 10:34 UTC
Re^3: To Hash or to Array--Uniqueness is the question. by Nkuvu (Priest) on Dec 02, 2005 at 17:10 UTC
On the contrary, you're sorting the keys of the hash, but you lost the order of the input. Given the sample data I used in my script (one foo bar one baz two quack baz) the sort method prints out `bar, baz, one, quack, foo, two`. Of course if one wanted the sorted order then the use of the array is pointless. But if one is looking for the order of the input data, an array (or a module) is, as far as I know, necessary.	[reply] [d/l]
Re^4: To Hash or to Array--Uniqueness is the question. by Eimi Metamorphoumai (Deacon) on Dec 02, 2005 at 17:42 UTC
No, he's using the values of the hash to store the intended order, and sorting on the values instead of the keys. The downside is that instead of just keeping the order around, we're spending time sorting to get the order back.	[reply]
Re^5: To Hash or to Array--Uniqueness is the question. by Nkuvu (Priest) on Dec 02, 2005 at 17:48 UTC
Re^2: To Hash or to Array--Uniqueness is the question. by choedebeck (Beadle) on Dec 02, 2005 at 19:19 UTC
Order is my general rule of thumb. I usually ask myself, "Does the order matter?" If I answer yes, I usually go with an array, if I answer no, I usually go with a hash.	[reply]


more useful options
	PerlMonks