Re: Find unique elements in an array

Here is a faster solution, using a hash slice:

use Benchmark qw(:all) ;

my @a;
push @a, int (rand(10)) foreach 1..100;
my %unique;
my @awd;
cmpthese($1000, {
                  'jc' =>
                  sub { foreach my $thingy (@a) { $unique{$thingy} = 1
+; }
                        @awd = keys %unique;
                     },
                  'mk' =>
                  sub { @unique{ @a} = 1;
                        @awd = keys %unique;
                     },
                 });
[download]

The results are

Benchmark: running jc, mk for at least 3 CPU seconds...
        jc:  3 wallclock secs ( 3.14 usr +  0.00 sys =  3.14 CPU) @ 53
+5.67/s (n=20522)
        mk:  4 wallclock secs ( 3.27 usr +  0.00 sys =  3.27 CPU) @ 14
+937.00/s (n=48844)
      Rate   jc   mk
jc  6536/s   -- -56%
mk 14937/s 129%   --
[download]

Update: $1000 is a typo should be 1000. I'll keep it as is for comparisons with the following nodes. Thanks BrowserUK and tilly!

-Mark

Comment on Re: Find unique elements in an array Select or Download Code

Replies are listed 'Best First'.
Re: Re: Find unique elements in an array by tilly (Archbishop) on Apr 04, 2004 at 20:25 UTC
Calling undef on a hash slice is faster still. `use Benchmark qw(:all) ; my @a; push @a, int (rand(10)) foreach 1..100; my %unique; my @awd; cmpthese($1000, { 'jc' => sub { foreach my $thingy (@a) { $unique{$thingy} = 1 +; } @awd = keys %unique; }, 'mk' => sub { @unique{ @a} = 1; @awd = keys %unique; }, 'bt' => sub { undef(@unique{@a}); @awd = keys %unique; }, });` [download] gives me `Rate jc mk bt jc 16556/s -- -54% -62% mk 36080/s 118% -- -18% bt 44007/s 166% 22% --` [download] (Incidentally that is a truely bizarre indent style.)	[reply] [d/l] [select]
Re: Re: Re: Find unique elements in an array by BrowserUk (Patriarch) on Apr 04, 2004 at 22:28 UTC
What is the purpose of using undef as the iteration count for `cmpthese( $1000, ... );`? Is that what causes the warnings? Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 831. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 838. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 840. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 791. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 791. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 791. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776. [download] Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail	[reply] [d/l] [select]
Re: Re: Re: Re: Find unique elements in an array by tilly (Archbishop) on Apr 05, 2004 at 04:25 UTC
You'll have to ask kvale that question, not me. I copied everything except the last test from him. I'd guessed that he meant to type "1000" and messed up. I noticed wondered about that as I was copying it back, but didn't care enough to modify the code and re-run it...	[reply]
Re: Re: Re: Re: Re: Find unique elements in an array by BrowserUk (Patriarch) on Apr 05, 2004 at 05:33 UTC
Re: Re: Find unique elements in an array by ysth (Canon) on Apr 04, 2004 at 13:35 UTC
saying `= ()` instead of `= 1` is slightly faster, but the hash slice approach may not scale well with large @a's, since the whole array needs to be placed on the stack at once.	[reply] [d/l] [select]
Re: Re: Re: Find unique elements in an array by kvale (Monsignor) on Apr 04, 2004 at 15:52 UTC
If there is a stack penalty, it is not terrible, as the routine get faster with large arrays: `use Benchmark qw(:all) ; my @a; push @a, int (rand(100)) foreach 1..2_000_000; my %unique; my (@awd1, @awd2, @awd3); cmpthese(5, { 'jc' => sub { foreach my $thingy (@a) { $unique{$thingy} = 1 +; } @awd1 = keys %unique; }, 'mk' => sub { @unique{ @a} = 1; @awd2 = keys %unique; }, 'ys' => sub { @unique{ @a} = (); @awd3 = keys %unique; }, });` [download] yields `Benchmark: timing 5 iterations of jc, mk, ys... jc: 19 wallclock secs (16.75 usr + 0.35 sys = 17.10 CPU) @ 0 +.29/s (n=5) mk: 6 wallclock secs ( 6.00 usr + 0.01 sys = 6.01 CPU) @ 0 +.83/s (n=5) ys: 7 wallclock secs ( 6.00 usr + 0.00 sys = 6.00 CPU) @ 0 +.83/s (n=5) s/iter jc mk ys jc 3.42 -- -65% -65% mk 1.20 185% -- -0% ys 1.20 185% 0% --` [download] The `= ()` optimization does not seem to make much difference. -Mark	[reply] [d/l] [select]
Re: Re: Re: Re: Find unique elements in an array by ysth (Canon) on Apr 04, 2004 at 19:53 UTC
With your original benchmark, I saw a consistent 4-5% increase for (). Obviously this is a constant difference that disappears into the woodwork with larger slices.	[reply]
Re: Find unique elements in an array by ccn (Vicar) on Apr 04, 2004 at 16:54 UTC
Anyway `@unique{@a} = 1;` looks a bit curious. Why do we set to 1 the only value in a huge hash?	[reply] [d/l]
Re: Re: Find unique elements in an array by kappa (Chaplain) on Apr 04, 2004 at 21:04 UTC
As per ccn's comment below, `@unique{@a} = 1` looks weird and effectively equals to this: `$unique{$a[0]} = 1; undef(@unique{@a[1..$#a]});`, which, BTW, is good, as, according to tilly, storing lots of undefs instead of 1's in a hash to create keys is a Good Thing(tm) for performance.	[reply] [d/l] [select]


We don't bite newbies here... much
	PerlMonks