Re: Find unique elements in an array
by kvale (Monsignor) on Apr 04, 2004 at 07:52 UTC
|
Here is a faster solution, using a hash slice:
use Benchmark qw(:all) ;
my @a;
push @a, int (rand(10)) foreach 1..100;
my %unique;
my @awd;
cmpthese($1000, {
'jc' =>
sub { foreach my $thingy (@a) { $unique{$thingy} = 1
+; }
@awd = keys %unique;
},
'mk' =>
sub { @unique{ @a} = 1;
@awd = keys %unique;
},
});
The results are
Benchmark: running jc, mk for at least 3 CPU seconds...
jc: 3 wallclock secs ( 3.14 usr + 0.00 sys = 3.14 CPU) @ 53
+5.67/s (n=20522)
mk: 4 wallclock secs ( 3.27 usr + 0.00 sys = 3.27 CPU) @ 14
+937.00/s (n=48844)
Rate jc mk
jc 6536/s -- -56%
mk 14937/s 129% --
Update: $1000 is a typo should be 1000. I'll keep it as is for comparisons with the following nodes. Thanks BrowserUK and tilly!
| [reply] [d/l] [select] |
|
Calling undef on a hash slice is faster still.
use Benchmark qw(:all) ;
my @a;
push @a, int (rand(10)) foreach 1..100;
my %unique;
my @awd;
cmpthese($1000, {
'jc' =>
sub { foreach my $thingy (@a) { $unique{$thingy} = 1
+; }
@awd = keys %unique;
},
'mk' =>
sub { @unique{ @a} = 1;
@awd = keys %unique;
},
'bt' =>
sub { undef(@unique{@a});
@awd = keys %unique;
},
});
gives me
Rate jc mk bt
jc 16556/s -- -54% -62%
mk 36080/s 118% -- -18%
bt 44007/s 166% 22% --
(Incidentally that is a truely bizarre indent style.) | [reply] [d/l] [select] |
|
What is the purpose of using undef as the iteration count for cmpthese( $1000, ... );?
Is that what causes the warnings?
Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark.
+pm line 831.
Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark.
+pm line 838.
Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark
+.pm line 776.
Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark.
+pm line 840.
Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark.
+pm line 791.
Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark
+.pm line 776.
Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark.
+pm line 791.
Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark
+.pm line 776.
Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark.
+pm line 791.
Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark
+.pm line 776.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
| [reply] [d/l] [select] |
|
|
|
saying = () instead of = 1 is slightly faster, but the hash slice approach may not scale well with large @a's, since the whole array needs to be placed on the stack at once.
| [reply] [d/l] [select] |
|
If there is a stack penalty, it is not terrible, as the routine get faster with large arrays:
use Benchmark qw(:all) ;
my @a;
push @a, int (rand(100)) foreach 1..2_000_000;
my %unique;
my (@awd1, @awd2, @awd3);
cmpthese(5, {
'jc' =>
sub { foreach my $thingy (@a) { $unique{$thingy} = 1
+; }
@awd1 = keys %unique;
},
'mk' =>
sub { @unique{ @a} = 1;
@awd2 = keys %unique;
},
'ys' =>
sub { @unique{ @a} = ();
@awd3 = keys %unique;
},
});
yields
Benchmark: timing 5 iterations of jc, mk, ys...
jc: 19 wallclock secs (16.75 usr + 0.35 sys = 17.10 CPU) @ 0
+.29/s (n=5)
mk: 6 wallclock secs ( 6.00 usr + 0.01 sys = 6.01 CPU) @ 0
+.83/s (n=5)
ys: 7 wallclock secs ( 6.00 usr + 0.00 sys = 6.00 CPU) @ 0
+.83/s (n=5)
s/iter jc mk ys
jc 3.42 -- -65% -65%
mk 1.20 185% -- -0%
ys 1.20 185% 0% --
The = () optimization does not seem to make much difference.
| [reply] [d/l] [select] |
|
|
|
As per ccn's comment below, @unique{@a} = 1 looks weird and effectively equals to this: $unique{$a[0]} = 1; undef(@unique{@a[1..$#a]});, which, BTW, is good, as, according to tilly, storing lots of undefs instead of 1's in a hash to create keys is a Good Thing(tm) for performance.
| [reply] [d/l] [select] |
Re: Find unique elements in an array
by jdporter (Paladin) on Apr 04, 2004 at 17:45 UTC
|
| [reply] [d/l] |
Re: Find unique elements in an array
by dragonchild (Archbishop) on Apr 05, 2004 at 03:14 UTC
|
My personal unique'ing function is:
sub unique { my %x; @x{@_} = @_; values %x }
The reason to use values is that I need to get the unique set from an array of objects, not scalars. (This assumes, of course, that every object has a unique stringification. As I don't overload stringification in my objects, this works. If you do overload stringification ... good luck. Seriously.)
And, the reason for not using references is that I use this function as so:
my @final =
map {
# Something
} unique grep {
# Something
} map {
# Something
} @start;
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
| [reply] [d/l] [select] |
Re: Find unique elements in an array
by etcshadow (Priest) on Apr 05, 2004 at 02:30 UTC
|
Not that it is any different in the manner that it works, but I've always preferred:
my @uniquelist = keys %{{map {$_=>1} @nonuniquelist}};
An anonomous hash makes it much more compact.
------------
:Wq
Not an editor command: Wq
| [reply] [d/l] [select] |
|
It's compact, but far and away the slowest method.
However, you can combine elements of yours with tilly's performance trick and get something that is both fairly compact and almost as quick.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
| [reply] [d/l] |