Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Find unique elements in an array

by jcpunk (Friar)
on Apr 04, 2004 at 07:16 UTC ( [id://342430]=CUFP: print w/replies, xml ) Need Help??

While this type of problem has been solved many times over, I believe this to be the least processor intensive solution. It creates a hash (%unique) and gives it a key equal to each value in the array by making it equal to 1. Then it takes the keys of %unique and assigns those to an array as its value. Since duplicates will just reassign 1 to the existing key this always removes every duplicate without skipping anything within the array.

Update:Ignore my code, a far better way to do it was suggested below: @unique{ @a} = 1; @awd2 = keys %unique;

my %unique; foreach my $thingy (@array_with_duplicates) { $unique{$thingy} = 1; } my @array_without_duplicates = keys %unique;

Replies are listed 'Best First'.
Re: Find unique elements in an array
by kvale (Monsignor) on Apr 04, 2004 at 07:52 UTC
    Here is a faster solution, using a hash slice:
    use Benchmark qw(:all) ; my @a; push @a, int (rand(10)) foreach 1..100; my %unique; my @awd; cmpthese($1000, { 'jc' => sub { foreach my $thingy (@a) { $unique{$thingy} = 1 +; } @awd = keys %unique; }, 'mk' => sub { @unique{ @a} = 1; @awd = keys %unique; }, });
    The results are
    Benchmark: running jc, mk for at least 3 CPU seconds... jc: 3 wallclock secs ( 3.14 usr + 0.00 sys = 3.14 CPU) @ 53 +5.67/s (n=20522) mk: 4 wallclock secs ( 3.27 usr + 0.00 sys = 3.27 CPU) @ 14 +937.00/s (n=48844) Rate jc mk jc 6536/s -- -56% mk 14937/s 129% --
    Update: $1000 is a typo should be 1000. I'll keep it as is for comparisons with the following nodes. Thanks BrowserUK and tilly!

    -Mark

      Calling undef on a hash slice is faster still.
      use Benchmark qw(:all) ; my @a; push @a, int (rand(10)) foreach 1..100; my %unique; my @awd; cmpthese($1000, { 'jc' => sub { foreach my $thingy (@a) { $unique{$thingy} = 1 +; } @awd = keys %unique; }, 'mk' => sub { @unique{ @a} = 1; @awd = keys %unique; }, 'bt' => sub { undef(@unique{@a}); @awd = keys %unique; }, });
      gives me
      Rate jc mk bt jc 16556/s -- -54% -62% mk 36080/s 118% -- -18% bt 44007/s 166% 22% --
      (Incidentally that is a truely bizarre indent style.)

        What is the purpose of using undef as the iteration count for cmpthese( $1000, ... );?

        Is that what causes the warnings?

        Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 831. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 838. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 840. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 791. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 791. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776. Use of uninitialized value in numeric gt (>) at c:/Perl/lib/Benchmark. +pm line 791. Use of uninitialized value in numeric eq (==) at c:/Perl/lib/Benchmark +.pm line 776.

        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
      saying = () instead of = 1 is slightly faster, but the hash slice approach may not scale well with large @a's, since the whole array needs to be placed on the stack at once.
        If there is a stack penalty, it is not terrible, as the routine get faster with large arrays:
        use Benchmark qw(:all) ; my @a; push @a, int (rand(100)) foreach 1..2_000_000; my %unique; my (@awd1, @awd2, @awd3); cmpthese(5, { 'jc' => sub { foreach my $thingy (@a) { $unique{$thingy} = 1 +; } @awd1 = keys %unique; }, 'mk' => sub { @unique{ @a} = 1; @awd2 = keys %unique; }, 'ys' => sub { @unique{ @a} = (); @awd3 = keys %unique; }, });
        yields
        Benchmark: timing 5 iterations of jc, mk, ys... jc: 19 wallclock secs (16.75 usr + 0.35 sys = 17.10 CPU) @ 0 +.29/s (n=5) mk: 6 wallclock secs ( 6.00 usr + 0.01 sys = 6.01 CPU) @ 0 +.83/s (n=5) ys: 7 wallclock secs ( 6.00 usr + 0.00 sys = 6.00 CPU) @ 0 +.83/s (n=5) s/iter jc mk ys jc 3.42 -- -65% -65% mk 1.20 185% -- -0% ys 1.20 185% 0% --
        The = () optimization does not seem to make much difference.

        -Mark

      As per ccn's comment below, @unique{@a} = 1 looks weird and effectively equals to this: $unique{$a[0]} = 1; undef(@unique{@a[1..$#a]});, which, BTW, is good, as, according to tilly, storing lots of undefs instead of 1's in a hash to create keys is a Good Thing(tm) for performance.
Re: Find unique elements in an array
by jdporter (Paladin) on Apr 04, 2004 at 17:45 UTC
Re: Find unique elements in an array
by dragonchild (Archbishop) on Apr 05, 2004 at 03:14 UTC
    My personal unique'ing function is:
    sub unique { my %x; @x{@_} = @_; values %x }

    The reason to use values is that I need to get the unique set from an array of objects, not scalars. (This assumes, of course, that every object has a unique stringification. As I don't overload stringification in my objects, this works. If you do overload stringification ... good luck. Seriously.)

    And, the reason for not using references is that I use this function as so:

    my @final = map { # Something } unique grep { # Something } map { # Something } @start;

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

Re: Find unique elements in an array
by etcshadow (Priest) on Apr 05, 2004 at 02:30 UTC
    Not that it is any different in the manner that it works, but I've always preferred:
    my @uniquelist = keys %{{map {$_=>1} @nonuniquelist}};
    An anonomous hash makes it much more compact.
    ------------ :Wq Not an editor command: Wq

      It's compact, but far and away the slowest method.

      However, you can combine elements of yours with tilly's performance trick and get something that is both fairly compact and almost as quick.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://342430]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (3)
As of 2024-04-25 22:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found