Remove duplicated data from array ref

by Maresia (Beadle)
on Nov 14, 2016 at 14:54 UTC

Maresia has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks!
I am trying, unsuccessfully, to remove the duplicates from this array; all of the values are still showing in the Dumper output.
Is there a better way to do this?
I wish I could do this without having to add the new values to a new variable.
Here is my test code:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $data = [
    [3328B0Z],  [3328B0Z],  [887ww45],  [887ww45],  [9988A676],
    [8888Q88],  [11111X9],  [88999S77], [88999S77], [777777f],
    [A84YY9],   [K7788880], [K7788880], [1122222],  [8888888T],
    [8888888T], [87HHY86],  [XX11672],  [XX11672],  [88889999],
    [88888888], [1122222],
];

my @unique;
foreach my $var ( @$data ){
    if ( ! grep( /$var/, @unique ) ){
        push( @unique, $var );
    }
}

print Dumper \@unique;

Thanks for helping!

Re: Remove duplicated data from array ref
by choroba (Cardinal) on Nov 14, 2016 at 15:05 UTC
    The code doesn't compile:

    > Bareword found where operator expected at 1.pl line 9, near "3328B0Z" (Missing operator before B0Z?)

    After wrapping quotes around the barewords, I'm getting the whole list back. The problem is that $data is an array of arrays, i.e. in the loop, $var is not a string, but an array ref, so grep is trying to match /ARRAY(0xcc7cb8)/ etc.

    For a cleaner solution, see uniq in List::Util.
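
    A minimal sketch of that, assuming each inner array ref holds exactly one string and you only need the unique strings back rather than the original sub-arrays (uniq needs List::Util 1.45 or later):

    use strict;
    use warnings;
    use List::Util qw( uniq );

    my $data = [ ['3328B0Z'], ['3328B0Z'], ['887ww45'], ['1122222'] ];

    # pull the single string out of each sub-array, then de-duplicate,
    # keeping the order of first appearance
    my @unique = uniq map { $_->[0] } @$data;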

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Remove duplicated data from array ref
by johngg (Canon) on Nov 14, 2016 at 15:08 UTC

    Not sure why you have an AoA with only single-element sub-arrays, but here's a way to do what you want.

    johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -MData::Dumper -E '
    my $data = [
        [ 123 ],
        [ 789 ],
        [ 456 ],
        [ 123 ],
        [ 543 ],
    ];
    my @uniq = do {
        my %seen;
        grep { not $seen{ $_->[ 0 ] } ++ } @{ $data };
    };
    print Data::Dumper->Dumpxs( [ \ @uniq ], [ qw{ *uniq } ] );'
    @uniq = (
              [ 123 ],
              [ 789 ],
              [ 456 ],
              [ 543 ]
            );

    I hope this is helpful.

    Update: Note that Eily's approach, building a hash and then using keys, can also be made to work, but the order of the elements is not (necessarily) preserved because of the way the hashing algorithm assigns keys to hash buckets. Because we want to preserve the array references, we have to use those as the values, with the keys being their content. From Perl 5.14 keys and values can take a scalar reference to a hash, as I do here (an experimental feature that was later removed in Perl 5.24).

    johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -MData::Dumper -E '
    my $data = [
        [ 123 ],
        [ 789 ],
        [ 456 ],
        [ 123 ],
        [ 543 ],
    ];
    my @uniq = values { map { $_->[ 0 ] => $_ } @{ $data } };
    print Data::Dumper->Dumpxs( [ \ @uniq ], [ qw{ *uniq } ] );'
    @uniq = (
              [ 789 ],
              [ 543 ],
              [ 456 ],
              [ 123 ]
            );

    Cheers,

    JohnGG

      Thank you, it works!
Re: Remove duplicated data from array ref
by Eily (Monsignor) on Nov 14, 2016 at 15:02 UTC

    I'm not sure what your data is actually supposed to be (your declaration for $data is not valid Perl; is this supposed to be an array of strings?), but there's a simple way to get unique elements in Perl: hash keys.

    my %uniq;
    my @data = qw(3328B0Z 3328B0Z 1122222 8888888T 3328B0Z 1122222 8888888T);
    $uniq{$_}++ for @data;
    my @uniq_data = keys %uniq;
    You should avoid removing elements from an array while iterating over it, so outputting to another variable is probably better.
    Or, you can use the sub uniq from List::Util
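
    For example, applied to the @data above (a sketch, again assuming a recent List::Util):

    use List::Util qw( uniq );
    # keeps the order of first appearance, unlike keys %uniq
    my @uniq_data = uniq @data;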

      Hi, the data is just like this:
      my $data = [
          ['3328B0Z'],  ['3328B0Z'],  ['887ww45'],  ['887ww45'],  ['9988A676'],
          ['8888Q88'],  ['11111X9'],  ['88999S77'], ['88999S77'], ['777777f'],
          ['A84YY9'],   ['K7788880'], ['K7788880'], ['1122222'],  ['8888888T'],
          ['8888888T'], ['87HHY86'],  ['XX11672'],  ['XX11672'],  ['88889999'],
          ['88888888'], ['1122222'],
      ];

Re: Remove duplicated data from array ref
by haukex (Archbishop) on Nov 14, 2016 at 15:04 UTC

    Hi,

    You haven't told us what the problem is, but the code you posted does not compile because you're using barewords like 887ww45 instead of quoted strings, so that would be the first thing to fix: replace the square brackets with quotes (i.e. [887ww45] becomes "887ww45") and your code will work, as sketched below. It's not particularly efficient though, mostly because of the grep. Please see the following links:

    I wish I could do this without having to add the new values to a new variable.

    While it's possible to solve this without a second data structure, it's a bit easier to code using more than one. Why would you want to avoid a second data structure? Is your actual data very big?
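
    A trimmed-down sketch of the quoting fix mentioned above (not your full data list):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    # square brackets replaced with quotes, so $data is now a flat
    # array ref of strings instead of an array of one-element arrays
    my $data = [ "3328B0Z", "3328B0Z", "887ww45", "887ww45", "1122222" ];

    my @unique;
    foreach my $var ( @$data ) {
        if ( !grep( /$var/, @unique ) ) {
            push @unique, $var;
        }
    }
    print Dumper \@unique;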

    Hope this helps,
    -- Hauke D

    Update: Added quote.

Re: Remove duplicated data from array ref
by hippo (Bishop) on Nov 14, 2016 at 15:02 UTC

    You have an array ref of array refs so you would need at least one more level of dereferencing to give this any chance of working. Why do you have such an odd structure to begin with? Why are the inner arrays there at all if each one consists of precisely one element?
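
    For instance, keeping your loop but comparing the string inside each sub-array rather than the reference itself (a sketch, using an eq comparison in place of the pattern match):

    use strict;
    use warnings;

    my $data = [ ['3328B0Z'], ['3328B0Z'], ['887ww45'] ];

    my @unique;
    for my $var ( @$data ) {
        # $var is an array ref; $var->[0] is the string inside it
        push @unique, $var
            unless grep { $_->[0] eq $var->[0] } @unique;
    }
    # @unique now holds [ ['3328B0Z'], ['887ww45'] ]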

    Update: See johngg's example below which illustrates one way of dealing with such an AoA.

Re: Remove duplicated data from array ref
by AnomalousMonk (Archbishop) on Nov 15, 2016 at 00:46 UTC
    I wish I could do this without having to add the new values to a new variable.

    There's no reason you can't do this with the code johngg provided:

    c:\@Work\Perl>perl -wMstrict -MData::Dump -le
    "my $data = [
         [ 123 ], [ 789 ], [ 'dup' ], [ 456 ], [ 123 ], [ 'dup' ], [ 543 ],
     ];
     dd $data;
     ;;
     $data = do {
         my %seen;
         [ grep { not $seen{ $_->[0] }++ } @$data ];
     };
     dd $data;
    "
    [[123], [789], ["dup"], [456], [123], ["dup"], [543]]
    [[123], [789], ["dup"], [456], [543]]

    The problem with this or any similar approach is that there will be a moment after the anonymous array
        [ grep { ... } @$data ]
    is built and before its reference address is taken and assigned to $data when two possibly very large arrays (and a hash!) will exist in memory and may exhaust your system memory. (I say "possibly" because you say nothing about your actual application.)

    One way to ameliorate, but not, unfortunately, completely eliminate, this effect would be to make the input array unique "in place":

    c:\@Work\Perl>perl -wMstrict -MData::Dump -le
    "my $data = [
         [ 123 ], [ 789 ], [ 'dup' ], [ 456 ], [ 'dup' ], [ 123 ], [ 543 ],
     ];
     dd $data;
     ;;
     my %seen;
     my $lo = 0;
     for (my $hi = 0; $hi <= $#$data; ) {
         ++$seen{ $data->[$lo][0] = $data->[$hi][0] };
         ++$lo;
         ++$hi while $hi <= $#$data && $seen{ $data->[$hi][0] };
     }
     $#$data = $lo-1;
     dd $data;
    "
    [[123], [789], ["dup"], [456], ["dup"], [123], [543]]
    [[123], [789], ["dup"], [456], [543]]
    This leaves you with just one array to worry about in terms of memory consumption, but the hash still consumes memory, however temporarily.


    Give a man a fish:  <%-{-{-{-<
