Remove duplicated data from array ref

by Maresia (Beadle)
on Nov 14, 2016 at 14:54 UTC

Maresia has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks!
I am trying, unsuccessfully, to remove the duplicates from this array; all of the values are still showing in the Dumper output.
Is there a better way to do this?
I wish I could do this without having to add the new values to a new variable.
Here is my test code:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $data = [
    [3328B0Z],  [3328B0Z],  [887ww45],  [887ww45],  [9988A676],
    [8888Q88],  [11111X9],  [88999S77], [88999S77], [777777f],
    [A84YY9],   [K7788880], [K7788880], [1122222],  [8888888T],
    [8888888T], [87HHY86],  [XX11672],  [XX11672],  [88889999],
    [88888888], [1122222],
];

my @unique;
foreach my $var ( @$data ){
    if ( ! grep( /$var/, @unique ) ){
        push( @unique, $var );
    }
}

print Dumper \@unique;

Thanks for helping!

Re: Remove duplicated data from array ref
by choroba (Cardinal) on Nov 14, 2016 at 15:05 UTC
    The code doesn't compile:

    > Bareword found where operator expected at 1.pl line 9, near "3328B0Z" (Missing operator before B0Z?)

    After wrapping quotes around the barewords, I'm getting the whole list back. The problem is that $data is an array of arrays, i.e. in the loop, $var is not a string, but an array ref, so grep is trying to match /ARRAY(0xcc7cb8)/ etc.

    For a cleaner solution, see uniq in List::Util.
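
    A minimal sketch of that, assuming each inner array ref holds exactly one string and you only need the unique strings back rather than the original sub-arrays (uniq needs List::Util 1.45 or later):

    use strict;
    use warnings;
    use List::Util qw( uniq );

    my $data = [ ['3328B0Z'], ['3328B0Z'], ['887ww45'], ['1122222'] ];

    # pull the single string out of each sub-array, then de-duplicate,
    # keeping the order of first appearance
    my @unique = uniq map { $_->[0] } @$data;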

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Remove duplicated data from array ref
by johngg (Canon) on Nov 14, 2016 at 15:08 UTC

    Not sure why you have an AoA with only single-element sub-arrays, but here's a way to do what you want.

    johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -MData::Dumper -E '
    my $data = [
        [ 123 ],
        [ 789 ],
        [ 456 ],
        [ 123 ],
        [ 543 ],
    ];
    my @uniq = do {
        my %seen;
        grep { not $seen{ $_->[ 0 ] } ++ } @{ $data };
    };
    print Data::Dumper->Dumpxs( [ \ @uniq ], [ qw{ *uniq } ] );'
    @uniq = (
              [ 123 ],
              [ 789 ],
              [ 456 ],
              [ 543 ]
            );

    I hope this is helpful.

    Update: Note that Eily's approach, building a hash and then using keys, can also be made to work, but the order of the elements is not (necessarily) preserved because of the way the hashing algorithm assigns keys to hash buckets. Because we want to preserve the array references, we have to use those as the values, with the keys being their content. From Perl 5.14 keys and values can take a scalar reference to a hash, as I do here (an experimental feature that was later removed in Perl 5.24).

    johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -MData::Dumper -E '
    my $data = [
        [ 123 ],
        [ 789 ],
        [ 456 ],
        [ 123 ],
        [ 543 ],
    ];
    my @uniq = values { map { $_->[ 0 ] => $_ } @{ $data } };
    print Data::Dumper->Dumpxs( [ \ @uniq ], [ qw{ *uniq } ] );'
    @uniq = (
              [ 789 ],
              [ 543 ],
              [ 456 ],
              [ 123 ]
            );

    Cheers,

    JohnGG

      Thank you, it works!
Re: Remove duplicated data from array ref
by Eily (Monsignor) on Nov 14, 2016 at 15:02 UTC

    I'm not sure what your data is actually supposed to be (your declaration for $data is not valid Perl; is this supposed to be an array of strings?), but there's a simple way to get unique elements in Perl: hash keys.

    my %uniq;
    my @data = qw(3328B0Z 3328B0Z 1122222 8888888T 3328B0Z 1122222 8888888T);
    $uniq{$_}++ for @data;
    my @uniq_data = keys %uniq;
    You should avoid removing elements from an array while iterating over it, so outputting to another variable is probably better.
    Or, you can use the sub uniq from List::Util
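
    For example, applied to the @data above (a sketch, again assuming a recent List::Util):

    use List::Util qw( uniq );
    # keeps the order of first appearance, unlike keys %uniq
    my @uniq_data = uniq @data;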

      Hi, the data is just like this:
      my $data = [
          ['3328B0Z'],  ['3328B0Z'],  ['887ww45'],  ['887ww45'],  ['9988A676'],
          ['8888Q88'],  ['11111X9'],  ['88999S77'], ['88999S77'], ['777777f'],
          ['A84YY9'],   ['K7788880'], ['K7788880'], ['1122222'],  ['8888888T'],
          ['8888888T'], ['87HHY86'],  ['XX11672'],  ['XX11672'],  ['88889999'],
          ['88888888'], ['1122222'],
      ];

Re: Remove duplicated data from array ref
by haukex (Archbishop) on Nov 14, 2016 at 15:04 UTC

    Hi,

    You haven't told us what the problem is, but the code you posted does not compile because you're using barewords like 887ww45 instead of quoted strings, so that would be the first thing to fix: replace the square brackets with quotes (i.e. [887ww45] becomes "887ww45") and your code will work, as sketched below. It's not particularly efficient though, mostly because of the grep. Please see the following links:

    I wish I could do this without having to add the new values to a new variable.

    While it's possible to solve this without a second data structure, it's a bit easier to code using more than one. Why would you want to avoid a second data structure? Is your actual data very big?
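
    A trimmed-down sketch of the quoting fix mentioned above (not your full data list):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    # square brackets replaced with quotes, so $data is now a flat
    # array ref of strings instead of an array of one-element arrays
    my $data = [ "3328B0Z", "3328B0Z", "887ww45", "887ww45", "1122222" ];

    my @unique;
    foreach my $var ( @$data ) {
        if ( !grep( /$var/, @unique ) ) {
            push @unique, $var;
        }
    }
    print Dumper \@unique;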

    Hope this helps,
    -- Hauke D

    Update: Added quote.

Re: Remove duplicated data from array ref
by hippo (Bishop) on Nov 14, 2016 at 15:02 UTC

    You have an array ref of array refs so you would need at least one more level of dereferencing to give this any chance of working. Why do you have such an odd structure to begin with? Why are the inner arrays there at all if each one consists of precisely one element?
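
    For instance, keeping your loop but comparing the string inside each sub-array rather than the reference itself (a sketch, using an eq comparison in place of the pattern match):

    use strict;
    use warnings;

    my $data = [ ['3328B0Z'], ['3328B0Z'], ['887ww45'] ];

    my @unique;
    for my $var ( @$data ) {
        # $var is an array ref; $var->[0] is the string inside it
        push @unique, $var
            unless grep { $_->[0] eq $var->[0] } @unique;
    }
    # @unique now holds [ ['3328B0Z'], ['887ww45'] ]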

    Update: See johngg's example below which illustrates one way of dealing with such an AoA.

Re: Remove duplicated data from array ref
by AnomalousMonk (Archbishop) on Nov 15, 2016 at 00:46 UTC
    I wish I could do this without having to add the new values to a new variable.

    There's no reason you can't do this with the code johngg provided:

    c:\@Work\Perl>perl -wMstrict -MData::Dump -le
    "my $data = [
         [ 123 ], [ 789 ], [ 'dup' ], [ 456 ], [ 123 ], [ 'dup' ], [ 543 ],
     ];
     dd $data;
     ;;
     $data = do {
         my %seen;
         [ grep { not $seen{ $_->[0] }++ } @$data ];
     };
     dd $data;
    "
    [[123], [789], ["dup"], [456], [123], ["dup"], [543]]
    [[123], [789], ["dup"], [456], [543]]

    The problem with this or any similar approach is that there will be a moment after the anonymous array
        [ grep { ... } @$data ]
    is built and before its reference address is taken and assigned to $data when two possibly very large arrays (and a hash!) will exist in memory and may exhaust your system memory. (I say "possibly" because you say nothing about your actual application.)

    One way to ameliorate, but not, unfortunately, completely eliminate, this effect would be to make the input array unique "in place":

    c:\@Work\Perl>perl -wMstrict -MData::Dump -le
    "my $data = [
         [ 123 ], [ 789 ], [ 'dup' ], [ 456 ], [ 'dup' ], [ 123 ], [ 543 ],
     ];
     dd $data;
     ;;
     my %seen;
     my $lo = 0;
     for (my $hi = 0; $hi <= $#$data; ) {
         ++$seen{ $data->[$lo][0] = $data->[$hi][0] };
         ++$lo;
         ++$hi while $hi <= $#$data && $seen{ $data->[$hi][0] };
     }
     $#$data = $lo-1;
     dd $data;
    "
    [[123], [789], ["dup"], [456], ["dup"], [123], [543]]
    [[123], [789], ["dup"], [456], [543]]
    This leaves you with just one array to worry about in terms of memory consumption, but the hash still consumes memory, however temporarily.


    Give a man a fish:  <%-{-{-{-<
