PerlMonks  

Re^3: Most elegant way to dispose of duplicates using map

by rashley (Scribe)
on Oct 30, 2006 at 21:18 UTC


in reply to Re^2: Most elegant way to dispose of duplicates using map
in thread Most elegant way to dispose of duplicates using map

Just for the sake of intellectual growth, how could one tweak this to eliminate duplicates of id/version pairs, instead of just id?

Replies are listed 'Best First'.
Re^4: Most elegant way to dispose of duplicates using map
by johngg (Canon) on Oct 30, 2006 at 22:28 UTC
    You just need to make a key for the %seen hash that is your id and version joined together in some way. Here I join them with a colon.

    use strict;
    use warnings;

    use Data::Dumper;

    my @partTuples = (
        q{abc,1.1,apple},     # 1st element
        q{def,3.6,orange},    # no dups. so OK
        q{abc,1.5,pear},      # OK id only dup.
        q{abc,1.1,kiwi},      # dup. id and version
        q{ghi,1.2,peach},     # no dups. so OK
        q{xyz,1.1,plum},      # OK version only dup.
    );

    my %seen = ();
    my @uniquePTs =
        grep {! $seen{join q{:}, $_->{id}, $_->{version}} ++}
        map {
            {
                id             => $_->[0],
                version        => $_->[1],
                classification => $_->[2]
            }
        }
        map { [split m{,}] }
        @partTuples;

    print Dumper(\@uniquePTs);

    The output is

    $VAR1 = [
              {
                'version' => '1.1',
                'classification' => 'apple',
                'id' => 'abc'
              },
              {
                'version' => '3.6',
                'classification' => 'orange',
                'id' => 'def'
              },
              {
                'version' => '1.5',
                'classification' => 'pear',
                'id' => 'abc'
              },
              {
                'version' => '1.2',
                'classification' => 'peach',
                'id' => 'ghi'
              },
              {
                'version' => '1.1',
                'classification' => 'plum',
                'id' => 'xyz'
              }
            ];

    Cheers,

    JohnGG

      I'd suggest choosing the delimiter very carefully if you want to serialize a key in this manner. If, for instance, you joined with a comma, the pair "1,2" and "3" and the pair "1" and "2,3" would both produce the key "1,2,3", so they would be treated as duplicates.
        Agreed. I chose a colon in this example as there were none in any of the strings that were going to form the keys. Similarly, a simple concatenation is also potentially dangerous, e.g. "frederick" and "son" vs. "fred" and "erickson". However, I should have stressed the point so thank you for doing it for me.
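
The collision described above can be reproduced with a minimal sketch. The sample pairs and the variable names (@pairs, %seenJoined, %seenNested) are made up for illustration and are not from the thread. The nested-hash form is one possible way to avoid choosing a delimiter at all, offered as an assumption rather than as what the posters recommended.

```perl
use strict;
use warnings;

# Two different (id, version) pairs that collide when joined with a comma.
my @pairs = ( [ '1,2', '3' ], [ '1', '2,3' ] );

# Joined keys: both pairs become the string "1,2,3", so the second is dropped.
my %seenJoined;
my @keptJoined = grep { !$seenJoined{ join q{,}, @$_ }++ } @pairs;

# Nested hash: id and version are separate hash levels, so no delimiter is needed.
my %seenNested;
my @keptNested = grep { !$seenNested{ $_->[0] }{ $_->[1] }++ } @pairs;

printf "joined keys kept %d pair(s), nested keys kept %d pair(s)\n",
    scalar @keptJoined, scalar @keptNested;
```

With a delimiter that cannot occur in the data, such as the colon used for the part tuples in this thread, the joined form should behave the same as the nested one.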

        Cheers,

        JohnGG

      I really need to crack the magic map/grep code.

      I see what you're doing, and I think a colon will work for the data I'm dealing with, but if I understand this, I'll need to change the way I'm putting my original @partTuples together. This:

      @partTuples = map { my @t = split(','); {id=>$t[0], version=>$t[1], classification=>$t[2]} } @partTuples;
      isn't working, since we're doing the mapping later on, but I'm not sure what your code is expecting.

      Thanks for all the help.

        Although not familiar with $cgi->param(), from your OP it looked like it returned a list of strings that you assigned to an array, each string being three comma-delimited fields. I just made up some gash data that had the same structure. The code I gave goes from the array of strings through to the array of unique part tuple hashes without stopping along the way. You could even take it further by feeding the return of $cgi->param() straight into the maps, like this:

        my @uniquePTs =
            grep {! $seen{join q{:}, $_->{id}, $_->{version}} ++}
            map {
                {
                    id             => $_->[0],
                    version        => $_->[1],
                    classification => $_->[2]
                }
            }
            map { [split m{,}] }
            $cgi->param('partID');

        Reading this code from the bottom up you

        1) call $cgi->param() which returns a list of strings that are passed, one at a time, into the bottom map

        2) things are passed into and out of map and grep in $_ so the bottom map takes the string passed in and splits it on commas. The resultant list is placed inside anonymous array constructors [ ... ] so a reference to the new anonymous array is passed out to the map above, again in $_

        3) in the second map the value passed in in $_ is a reference to an array so to use it we need to dereference it like $_->[0] etc. In this map we construct an anonymous hash using { ... } and populate the key/value pairs. The reference to the hash is in turn passed out to the grep

        4) in the grep we again need to dereference $_, this time to access the hash like $_->{id}. By combining the values for the "id" and "version" keys we can construct a key for the %seen hash that we use to detect duplicates. We grep out only those anonymous hashes whose "id" and "version" haven't already occurred in the %seen hash.

        5) finally, those hash references that have passed the grep are assigned to the @uniquePTs array as the grep{...} map{...} map{...} list returns a list.
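
The steps above can also be written out one stage at a time, which may make the chained form easier to follow. This is only a sketch: @partStrings holds made-up sample strings standing in for whatever $cgi->param('partID') returns, and the intermediate arrays @fieldRefs and @partHashes are names invented here for illustration.

```perl
use strict;
use warnings;

# Made-up sample strings standing in for what $cgi->param('partID') returns.
my @partStrings = (
    q{abc,1.1,apple},
    q{abc,1.5,pear},
    q{abc,1.1,kiwi},    # same id and version as the first string
);

# Step 2: split each string on commas, keeping the fields in an anonymous array.
my @fieldRefs = map { [ split m{,} ] } @partStrings;

# Step 3: turn each array reference into a hash reference with named fields.
# The leading + tells Perl the braces are an anonymous hash, not a block.
my @partHashes = map {
    +{ id => $_->[0], version => $_->[1], classification => $_->[2] }
} @fieldRefs;

# Step 4: keep only the first hash seen for each id:version key.
my %seen;
my @uniquePTs = grep { !$seen{ join q{:}, $_->{id}, $_->{version} }++ } @partHashes;

# Step 5: @uniquePTs now holds the surviving hash references.
print join( q{,}, @{$_}{qw(id version classification)} ), "\n" for @uniquePTs;
```

The chained version does the same work without the intermediate arrays, since each map or grep simply hands its list to the one written above it.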

        I hope I've explained this adequately but I'm rushing a bit as I have to leave for an appointment soon. If I've totally misunderstood what $cgi->param('partID'); does, let me know and I'll adjust the code.

        Cheers,

        JohnGG

        Oops, never mind. You already took that into account.

        Thanks!
