PerlMonks  

select only duplicate entries

by Anonymous Monk
on Aug 24, 2006 at 07:57 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all!
I want to know how I can select only duplicate entries in a text. For instance, assume the following text sample:
    protein1 stomach
    protein2 head
    protein3 muscle
    protein3 heart
    protein3 brain
    protein4 leg
    protein5 toes
    protein5 mouth
    protein6 ear
I want to print the proteins that appear only once to one file and the proteins that appear two, three, or more times to another file.
Any ideas?
Thank you!

Replies are listed 'Best First'.
Re: select only duplicate entries
by ikegami (Patriarch) on Aug 24, 2006 at 08:06 UTC

    What have you tried? You haven't demonstrated any effort at solving your own problems. (I presume combine duplicate entries was also posted by you.)

    A hash keyed by protein would be useful. The values would be lists of organs. You can use split to separate the protein from the organ.
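The hash-of-lists idea above can be sketched roughly as follows. This is only an illustration, assuming one whitespace-separated "protein organ" pair per input line; the helper name group_by_protein and the variable %organs_for are hypothetical and do not come from the thread.

```perl
use strict;
use warnings;

# Build a hash of arrays: protein name => list of organs.
# group_by_protein is a hypothetical helper name used for illustration.
sub group_by_protein {
    my @lines = @_;
    my %organs_for;
    for my $line (@lines) {
        # split ' ' skips leading whitespace and splits on runs of whitespace
        my ( $protein, $organ ) = split ' ', $line;
        next unless defined $organ;    # skip blank or malformed lines
        push @{ $organs_for{$protein} }, $organ;
    }
    return \%organs_for;
}

my $organs_for = group_by_protein(
    'protein1 stomach',
    'protein3 muscle',
    'protein3 heart',
);

# The number of elements in each list is the number of occurrences.
for my $protein ( sort keys %$organs_for ) {
    my @organs = @{ $organs_for->{$protein} };
    printf "%s appears %d time(s): %s\n", $protein, scalar @organs, "@organs";
}
```

Once the lists exist, deciding which output file a protein belongs to comes down to checking the size of its list.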

Re: select only duplicate entries
by GrandFather (Saint) on Aug 24, 2006 at 08:30 UTC

    You may find the answers to combine duplicate entries helpful as a starting point. While building the hash, take note of the number of elements in the largest array. Then iterate from 1 to that number. In each iteration, use grep to pull out the arrays whose element count matches that iteration; those hold the data for the corresponding file.
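One possible reading of that suggestion, as a minimal sketch: it assumes the hash of arrays has already been built, and it prints each group instead of opening one file per count. The helper name bucket_by_count is hypothetical.

```perl
use strict;
use warnings;

# bucket_by_count is a hypothetical helper. It takes a hash ref of
# protein => [organs] and returns an array ref in which index N holds
# the sorted names of the proteins that occur exactly N times.
sub bucket_by_count {
    my ($organs_for) = @_;

    # Note the size of the largest array.
    my $max = 0;
    for my $organs ( values %$organs_for ) {
        $max = @$organs if @$organs > $max;
    }

    my @bucket;
    for my $n ( 1 .. $max ) {
        # grep keeps only the proteins with exactly $n entries
        $bucket[$n] = [ grep { @{ $organs_for->{$_} } == $n } sort keys %$organs_for ];
    }
    return \@bucket;
}

my $bucket = bucket_by_count(
    {
        protein1 => ['stomach'],
        protein3 => [qw(muscle heart brain)],
        protein5 => [qw(toes mouth)],
    }
);

# Index 0 is unused; each remaining index would map to one output file.
for my $n ( 1 .. $#$bucket ) {
    print "$n: @{ $bucket->[$n] }\n";
}
```

Running grep once per count rescans the keys each time, which is fine for small inputs; a single pass that pushes each protein into its bucket directly would also work.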


    DWIM is Perl's answer to Gödel
Re: select only duplicate entries
by borisz (Canon) on Aug 24, 2006 at 08:55 UTC
    my %h;
    while ( defined ( $_ = <DATA> )){
        chomp;
        my ( $k, $v) = split ' ';
        push @{$h{$k}}, $v;
    }
    open my $fh1, '>', '/tmp/1.txt' or die;
    open my $fh2, '>', '/tmp/2.txt' or die;
    for my $k ( sort keys %h ) {
        my $c = @{$h{$k}};
        for ( @{$h{$k}}){
            $c > 1 ? print $fh2 "$k\t$_\n" : print $fh1 "$k\t$_\n";
        }
    }
    __DATA__
    protein1 stomach
    protein2 head
    protein3 muscle
    protein3 heart
    protein3 brain
    protein4 leg
    protein5 toes
    protein5 mouth
    protein6 ear
    Boris

      while ( defined ( $_ = <DATA> )){
      is equivalent to
      while ( <DATA> ){

      $c > 1 ? print $fh2 "$k\t$_\n" : print $fh1 "$k\t$_\n";
      is equivalent to
      print { $c == 1 ? $fh1 : $fh2 } "$k\t$_\n";
      or do
      my $fh = $c == 1 ? $fh1 : $fh2;
      outside the inner loop and print to $fh.
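A small sketch of the handle-selection idiom described above. It uses in-memory filehandles in place of the two output files, which is an assumption made so that the example leaves nothing on disk; the sample data and the names $once and $many are illustrative only.

```perl
use strict;
use warnings;

# In-memory filehandles stand in for the two output files (an assumption
# made for this sketch; the thread writes to real files).
open my $fh1, '>', \my $once or die "Cannot open in-memory handle: $!";
open my $fh2, '>', \my $many or die "Cannot open in-memory handle: $!";

my %h = (
    protein1 => ['stomach'],
    protein3 => [qw(muscle heart brain)],
);

for my $k ( sort keys %h ) {
    # Choose the handle once per protein, outside the inner loop.
    my $fh = @{ $h{$k} } == 1 ? $fh1 : $fh2;

    # The block form print {EXPR} LIST accepts any expression yielding a handle.
    print {$fh} "$k\t$_\n" for @{ $h{$k} };
}
close $fh1;
close $fh2;

print "once:\n$once", "many:\n$many";
```

Choosing the handle once per key keeps the decision out of the inner loop, so the print statement itself stays the same for both files.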

        Thanks, I know. I tried to keep it simple for newbies.
        Boris
Re: select only duplicate entries
by Mandrake (Chaplain) on Aug 24, 2006 at 09:50 UTC
    Try
    #!/usr/bin/perl -w
    use strict;
    my %hash;
    (!/^$/) && (push @{$hash{(split /\s+/,$_)[0]}}, (split /\s+/,$_)[1]) while(<DATA>);
    open TMP1, '>duplicates.txt' or die;
    open TMP2, '>distinct.txt' or die;
    for my $key (keys %hash) {
        for (@{$hash{$key}}) {
            (@{$hash{$key}} > 1) ? print TMP1 "$key\t$_\n" : print TMP2 "$key\t$_\n" ;
        }
    }
    __DATA__
    protein1 stomach
    protein2 head
    protein3 muscle
    protein3 heart
    protein3 brain
    protein4 leg
    protein5 toes
    protein5 mouth
    protein6 ear
    Please refer to combine duplicate entries for similar solutions.
    Thanks..

Node Type: perlquestion [id://569311]
Approved by ysth