Re: Using hash keys to separate data

in reply to Using hash keys to separate data

Nearly there. :-)

#!/usr/bin/perl

use warnings; 
use strict;

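# Three-argument open keeps the mode separate from the file name,
# and $! reports why an open failed.
open(KEY, '<', 'hashKey.txt')
    or die "error reading key list: $!";
open(REG, '<', 'testReg.txt')
    or die "error reading file: $!";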

my %Chr;
while (my $key = <KEY>) {
    chomp $key;
    $Chr{$key} = undef;
}

my %R;
while (my $reg = <REG>) {
    chomp $reg;
    my @reg_split = split("\t", $reg);
    push @{$R{$reg_split[0]}}, $reg;
}

foreach my $key (sort keys %R) {
  next unless exists $Chr{$key};
  for my $out (@{$R{$key}}){ 
    print "$out\n";
  }
  print q{-} x 20, qq{\n};

}

close(KEY);
close(REG);
Output:

chr1    100    159    0
chr1    200    260    0
chr1    500    750    0
--------------------
chr11    679    687    0
--------------------
chr22    100    200    0
chr22    300    400    0
--------------------
chr3    450    700    0
--------------------
chr4    100    300    0
--------------------
chr7    350    600    0
--------------------
chr9    100    125    0
--------------------

The first while loop creates a lookup table (%Chr). The source file has only one field per record, so there is no need for the split.

The second while loop creates a hash of arrays (%R) from your input file. The key is the first field (chromosome) and the value is an array of records. That's what the push is doing.

Finally, we print the records for each chromosome if it exists in the lookup table. In your case you want to print to a file rather than to STDOUT as we do here.
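Here is a rough sketch of the print-to-file variant. I'm guessing you want one output file per chromosome, so the "$key.out.txt" naming scheme is my assumption, not something from your post. The small %Chr and %R literals are only stand-ins for the structures built by the two while loops.

```perl
use strict;
use warnings;

# Stand-ins for the structures built by the two while loops.
my %Chr = map { $_ => undef } qw(chr1 chr3);
my %R   = (
    chr1 => [ "chr1\t100\t159\t0", "chr1\t200\t260\t0" ],
    chr3 => [ "chr3\t450\t700\t0" ],
    chrX => [ "chrX\t1\t2\t0" ],    # not in %Chr, so it is skipped
);

foreach my $key (sort keys %R) {
    next unless exists $Chr{$key};

    # Hypothetical naming scheme: one output file per chromosome.
    my $file = "$key.out.txt";
    open(my $out_fh, '>', $file)
        or die "cannot write $file: $!";
    print {$out_fh} "$_\n" for @{ $R{$key} };
    close($out_fh)
        or die "cannot close $file: $!";
}
```

If you would rather have everything in a single file, open it once before the loop and print to that handle instead.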

As an aside, you could rewrite the first while loop with map.
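Something along these lines should do it. To keep the snippet runnable on its own I read the keys from an in-memory string (the $key_list variable is made up for the example); in your script you would read from the hashKey.txt handle instead.

```perl
use strict;
use warnings;

# Made-up key list standing in for the contents of hashKey.txt.
my $key_list = "chr1\nchr3\nchr22\n";
open(my $key_fh, '<', \$key_list)
    or die "cannot open in-memory key list: $!";

# Slurp every line, strip the newlines, then build the lookup table in one go.
chomp(my @keys = <$key_fh>);
close($key_fh);

my %Chr = map { $_ => undef } @keys;

print join(",", sort keys %Chr), "\n";    # prints chr1,chr22,chr3
```

The trade-off is that this slurps the whole key file into memory first, which is fine for a short list of chromosome names.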

Hope that helps.

Update
Reading your question again, I see:

hashKey.txt gives a list of all the possible chromosome values there could be in a given input file.

If that is the case, why do you need the lookup table? I could see it being useful if there could be values in your input that you weren't interested in.
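If every chromosome in the input really is wanted, the grouping alone would be enough. A minimal sketch without %Chr, using a few made-up records in an in-memory string (the $input variable is only for illustration) in place of testReg.txt:

```perl
use strict;
use warnings;

# Made-up tab-separated records standing in for testReg.txt.
my $input = join "", map { "$_\n" }
    "chr1\t100\t159\t0",
    "chr3\t450\t700\t0",
    "chr1\t200\t260\t0";
open(my $reg_fh, '<', \$input)
    or die "cannot open in-memory input: $!";

# Group records by their first field; no lookup table involved.
my %R;
while (my $reg = <$reg_fh>) {
    chomp $reg;
    my ($chr) = split /\t/, $reg;
    push @{ $R{$chr} }, $reg;
}
close($reg_fh);

# Every key found in the input gets printed.
for my $chr (sort keys %R) {
    print "$_\n" for @{ $R{$chr} };
    print q{-} x 20, "\n";
}
```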
