Hello Perlmonks,
I have a file (hashKey.txt) that I would like to use as a list of hash keys in order to separate data from input files (e.g. testReg.txt).
hashKey.txt
chr10
chr10_random
chr11
chr11_gl000202_random
chr11_random
chr12
chr13
chr13_random
chr14
chr15
chr15_random
chr16
chr16_random
chr17_ctg5_hap1
chr17
chr17_gl000203_random
chr17_gl000204_random
chr17_gl000205_random
chr17_gl000206_random
chr17_random
chr18
chr18_gl000207_random
chr18_random
chr19
chr19_gl000208_random
chr19_gl000209_random
chr19_random
chr1
chr1_gl000191_random
chr1_gl000192_random
chr1_random
chr20
chr21
chr21_gl000210_random
chr21_random
chr22
chr22_h2_hap1
chr22_random
chr2
chr2_random
chr3
chr3_random
chr4_ctg9_hap1
chr4
chr4_gl000193_random
chr4_gl000194_random
chr4_random
chr5
chr5_h2_hap1
chr5_random
chr6_apd_hap1
chr6_cox_hap1
chr6_cox_hap2
chr6_dbb_hap3
chr6
chr6_mann_hap4
chr6_mcf_hap5
chr6_qbl_hap2
chr6_qbl_hap6
chr6_random
chr6_ssto_hap7
chr7
chr7_gl000195_random
chr7_random
chr8
chr8_gl000196_random
chr8_gl000197_random
chr8_random
chr9
chr9_gl000198_random
chr9_gl000199_random
chr9_gl000200_random
chr9_gl000201_random
chr9_random
chrM
chrUn_gl000211
chrUn_gl000212
chrUn_gl000213
chrUn_gl000214
chrUn_gl000215
chrUn_gl000216
chrUn_gl000217
chrUn_gl000218
chrUn_gl000219
chrUn_gl000220
chrUn_gl000221
chrUn_gl000222
chrUn_gl000223
chrUn_gl000224
chrUn_gl000225
chrUn_gl000226
chrUn_gl000227
chrUn_gl000228
chrUn_gl000229
chrUn_gl000230
chrUn_gl000231
chrUn_gl000232
chrUn_gl000233
chrUn_gl000234
chrUn_gl000235
chrUn_gl000236
chrUn_gl000237
chrUn_gl000238
chrUn_gl000239
chrUn_gl000240
chrUn_gl000241
chrUn_gl000242
chrUn_gl000243
chrUn_gl000244
chrUn_gl000245
chrUn_gl000246
chrUn_gl000247
chrUn_gl000248
chrUn_gl000249
chrX
chrX_random
chrY
hashKey.txt gives a list of all the possible chromosome values that could appear in a given input file.
testReg.txt
chr1 100 159 0
chr1 200 260 0
chr1 500 750 0
chr3 450 700 0
chr4 100 300 0
chr7 350 600 0
chr9 100 125 0
chr11 679 687 0
chr22 100 200 0
chr22 300 400 0
testReg.txt is simply a test file I use to test the code. It includes various chromosome values along with 3 other columns of data.
My code so far:
#!/usr/bin/perl
use warnings; use strict;
my (%Chr, %R);
my (@key_split, @reg_split);
my ($reg_line);
open(KEY, "<hashKey.txt") or die "error reading key list";
open(REG, "<testReg.txt") or die "error reading file";
while (<KEY>) {
    chomp;
    @key_split = split("\n");
    $Chr{"$key_split[0]"} = $key_split[0];
}
while (<REG>) {
    chomp;
    @reg_split = split("\t");
    #$R{"$reg_split[0]"} = ($reg_split[0], $reg_split[1], $reg_split[2], $reg_split[3]);
    $R{"$reg_split[0]"} = $reg_split[0];
}
foreach my $key (keys %Chr) {
    if(exists($R{$key})){
        print ("$R{$key}\n");
    }
}
close(KEY);
close(REG);
So far, my code prints out all of the chr values in common between hashKey.txt and testReg.txt. What I would like it to do is to print each line to a separate file designated by each chromosome. For example:
chr1.out
chr1 100 159 0
chr1 200 260 0
chr1 500 750 0
chr3.out
chr3 450 700 0
chr4.out
chr4 100 300 0
chr7.out
chr7 350 600 0
chr9.out
chr9 100 125 0
chr11.out
chr11 679 687 0
chr22.out
chr22 100 200 0
chr22 300 400 0
From there I can sort each separated file however I need to. I suppose my main problem is figuring out how to have a single hash key refer to every line that belongs to it, since each assignment in my code overwrites the previous value for that key. Is what I am trying to accomplish even possible with a hash table, given that the same key could apply to multiple lines? My main goal is just to separate each chr from the input file (testReg.txt) into its own file. If you have any suggestions, please let me know.
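One common way to let a single key hold several lines is a hash of arrays, where each hash value is an array reference. The following is only a minimal sketch of that idea, not a definitive solution. It assumes the columns in testReg.txt are separated by whitespace (tabs or spaces) and that each output file is named after its chromosome with an ".out" suffix. The subroutine names group_by_chr and write_chr_files are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group the lines of a region file by their first column (the chromosome),
# keeping only chromosomes listed in the key file.
# Returns a hash reference: chromosome name => array reference of lines.
sub group_by_chr {
    my ($key_file, $reg_file) = @_;

    # Load the allowed chromosome names into a lookup hash.
    open(my $key_fh, '<', $key_file) or die "Cannot read $key_file: $!";
    my %is_known;
    while (my $name = <$key_fh>) {
        chomp $name;
        $is_known{$name} = 1 if length $name;
    }
    close($key_fh);

    # Each value is an array reference, so one key can hold many lines.
    my %lines_for;
    open(my $reg_fh, '<', $reg_file) or die "Cannot read $reg_file: $!";
    while (my $line = <$reg_fh>) {
        chomp $line;
        my ($chr) = split ' ', $line;    # first whitespace-separated field
        next unless defined $chr && $is_known{$chr};
        push @{ $lines_for{$chr} }, $line;
    }
    close($reg_fh);

    return \%lines_for;
}

# Write one "<chromosome>.out" file per key found in the grouped data.
sub write_chr_files {
    my ($lines_for, $out_dir) = @_;
    for my $chr (sort keys %$lines_for) {
        my $path = "$out_dir/$chr.out";
        open(my $out_fh, '>', $path) or die "Cannot write $path: $!";
        print {$out_fh} "$_\n" for @{ $lines_for->{$chr} };
        close($out_fh) or die "Cannot close $path: $!";
    }
    return;
}

# Run only when executed directly with two file arguments.
if (!caller && @ARGV == 2) {
    my $grouped = group_by_chr(@ARGV);
    write_chr_files($grouped, '.');
}
```

If this were saved under a hypothetical name such as split_by_chr.pl, it could be run as "perl split_by_chr.pl hashKey.txt testReg.txt". Lines within each output file keep their original order. The sketch holds the whole input in memory, so for very large files it may be preferable to keep a hash of open output filehandles and print each line as it is read.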