patric has asked for the wisdom of the Perl Monks concerning the following question:
Dear all,
I am having a weird problem. I am comparing a specific column between two large files: file1 is a database text file, and file2 is a query text file. I want to compare a particular column (which contains letters) from file2 against file1. The final results should contain all the lines in file2, with an extra column added from file1. For example:
file1.db:
```
zm811463 1190 31050 - A/G/T/C
zm811462 1190 31051 - C/T
zm1829427 1190 31789 + A/G
zm445312 1190 31883 - A/G
zm5377419 1190 32419 + A/C
zm1052506 1190 32829 + C/G
zm1052507 1190 32886 + A/C/T
zm9115338 1190 33832 + A/G/CC
```
file2.query:
```
1190 31277 A > T 1 0 0
1190 31607 C > A 0 3 1
1190 31629 C > T 0 2 0
1190 31789 A > G 1 2 5
1190 31882 A > C 0 4 0
1190 31883 T > A 0 4 0
1190 31883 T > C 2 2 5
1190 32199 C > T 0 1 1
1190 32487 T > C 0 1 1
1190 32496 A > G 0 3 0
```
The output I am getting now:
```
1190 31277 A > T 1 0 0 -
1190 31607 C > A 0 3 1 -
1190 31629 C > T 0 2 0 -
1190 31789 A > G 1 2 5 zm1829427
1190 31882 A > C 0 4 0 -
1190 32199 C > T 0 1 1 -
1190 32487 T > C 0 1 1 -
1190 32496 A > G 0 3 0 -

Total number of HITS: 1
```
But I want my output to look like this:
```
1190 31277 A > T 1 0 0 -
1190 31607 C > A 0 3 1 -
1190 31629 C > T 0 2 0 -
1190 31789 A > G 1 2 5 zm1829427
1190 31882 A > C 0 4 0 -
1190 31883 T > A 0 4 0 -
1190 31883 T > C 2 2 5 -
1190 32199 C > T 0 1 1 -
1190 32487 T > C 0 1 1 -
1190 32496 A > G 0 3 0 -

Total number of HITS: 1
```
If you look closely, these two lines are missing from my output:
```
1190 31883 T > A 0 4 0 -
1190 31883 T > C 2 2 5 -
```
Why is this happening? The program so far looks like this:
```perl
use strict;
use warnings;
use DB_File;

my $myhashfile = "hash.$$";
tie my %hash1, "DB_File", $myhashfile, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "cannot open file $myhashfile: $!";

open(OUT, ">output.out") or die "can not open";

open(my $fh1, "<", "file1.db") or die "file1.db: $!";
foreach (<$fh1>) {
    chomp;
    my @dbinfo = split(/\s+/);
    $hash1{"$dbinfo[1]#$dbinfo[2]"} = "$dbinfo[4]##$dbinfo[0]";
}
close($fh1);

my $c = 0;
open(my $fh2, "<", "file2.query") or die "file2.query: $!";
my @snplist;
foreach (<$fh2>) {
    chomp($_);
    @snplist = ();
    my @queryinfo = split(/[\s>]+/);
    my $values = $hash1{"$queryinfo[0]#$queryinfo[1]"};
    my ($variant, $rs) = split("##", $values);
    my $flag_left = 0;
    my $flag_right = 0;
    @snplist = split("/", $variant);
    if (defined($variant)) {
        foreach my $lis (@snplist) {
            if ($queryinfo[2] eq $lis) { $flag_left = 1; }
            elsif ($queryinfo[3] eq $lis) { $flag_right = 1; }
        }
        if (($flag_left == 1) && ($flag_right == 1)) {
            print OUT "$_\t$rs\n";
            $c++;
            $flag_left = 0;
            $flag_right = 0;
        }
    }
    else {
        print OUT "$_\t-\n";
    }
}
print OUT "\nTotal number of HITS: $c\n";
close($fh2);
untie %hash1;
unlink($myhashfile);
$myhashfile = ();
```
I am using a tied hash because I am dealing with large files. I am using @snplist because there can be more letters to compare: a line counts as a hit only if both letters in file2's third column (separated by ">") are present in file1's fifth column (which can list several letters separated by "/"). Please help. Thank you very much.
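Reading the posted loop, a query line is printed in only two places: inside `if (($flag_left == 1) && ($flag_right == 1))`, and in the outer `else` that runs when no file1 row has the same `chrom#pos` key. The two 31883 lines do have a file1 row (zm445312, letters A/G), so `$variant` is defined, but T > A has only one of its two letters in A/G and T > C has neither. The inner `if` fails, and because it has no `else`, nothing is printed for those lines. Two smaller points: `split("##",$values)` and `split("/",$variant)` run before the `defined` check, so `use warnings` reports uninitialized values for every unmatched line; and `foreach (<$fh1>)` / `foreach (<$fh2>)` read the whole file into a list before looping, which works against the reason for tying the hash, while `while (<$fh>)` reads one line at a time.

A minimal sketch of one way to restructure it, assuming the whitespace-separated column layout shown in the sample data. The sub names `load_db` and `annotate` are hypothetical, and the two flags are swapped for a small hash of allowed letters. To run without DB_File or data files, it uses a plain hash and in-memory filehandles over a few of the sample rows; passing `\%hash1` (the DB_File-tied hash) instead should work the same way, since a tied hash is used through the ordinary hash interface.

```perl
use strict;
use warnings;

# Load file1.db rows into %$db, keyed "chrom#pos" as in the original program.
# %$db may be a plain hash or a hash tied to DB_File.
sub load_db {
    my ($db, $fh) = @_;
    # while reads one line at a time; foreach (<$fh>) would read them all first
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split ' ', $line;    # awk-style split on runs of whitespace
        next unless @f >= 5;         # skip blank or short rows
        $db->{"$f[1]#$f[2]"} = "$f[4]##$f[0]";
    }
    return;
}

# Print every query line plus one extra column: the file1 ID on a hit,
# '-' otherwise. Returns the number of hits.
sub annotate {
    my ($db, $in, $out) = @_;
    my $hits = 0;
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /\S/;   # skip blank lines (assumption)
        my ($chr, $pos, $ref, $alt) = split /[\s>]+/, $line;
        my $id = '-';                # default extra column: no hit
        my $value = $db->{"$chr#$pos"};
        if (defined $value) {        # split only after the defined check
            my ($variant, $rs) = split /##/, $value;
            # Letters from file1's fifth column, e.g. "A/G" -> (A => 1, G => 1)
            my %allowed = map { $_ => 1 } split m{/}, $variant;
            # Hit only when BOTH query letters are listed
            if ($allowed{$ref} && $allowed{$alt}) {
                $id = $rs;
                $hits++;
            }
        }
        print {$out} "$line\t$id\n"; # reached for every query line, hit or not
    }
    return $hits;
}

# Demo on a few sample rows from the post, using in-memory filehandles.
my $db_text = <<'EOT';
zm811463 1190 31050 - A/G/T/C
zm1829427 1190 31789 + A/G
zm445312 1190 31883 - A/G
EOT

my $query_text = <<'EOT';
1190 31789 A > G 1 2 5
1190 31882 A > C 0 4 0
1190 31883 T > A 0 4 0
1190 31883 T > C 2 2 5
EOT

my %db;    # the DB_File-tied %hash1 could be passed here instead
open my $db_fh, '<', \$db_text or die "cannot read db text: $!";
load_db(\%db, $db_fh);
close $db_fh;

my $result = '';
open my $q_fh,   '<', \$query_text or die "cannot read query text: $!";
open my $out_fh, '>', \$result     or die "cannot open output buffer: $!";
my $hits = annotate(\%db, $q_fh, $out_fh);
close $q_fh;
close $out_fh;

print $result;
print "\nTotal number of HITS: $hits\n";
```

With these rows, both 31883 lines come out with `-` and the hit count is 1, matching the desired output above. To use it on the real files, open file1.db, file2.query and output.out as in the original program, pass those handles (and `\%hash1`) instead of the in-memory ones, and print the `Total number of HITS` line after `annotate` returns. Deciding the extra column first (`-` by default) and printing once per iteration is what guarantees every query line reaches the output.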