PerlMonks  

Re: Searching and Coutning using 2 files with multiple columns

by kennethk (Abbot)
on Sep 16, 2009 at 14:58 UTC


in reply to Searching and Coutning using 2 files with multiple columns

Ignoring moritz's very good suggestion, you could perform a naive search by storing the contents of the first file in an array of arrays. This would end up looking something like (untested):

    my @file1 = ();
    open(FILE, $ARGV[0]) || die("could not open file $ARGV[0]\n");
    while (my $line = <FILE>) {
        chomp $line;
        my ($chr, $start, $stop) = split(/\t/, $line);
        push @file1, [$chr, $start, $stop];
    }
    close FILE;

    open(FILE, $ARGV[1]) || die("could not open file $ARGV[1]\n");
    while (<FILE>) {
        ($Gene, $Chrom, $ModStart, $ModEnd, $Strand, $ExonCount, $SizeKB) = split;
        foreach my $line (@file1) {
            my ($chr, $start, $stop) = @$line;
            # Coordinates are numbers, so compare with > and <
            # (gt and lt compare as strings).
            if ($chr eq $Chrom && $start > $ModStart && $stop < $ModEnd) {
                $Count++;
                print join("\t", $Gene, $Chrom, $ModStart, $ModEnd, $Strand, $ExonCount, $SizeKB, $Count), "\n";
            }
        }
    }
    close FILE;

Note that your original concept had some scoping issues.
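For reference, the array-of-arrays layout used above can be looked at in isolation. This is only a minimal sketch: the two rows in `@rows` are hypothetical stand-ins for lines of the first file, and the name `@intervals` is chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical tab-separated rows standing in for lines of the first file.
my @rows = ("chr1\t100\t200", "chr2\t50\t75");

# Each element of @intervals is a reference to a three-element array.
my @intervals;
for my $row (@rows) {
    my ($chr, $start, $stop) = split /\t/, $row;
    push @intervals, [ $chr, $start, $stop ];
}

# Dereference one row at a time to get the fields back.
for my $iv (@intervals) {
    my ($chr, $start, $stop) = @$iv;
    print "$chr: $start-$stop\n";
}

# A single field can also be reached by index.
print "second row starts at $intervals[1][1]\n";
```

Each `[ ... ]` creates an anonymous array and returns a reference to it, which is why the inner loop needs `@$iv` to unpack a row.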

Replies are listed 'Best First'.
Re^2: Searching and Coutning using 2 files with multiple columns
by shart3 (Novice) on Sep 16, 2009 at 16:38 UTC

    Changing the Boundary.out file to:

    chr1 3204563 3661775 -
    chr1 3204563 3600000 -
    chr1 3204500 3660000 -
    chr1 3204000 3204001 -
    chr1 3204563 3760000 -

    better illustrates the point. There are 3 instances where I should get a positive result for the first line in DB.out. However, rather than counting the number of times a hit occurs, the line from DB.out is printed 3 times (each with the count value of "1"):

    Xkr4 chr1 3204562 3661779 - 3 457.217 1
    Xkr4 chr1 3204562 3661779 - 3 457.217 1
    Xkr4 chr1 3204562 3661779 - 3 457.217 1

    I also added a step to reset the count to zero for each loop leaving:

    my @file1 = ();
    open(FILE, @ARGV[0]) || die ("could not open file @ARGV[0]\n");
    while (my $line = <FILE>) {
        chomp $line;
        my ($chr, $start, $stop) = split(/\t/, $line);
        push @file1, [$chr, $start, $stop];
    }
    close FILE;

    open(FILE, @ARGV[1])||die ("could not open file @ARGV[1]\n");
    while(<FILE>){
        ($Gene,$Chrom,$ModStart,$ModEnd,$Strand,$ExonCount,$SizeKB)= split;
        foreach my $line (@file1){
            $Count = 0;
            my ($chr, $start, $stop) = @$line;
            if ($chr eq $Chrom && $start gt $ModStart && $end lt $ModEnd){
                $Count++;
                print ("$Gene\t$Chrom\t$ModStart\t$ModEnd\t$Strand\t$ExonCount\t$SizeKB\t$Count\n")
            }
        }
    }

    How do I get it to change the number to 3, and not print 3 times?

      If you want to aggregate results, you need to specify what you are aggregating by. What are you trying to count? Based on what you've written, I will guess you want to know the number of lines of Boundary.out that match each line of DB.out. You can accomplish this by just moving your counter and print statements outside of the foreach loop:

      my $Count = 0;
      foreach my $line (@file1) {
          my ($chr, $start, $stop) = @$line;
          # Numeric comparison on the coordinates; $stop is the interval end.
          if ($chr eq $Chrom && $start > $ModStart && $stop < $ModEnd) {
              $Count++;
          }
      }
      print("$Gene\t$Chrom\t$ModStart\t$ModEnd\t$Strand\t$ExonCount\t$SizeKB\t$Count\n");
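To see the "one counter per DB line" idea end to end, here is a small self-contained sketch. It is not the original script: the helper `count_contained`, the gene names, and all coordinates are hypothetical, the data lives in memory rather than in files, and it assumes the coordinates are integers that should be compared numerically.

```perl
use strict;
use warnings;

# Count the intervals on chromosome $chrom that start after $gene_start
# and stop before $gene_end. $intervals is a reference to an array of
# [chr, start, stop] rows.
sub count_contained {
    my ($intervals, $chrom, $gene_start, $gene_end) = @_;
    my $count = 0;    # reset once per gene, not once per interval
    for my $iv (@$intervals) {
        my ($chr, $start, $stop) = @$iv;
        $count++
            if $chr eq $chrom && $start > $gene_start && $stop < $gene_end;
    }
    return $count;
}

# Hypothetical data, not taken from the files discussed in this thread.
my @intervals = (
    [ 'chr1', 110, 190 ],    # inside geneA
    [ 'chr1', 120, 150 ],    # inside geneA
    [ 'chr1',  90, 150 ],    # starts before geneA
    [ 'chr2', 110, 190 ],    # inside geneB only
);
my @genes = (
    [ 'geneA', 'chr1', 100, 200 ],
    [ 'geneB', 'chr2', 100, 200 ],
);

# One output line per gene, with the count appended as the last column.
for my $gene (@genes) {
    my ($name, $chrom, $gene_start, $gene_end) = @$gene;
    my $count = count_contained(\@intervals, $chrom, $gene_start, $gene_end);
    print join("\t", $name, $chrom, $gene_start, $gene_end, $count), "\n";
}
```

With this data the script prints one line for geneA ending in 2 and one for geneB ending in 1. Because `$count` is declared inside the sub, it starts from zero for every gene, and the print happens only after all intervals have been checked.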
