http://qs321.pair.com?node_id=795622

shart3 has asked for the wisdom of the Perl Monks concerning the following question:

Good Morning Monks,

I have a problem that I could do rather easily in Excel, but my files are too large, and I can't seem to figure out a way for perl to do it, although I know it can.

I have two files Boundary.out and DB.out. Boundary.out look like this:

chr1 3204562 3661779 chr1 4334223 4350673 chr1 4481008 4486694 chr1 4764014 4775968 chr1 4797773 4836816 chr1 4847574 4887987 chr1 4847574 4887987 chr1 4848208 4887987 chr1 4900049 5009660 chr1 5073053 5152630

and DB.out looks like this:

Xkr4 chr1 3204562 3661779 - 3 457.217 Rp1 chr1 4334223 4350673 - 4 16.45 Sox17 chr1 4481008 4486694 - 5 5.686 Mrpl15 chr1 4764014 4775968 - 5 11.954 Lypla1 chr1 4797773 4836816 + 9 39.043 Tcea1 chr1 4847574 4887987 + 10 40.413 Tcea1 chr1 4848208 4887987 + 10 39.779 Rgs20 chr1 4900049 5009660 - 5 109.611

for each row in DB.out I want to count the number of times that a row in Boundary.out occurs where column 0 in Boundary.out eq col1 in DB.out AND col1(Boundary) gt col2(DB) AND col2(Boundary)lt col3(DB). That value should be added to col7(DB).

Here is what I have so far:

open(FILE, @ARGV[0]) || die ("could not open file @ARGV[0]\n"); while (my $line = <FILE>) { chomp $line; my ($chr, $start, $stop) = split(/\t/, $line); } close FILE; open(FILE, @ARGV[1])||die ("could not open file @ARGV[1]\n"); while(<FILE>){ ($Gene,$Chrom,$ModStart,$ModEnd,$Strand,$ExonCount,$SizeKB)= s +plit; foreach (line in genes.db){ # I don't know what to put here. if ($chr eq $Chrom && $start gt $ModStart && $end lt $ModE +nd){ $Count++; print ;($Gene,$Chrom,$ModStart,$ModEnd,$Strand,$ExonCount, +$SizeKB,$Count) } }

Can anyone help????????????