Good Morning Monks,
I have a problem that I could solve rather easily in Excel, but my files are too large, and I can't figure out how to do it in Perl, although I know it can be done.
I have two files, Boundary.out and DB.out. Boundary.out looks like this:
chr1 3204562 3661779
chr1 4334223 4350673
chr1 4481008 4486694
chr1 4764014 4775968
chr1 4797773 4836816
chr1 4847574 4887987
chr1 4847574 4887987
chr1 4848208 4887987
chr1 4900049 5009660
chr1 5073053 5152630
and DB.out looks like this:
Xkr4 chr1 3204562 3661779 - 3 457.217
Rp1 chr1 4334223 4350673 - 4 16.45
Sox17 chr1 4481008 4486694 - 5 5.686
Mrpl15 chr1 4764014 4775968 - 5 11.954
Lypla1 chr1 4797773 4836816 + 9 39.043
Tcea1 chr1 4847574 4887987 + 10 40.413
Tcea1 chr1 4848208 4887987 + 10 39.779
Rgs20 chr1 4900049 5009660 - 5 109.611
For each row in DB.out, I want to count the number of rows in Boundary.out where column 0 of Boundary.out equals column 1 of DB.out AND column 1 of Boundary.out is greater than column 2 of DB.out AND column 2 of Boundary.out is less than column 3 of DB.out (columns are numbered from 0). That count should be appended to the DB.out row as a new column 7.
Here is what I have so far:
open(FILE, $ARGV[0]) || die("could not open file $ARGV[0]\n");
while (my $line = <FILE>) {
    chomp $line;
    my ($chr, $start, $stop) = split(/\t/, $line);
}
close FILE;
open(FILE, $ARGV[1]) || die("could not open file $ARGV[1]\n");
while (<FILE>) {
    ($Gene, $Chrom, $ModStart, $ModEnd, $Strand, $ExonCount, $SizeKB) = split;
    $Count = 0;    # start a fresh count for every DB.out row
    foreach (line in Boundary.out) {    # I don't know what to put here.
        if ($chr eq $Chrom && $start > $ModStart && $stop < $ModEnd) {
            $Count++;
        }
    }
    print join("\t", $Gene, $Chrom, $ModStart, $ModEnd, $Strand, $ExonCount, $SizeKB, $Count), "\n";
}
close FILE;
Can anyone help?
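One possible way to fill in the missing loop, offered as a minimal sketch rather than a definitive answer: read Boundary.out once into a hash of arrays keyed by chromosome, then scan only the matching chromosome's intervals for each DB.out row. The names `load_boundaries`, `count_inside`, and `%by_chr` are made up for illustration. The sketch assumes both files are whitespace-delimited and that Boundary.out fits in memory. It compares with the numeric operators `>` and `<`, because `gt` and `lt` compare their operands as strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count boundary intervals on $chrom that lie strictly inside ($start, $end).
# $by_chr is a hash ref mapping a chromosome name to a list of [start, stop].
sub count_inside {
    my ($by_chr, $chrom, $start, $end) = @_;
    my $count = 0;
    for my $iv (@{ $by_chr->{$chrom} || [] }) {
        # Numeric comparison (> and <), not string comparison (gt and lt).
        $count++ if $iv->[0] > $start && $iv->[1] < $end;
    }
    return $count;
}

# Load the boundary file once into memory, grouped by chromosome.
sub load_boundaries {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "could not open file $path: $!\n";
    my %by_chr;
    while (my $line = <$fh>) {
        chomp $line;
        my ($chr, $start, $stop) = split ' ', $line;
        next unless defined $stop;    # skip blank or short lines
        push @{ $by_chr{$chr} }, [$start, $stop];
    }
    close $fh;
    return \%by_chr;
}

# Runs only when both file names are given:
#   perl count.pl Boundary.out DB.out
if (@ARGV == 2) {
    my $by_chr = load_boundaries($ARGV[0]);
    open(my $db, '<', $ARGV[1]) or die "could not open file $ARGV[1]: $!\n";
    while (my $line = <$db>) {
        chomp $line;
        my @f = split ' ', $line;
        next unless @f >= 7;          # skip blank or short lines
        # Fields: gene, chrom, start, end, strand, exon count, size in kb.
        my $count = count_inside($by_chr, @f[1, 2, 3]);
        print join("\t", @f, $count), "\n";
    }
    close $db;
}
```

Because both inequalities are strict, a boundary whose start or stop equals the gene's own coordinate is not counted; switching to `>=` and `<=` would include those rows. Each DB.out row is still checked against every boundary on its chromosome, so for very large inputs a sorted or indexed lookup may be worth considering.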