Thank you very much! I am using the first suggestion, as I don't really understand the binary search yet, and so far it doesn't seem to have problems with searching the 500,000 rows.
I am now running into a different problem: the output of each print command ends up on a separate line in the resulting CSV file. Here is my code:
#!/usr/bin/perl
use warnings;
use strict;

open my $GENES, '<', 'chr1data.csv' or die $!;
open my $LOCATIONS, '<', 'chr1snps.csv' or die $!;

chomp(my @locations = map { (split ',')[2] } <$LOCATIONS>);
# If IDs are not already sorted, uncomment the following line:
# @locations = sort { $a <=> $b } @locations;

for (<$GENES>) {
    my ($chromosome, $start, $end) = split ',';
    print "$chromosome,$start,$end";
    my $idx = 0;        # For $end, start searching where you left for $start.
    my $correction = 0; # Needed for Start(-) == Start and End(+) == End.
    for my $pos ($start, $end) {
        $idx++ while $locations[$idx] <= $pos - $correction
                 and $idx <= $#locations;
        die "No numbers around $pos ($idx) \n"
            if $idx == 0 or $idx > $#locations;
        print ",$locations[$idx-1],$locations[$idx]";
        $correction = 1;
    }
    print "\n";
}
The statement print ",$locations[$idx-1],$locations[$idx]"; puts its output on a new line. I'd like it to appear on the same line as the output of print "$chromosome,$start,$end"; for each search. Do I have a \n in the wrong place?
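One possible cause, sketched here under the assumption that every row of chr1data.csv has exactly three fields and ends in a newline: split ',' leaves the line's trailing newline attached to the last field, so $end would carry it into the first print. The input line below is made up for illustration and is not taken from the real file.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical input line, standing in for one row of chr1data.csv.
my $line = "chr1,100,200\n";

# Without chomp, the last field keeps the line's trailing newline.
my (undef, undef, $raw_end) = split ',', $line;

# With chomp first, the newline is removed before the split.
chomp(my $clean = $line);
my ($chromosome, $start, $end) = split ',', $clean;

print "raw end has newline: ",   ($raw_end =~ /\n\z/ ? "yes" : "no"), "\n";
print "clean end has newline: ", ($end     =~ /\n\z/ ? "yes" : "no"), "\n";
```

If that is what is happening, adding chomp; as the first statement inside the for (<$GENES>) loop should keep all the fields for one search on the same line, with the final print "\n"; ending the row.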