
Re: Sorting Data By Overlapping Intervals

by hdb (Monsignor)
on Oct 31, 2013 at 09:42 UTC


in reply to Sorting Data By Overlapping Intervals

I would propose two modifications. First, when you load the data from the file, extract the fourth column right away and store it alongside each line:

my @SNPs = map { [ (split /\t/)[3], $_ ] } <CG>;

So each element of @SNPs is now an array reference whose first element is the fourth column and whose second element is the full line.
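To make that structure concrete, here is a minimal sketch that runs the same map over two made-up lines instead of the CG file handle. The sample lines and the names @lines, $position and $line are assumptions for illustration only; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical tab-separated sample lines; the real data would be read from the CG file handle.
my @lines = (
    "chr1\tA\tG\t150\trs1\n",
    "chr1\tC\tT\t420\trs2\n",
);

# Same map as above, applied to an array instead of <CG>.
my @SNPs = map { [ (split /\t/)[3], $_ ] } @lines;

# Each element is [ fourth column, full line ].
for my $snp (@SNPs) {
    my ($position, $line) = @$snp;
    print "position=$position line=$line";
}
```

Keeping the full line next to the extracted column means the line does not have to be split again every time an interval is checked.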

As the second change, in your loop over the intervals, pick all elements that fall within the current interval using grep and then extract the line from the array reference using map:

my @inInterval = map { $_->[1] } grep { $start <= $_->[0] and $_->[0] <= $end } @SNPs;

All you need then is to print these lines into the relevant file.
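As a rough sketch of the filtering step on its own, the grep/map pair can be wrapped in a small helper and tried on in-memory data. The helper name lines_in_interval, the sample positions and the interval 100 to 500 are all assumptions made up for this example; the bounds are treated as inclusive, as in the expression above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the full lines whose stored position
# lies within [$start, $end], both bounds inclusive.
sub lines_in_interval {
    my ($start, $end, $snps) = @_;
    return map  { $_->[1] }
           grep { $start <= $_->[0] and $_->[0] <= $end } @$snps;
}

# Made-up [ position, full line ] pairs in the shape described above.
my @SNPs = (
    [ 150, "chr1\tA\tG\t150\n" ],
    [ 420, "chr1\tC\tT\t420\n" ],
    [ 900, "chr1\tG\tA\t900\n" ],
);

# Select the lines for one interval and print them (to STDOUT here,
# where the real script would print to the file for that interval).
my @inInterval = lines_in_interval(100, 500, \@SNPs);
print @inInterval;
```

Note that this scans all of @SNPs once per interval, which is simple and usually fine for moderately sized inputs.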

I am not sure whether I have explained this well...

Re^2: Sorting Data By Overlapping Intervals
by ccelt09 (Sexton) on Oct 31, 2013 at 10:59 UTC

    The logic behind this makes sense, but once I have each element of @SNPs stored as an array reference as you explained above, I don't understand how to print those falling within the ranges in my second data set to the relevant file.

      This is what my second proposal does. If you have the interval boundaries in variables $start and $end, then

      my @inInterval = map { $_->[1] } grep { $start <= $_->[0] and $_->[0] <= $end } @SNPs;

      will filter all relevant lines for this interval. You would just print OUT @inInterval; where OUT is the file handle for the file corresponding to this interval.

      Something like this:

      open my $CG, "<", $cg_input or die "can't open $cg_input\n";
      my @SNPs = map { [ (split /\t/)[3], $_ ] } <$CG>;
      close($CG);

      open my $INTERVAL, "<", $input_interval or die "can't open $input_interval\n";
      my $interval = <$INTERVAL>; # skip first line
      foreach (<$INTERVAL>){
        chomp;
        my( $start, $end ) = split /\t/;
        open my $OUT, ">", $output_directory."temp_file_".$count++.".txt";
        print $OUT map { $_->[1] } grep { $start <= $_->[0] and $_->[0] <= $end } @SNPs;
        close $OUT;
      }
      close($INTERVAL);
