
Re: Sorting Data By Overlapping Intervals

by Eily (Monsignor)
on Oct 31, 2013 at 09:45 UTC


in reply to Sorting Data By Overlapping Intervals

The problem is that $placeholder isn't reset to 0 when you finish testing a range, so you never go back and test the earlier lines. The fix would be to add $placeholder = 0 at the end of your foreach loop.

Now, if you want some advice on your code: first, I would have gone the other way around, keeping all the ranges in memory and testing every range for each line of your data. To keep track of which file each range should be written to, an array of hashes could do the trick. Something like:

    [
        { start => 1,   end => 1001, file => *FH1 },
        { start => 500, end => 2001, file => *FH2 },
    ]
Except the content would have been filled in by perl instead of by hand like I just did :). Then the algorithm would have been something like:

    for each line
        $position = getPosition()
        for $range in @ranges
            print {$range->{file}} $line if (($position > $range->{start}) && ($position < $range->{end}))
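For what it's worth, the pseudocode above could be fleshed out roughly as follows. This is only a sketch under assumptions: route_lines is a hypothetical helper name, the sample lines are made up, the position is assumed to sit in the fourth tab-separated column, and in-memory filehandles stand in for the real output files.

```perl
use strict;
use warnings;

# Hypothetical helper: send each line to every range that contains its position.
# Assumes tab-separated lines with the position in the 4th column (index 3).
sub route_lines {
    my ($lines, $ranges) = @_;
    for my $line (@$lines) {
        my $position = (split /\t/, $line)[3];
        for my $range (@$ranges) {
            print { $range->{file} } $line
                if $position > $range->{start} && $position < $range->{end};
        }
    }
}

# In-memory filehandles stand in for the real output files.
my ($buf1, $buf2) = ('', '');
open my $fh1, '>', \$buf1 or die "Cannot open buffer 1: $!";
open my $fh2, '>', \$buf2 or die "Cannot open buffer 2: $!";

my @ranges = (
    { start => 1,   end => 1001, file => $fh1 },
    { start => 500, end => 2001, file => $fh2 },
);

# Made-up sample data, position in column 4.
my @lines = (
    "chr1\trs1\tA\t250\tx\n",
    "chr1\trs2\tC\t750\tx\n",
    "chr1\trs3\tG\t1500\tx\n",
);

route_lines(\@lines, \@ranges);
close $_ for $fh1, $fh2;
```

Since the ranges overlap, a line such as the one at position 750 ends up in both outputs, which a single pass with one moving index cannot do without backtracking.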

Then, maybe you don't want to rewrite your whole code (that's why I didn't go into much detail). Still, there are some useful things you might like to know. If you want a "do something for each value and stop when a condition is met" construct in Perl, you can use the last keyword inside a foreach loop. In your case that would be:

    SNP:
    for my $snp (@SNPs) {
        my @get_SNPs = split(/\t/, $snp);
        my $position = $get_SNPs[3];
        last SNP if ($position > $end);   # stop reading, we're out of range!
        if (($position >= $start) && ($position <= $end)) {
            print OUT "@get_SNPs";
        }
    }

Also, instead of

    @array = split /\t/, $string;
    my $v1 = $array[1];
    my $v2 = $array[2];

you can simply write:

    my (undef, $v1, $v2) = split /\t/, $string;
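A small side-by-side check of the two forms, using a made-up input string (the field values are assumptions for illustration only):

```perl
use strict;
use warnings;

my $string = "rs123\tchr7\t60454\tA";   # made-up sample line

# Two-step version: keep the whole array, then pick fields by index.
my @array = split /\t/, $string;
my ($a1, $a2) = @array[1, 2];

# One-step version: undef discards the first field; fields past $v2 are ignored.
my (undef, $v1, $v2) = split /\t/, $string;

print "$v1 $v2\n";
```

Both forms pick out the same two fields; the second just avoids the temporary array.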

Replies are listed 'Best First'.
Re^2: Sorting Data By Overlapping Intervals
by ccelt09 (Sexton) on Oct 31, 2013 at 10:46 UTC

    The only problem with setting $placeholder = 0 is that I would need a way to iteratively increase its value until it is greater than or equal to my next start value before the for loop is completed. Just setting that variable to 0 after the loop means my $position value is always less than $start. Another way of saying that is:

    My position variable resets to the first value of 60454, but $start and $end increase with each loop, so nothing prints after the first output file.

      Oh, right, I read an elsif instead of the second if, which would have meant that you only exited the loop when $position is above the range. Then resetting $placeholder to 0 would work, I guess (untested). But the condition is that your input data has to be sorted (as in ordered), which it seemed to be in your sample.

      Still, you don't check that $placeholder is a valid index. If the last element of @SNP is inside one of the ranges, you'll increase $placeholder and try to access $SNP[last element+1], which would yield undef. I'm not sure you thought of that case.
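      One way to guard against running off the end of the array might look like the sketch below. It is untested against the original script; positions_in_range is a hypothetical helper, the data is made up, and the position is assumed to be in the fourth tab-separated column.

```perl
use strict;
use warnings;

# Hypothetical helper: collect positions inside [$start, $end], advancing an
# index the way $placeholder does, but never reading past the last element.
sub positions_in_range {
    my ($snps, $start, $end) = @_;
    my @hits;
    my $placeholder = 0;
    while ($placeholder < @$snps) {       # bounds check before every access
        my $position = (split /\t/, $snps->[$placeholder])[3];
        last if $position > $end;         # sorted input: nothing further can match
        push @hits, $position if $position >= $start;
        $placeholder++;
    }
    return @hits;
}

# Made-up data, sorted by position; the last element lies inside the range.
my @SNP  = map { "chr7\trs$_\tA\t$_\tx" } (60454, 60500, 60900);
my @hits = positions_in_range(\@SNP, 60400, 61000);
```

      Because the index is compared with the array length first, the loop simply ends when the last element is still in range instead of reading an undef element.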

      In the end, your inner loop reworked would be something like :

      # It would probably be better have
      #     while (my $line = <CG>)
      # but that would mean rethinking your whole code
      for my $line (@SNP) {
          my $position = (split " ", $line)[3];
          last unless $position <= $end;
          print OUT $line if $position > $start;
      }
      This is of course, completely untested :D.
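      For reference, the while-loop variant mentioned in the comment could look roughly like this. It is a sketch under assumptions: the data is made up, in-memory filehandles replace the real CG and OUT handles, and the input is assumed to be sorted by position.

```perl
use strict;
use warnings;

# Made-up sample input; the real code would read from the SNP file instead.
my $data = join '', map { "chr7 rs$_ A $_\n" } (60454, 60500, 60900, 61200);
open my $in, '<', \$data or die "Cannot open in-memory input: $!";

my ($start, $end) = (60460, 61000);
my $out = '';
open my $out_fh, '>', \$out or die "Cannot open in-memory output: $!";

while (my $line = <$in>) {
    my $position = (split " ", $line)[3];
    last unless $position <= $end;               # relies on input sorted by position
    print {$out_fh} $line if $position > $start;
}
close $out_fh;
close $in;
```

      Reading line by line avoids loading the whole file into an array, but the early last is only safe when the positions are in ascending order.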
