Comparing FILE1 value to FILE2 range and printing matches

edwardtickle has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm very new to Perl and am working on a Bioinformatics project at University. I have FILE1 containing a list of positions, in the format:

 
99269
550
100
126477 
1700

And FILE2 in the format:

517 1878 forward
700 2500 forward
2156 3289 forward
99000 100000 forward
22000 23000 backward

I want to compare every position in FILE1 to every range in FILE2, and if a position falls within one of the ranges then I want to print the position, the range and the direction.

So my expected output would be:

99269 99000 100000 forward
550 517 1878 forward
1700 517 1878 forward

Currently it runs with no errors, but it doesn't output any information, so I am unsure where I am going wrong! When I split the final 'if' rule it runs, but it only works if the position is on exactly the same line as the range.

Any help would be much appreciated.

I have posted the same question on Stackoverflow as I'm after a fairly urgent answer.

My code is as follows:

 

#!/usr/bin/perl 

use strict;
use warnings;

my $outputfile = "/Users/edwardtickle/Documents/CC22CDS.txt"; 

open FILE1, "/Users/edwardtickle/Documents/CC22positions.txt" 
    or die "cannot open > CC22: $!";
    
open FILE2, "/Users/edwardtickle/Documents/CDSpositions.txt" 
    or die "cannot open > CDS: $!";    

open (OUTPUTFILE, ">$outputfile") or die "Could not open output file: $! \n";

while (<FILE1>) {
    if (/^(\d+)/) {
    my $CC22 = $1;    
        
        while (<FILE2>) {
        if (/^(\d+)\s+(\d+)\s+(\S+)/) {
        my $CDS1 = $1;
        my $CDS2 = $2;
        my $CDS3 = $3;
    
            if ($CC22 > $CDS1 && $CC22 < $CDS2) {
                print OUTPUTFILE "$CC22 $CDS1 $CDS2 $CDS3\n";
                    }
                }
            }
        }
    }
    
close(FILE1);
close(FILE2);


Replies are listed 'Best First'.
Re: Comparing FILE1 value to FILE2 range and printing matches by RichardK (Parson) on Oct 17, 2014 at 12:13 UTC
Well, one problem is that the first pass of your while loop for FILE2 will consume all the lines in that file and leave the file handle pointing to the end of the file (i.e. eof is true). So on the next pass there's no more data to be read, and no lines will match.

A simple fix is to move the open of FILE2 inside the loop so that you open it each time you need it:

while (<FILE1>) {
    ...
    open (FILE2, '<', "name");
    while (<FILE2>) {
        ...
    }
    close FILE2;
}

It isn't very efficient to keep reopening the same file, and there are lots of better ways, but they are more complex and we would need to know more about your problem, e.g. how big are your files?

Basic debugging checklist has a number of ways you can try to understand why any code isn't doing what you expect.

Using autodie saves lots of typing for simple programs like this.
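For illustration, here is a minimal sketch of the reopen-inside-the-loop fix described above, written with lexical filehandles and autodie. The sub name `match_by_reopening` and its two path arguments are hypothetical, not from the original post; it reuses the poster's regexes and strict `>` / `<` comparison, and returns the matching lines instead of writing them to an output file.

```perl
use strict;
use warnings;
use autodie;    # open/close die with a descriptive message on failure

# Hypothetical helper (name and arguments are assumptions): re-opens the
# ranges file once per position, so every inner pass starts at line 1.
sub match_by_reopening {
    my ($positions_path, $ranges_path) = @_;
    my @matches;

    open my $pos_fh, '<', $positions_path;
    while (my $line = <$pos_fh>) {
        next unless $line =~ /^(\d+)/;
        my $position = $1;

        # Opening here resets the read position to the start of the file.
        open my $range_fh, '<', $ranges_path;
        while (my $range = <$range_fh>) {
            next unless $range =~ /^(\d+)\s+(\d+)\s+(\S+)/;
            my ($start, $end, $direction) = ($1, $2, $3);

            # Same strict comparison as the original code: endpoints excluded.
            push @matches, "$position $start $end $direction"
                if $position > $start && $position < $end;
        }
        close $range_fh;
    }
    close $pos_fh;

    return @matches;
}
```

It could be called as `print "$_\n" for match_by_reopening($positions_file, $ranges_file);`. With the sample data from the question this yields four lines rather than the three listed as expected, because 1700 lies inside both 517-1878 and 700-2500.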
Re^2: Comparing FILE1 value to FILE2 range and printing matches by edwardtickle (Initiate) on Oct 17, 2014 at 14:44 UTC
That's done the trick, thank you for your help! Autodie does make a lot more sense so I will use that in future.
Re: Comparing FILE1 value to FILE2 range and printing matches by choroba (Cardinal) on Oct 17, 2014 at 10:21 UTC
Crossposted at StackOverflow. It's considered polite to inform about crossposting so people not attending both sites don't waste their time hacking a solution to a problem already solved at the other end of the internet.

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Comparing FILE1 value to FILE2 range and printing matches by edwardtickle (Initiate) on Oct 17, 2014 at 10:31 UTC
Apologies, I didn't know that. I have edited both posts to include this.
Re: Comparing FILE1 value to FILE2 range and printing matches by biohisham (Priest) on Oct 18, 2014 at 04:09 UTC
You can read the positions into an array by themselves, and then open the second file and iterate over the array to find lines where the positions are enclosed within the ranges. That way you open each file only once.

use strict;
use warnings;

open(my $fh1, "<", "positions.txt") or die("could not open file $!\n");
my @positions;    # hold the positions to be compared
while (my $line = <$fh1>) {
    chomp $line;
    push @positions, $line;
}

open(my $fh2, "<", "coords_orientation.txt") or die("could not open file $!\n");
while (my $line = <$fh2>) {
    chomp $line;
    my @record = split(" ", $line);    # split the coords_orientation.txt on white space
    foreach my $pos (@positions) {
        if ($pos > $record[0] && $pos < $record[1]) {
            print "$pos @record\n";
        }
    }
}

A 4 year old monk
Re: Comparing FILE1 value to FILE2 range and printing matches by CountZero (Bishop) on Oct 18, 2014 at 20:02 UTC
Using some modules:

use Modern::Perl '2014';
use Number::Interval;
use List::Util qw/first/;

# FILE1 data emulation
my @FILE1 = qw/99269 550 100 126477 1700/;

my @interval_objects;
while (<DATA>) {
    chomp;
    my ($start, $end, undef) = split;
    push @interval_objects, Number::Interval->new(
        IncMax => 0,
        IncMin => 0,
        Min    => $start,
        Max    => $end,
    );
}

for my $datapoint (@FILE1) {
    my $found = first { $_->contains($datapoint) } @interval_objects;
    say "$datapoint is in $found" if $found;
}

# FILE2 data emulation
__DATA__
517 1878 forward
700 2500 forward
2156 3289 forward
99000 100000 forward
22000 23000 backward

Output:

99269 is in (99000,100000)
550 is in (517,1878)
1700 is in (517,1878)

As said before, this will only work if the list of intervals is not huge. It will also only find the first interval that matches. If you want to find all intervals that match, replace the for-loop with:

for my $datapoint (@FILE1) {
    my @found = grep { $_->contains($datapoint) } @interval_objects;
    say "$datapoint is in @found" if @found;
}

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics
Re: Comparing FILE1 value to FILE2 range and printing matches by Laurent_R (Canon) on Oct 18, 2014 at 11:12 UTC
The simple fix suggested by RichardK works, but it is not a very efficient way to do such a thing, as RichardK himself pointed out. It is usually better to load at least one of the files into memory (as an array, a hash or some other data structure), but, as Richard already asked, we would need to know how large your two files are in order to be able to provide more guidance on how to do it.
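As a rough sketch of the load-one-file-into-memory idea, assuming the ranges file is the smaller of the two and fits comfortably in memory (the thread never establishes the file sizes): the ranges are read once into an array of `[start, end, direction]` records, and the positions file is then streamed line by line. The sub name `match_with_ranges_in_memory` and its arguments are hypothetical.

```perl
use strict;
use warnings;
use autodie;    # open/close die with a descriptive message on failure

# Hypothetical helper (name and arguments are assumptions): each file is
# opened and read exactly once.
sub match_with_ranges_in_memory {
    my ($positions_path, $ranges_path) = @_;

    # Load every range as an array ref: [start, end, direction].
    open my $range_fh, '<', $ranges_path;
    my @ranges;
    while (my $line = <$range_fh>) {
        push @ranges, [ $1, $2, $3 ]
            if $line =~ /^(\d+)\s+(\d+)\s+(\S+)/;
    }
    close $range_fh;

    # Stream the positions and test each one against the in-memory ranges.
    my @matches;
    open my $pos_fh, '<', $positions_path;
    while (my $line = <$pos_fh>) {
        next unless $line =~ /^(\d+)/;
        my $position = $1;
        for my $range (@ranges) {
            my ($start, $end, $direction) = @$range;

            # Strict comparison, as in the original post: endpoints excluded.
            push @matches, "$position $start $end $direction"
                if $position > $start && $position < $end;
        }
    }
    close $pos_fh;

    return @matches;
}
```

Because the outer loop runs over the positions, the matches come back in the order of the positions file, which is the order used in the expected output of the question. The work is still one comparison per position per range, so for very large inputs a sorted or indexed structure would be worth considering.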

