Re: pattern matching to separate data

Previous replies have given you a working solution, but in case it helps to know how the OP code went wrong:

while ($line=<FILE>){

        $hit1= $line=~ /^(>data_\d+\s+GENEID_\d+.*\n.*)/s;
        print OUT1 "$hit1\n";

        $hit2= $line=~ /^(>data_\d+\s+PROTID_\d+.*\n.*)/s;
        print OUT2 "$hit2\n";
}

The problems are:

The while loop is reading one line at a time, and printing to both output files on every iteration.
The input is structured as multi-line records, and the criterion for selecting the output file appears only on the first line of each record. You would therefore need to maintain a "state" variable (or keep the output file handle in a variable and assign it when you read the first line of each record), but your loop treats every line as if it contained the criterion for choosing the output.
You are using capturing parens in your regex match, but assigning the result to a scalar variable in scalar context, which means the value assigned is the success status of the match rather than the captured string: 1 if the line matched, the empty string if it did not. Note the following difference between assigning the match result in scalar context ($c) versus list context (@m, or $m in parens):
```
    $str = "text with some pattern in it";
    $c = $str =~ / (some pattern) /;      # sets $c to the numeric value "1"
    @m = $str =~ / (some pattern) /;      # assigns "some pattern" as sole element of @m
    ( $m ) = $str =~ / (some pattern) /;  # sets $m to "some pattern"
```

The result of those points taken together is that both your output files had the same line count as your input file, and each of those lines is either "1" (where the pattern matched) or empty (where it did not). (When you said your "results are giving only the headers...", I suspect that you were looking at data that was not created by the code you posted.)
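
For what it's worth, here is a minimal sketch of the "state" variable idea from the second point. It assumes every record starts with a header line beginning with ">" (as your regexes suggest) and reuses your GENEID/PROTID patterns. The sub name split_records and the sample lines are made up for illustration, and arrays stand in for the two output files so the snippet runs on its own:

```perl
use strict;
use warnings;

# Route every line of a multi-line record to the bucket chosen by the
# record's header line. split_records is a hypothetical helper name.
sub split_records {
    my @lines = @_;
    my %out = ( GENEID => [], PROTID => [] );
    my $current;    # state: which bucket the current record belongs to

    for my $line (@lines) {
        if ( $line =~ /^>data_\d+\s+(GENEID|PROTID)_\d+/ ) {
            $current = $1;       # recognized header: switch buckets
        }
        elsif ( $line =~ /^>/ ) {
            $current = undef;    # unrecognized header: skip this record
        }
        push @{ $out{$current} }, $line if defined $current;
    }
    return \%out;
}

# Made-up sample input, not taken from the original data.
my @sample = (
    ">data_1 GENEID_100 first record\n",
    "ACGTACGT\n",
    ">data_2 PROTID_200 second record\n",
    "MKVLAA\n",
    "GHTW\n",
);
my $out = split_records(@sample);
print "GENEID: ", scalar @{ $out->{GENEID} }, " lines\n";
print "PROTID: ", scalar @{ $out->{PROTID} }, " lines\n";
```

In the real script you would read from FILE in the while loop and store a file handle (OUT1 or OUT2) in the state variable instead of a hash key, printing each line to whichever handle is current.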
