http://qs321.pair.com?node_id=761574

patric has asked for the wisdom of the Perl Monks concerning the following question:

Dear all, I am trying to separate the data from a file into two different files, depending on whether a record's header matches "GENEID" or "PROTID". Below is the input file.
input file:

>data_1 GENEID_8 1_exons 87028 - 87375 348 bp, chain -
ATGCCCAAATTAGTCAACATATTGATCACTACGGAGGAAATCTTGAAGAGTTCAAGGGGC
TGTCCATTTTACTTGAAGAGCCTAAAGATCAAAAAGGGTGATAATAAATCTTTAGAAGAT
ATGCTCATAATTGAATCTAACCTTACGATTTCTTCTACTTCTAATTGA
>data_1 PROTID_8 1_exons 87028 - 87375 115 aa, chain -
KLVNILITTEEILKSSRGIVLTVEQTSSIKRKFGWKKKKVKSAKKQKRESKPKKDGPK
AAEAKGKYFHYDADGHWRRNCPFYLKSLKIKKGDNKSLEDMLIIESNLTISSTSN
>data_2 GENEID_12 2_exons 121021 - 121590 486 bp, chain -
ATGTGGCACAACCGCCTAGGCCACATGGGTGACAAGGGGCTGAGGGAGTTGAGCAGGAGA
AGACACTTCTCAGTTAAGGGGACTCCACAGCAGAATGGGATGGCCGAGAGGATGAATAGA
ACACTTTTGGAAAAAGGCTCGATGCATGAGGCTGTAGGCAGAGCTTCCAAAGGCATTCTG
GGTTGA
>data_2 PROTID_12 2_exons 121021 - 121590 161 aa, chain -
LVHTDIYFMREKSEVFTKFKIWRAEVEKEQGRSVKCLRSDNGREYTSREFQDYCEECGIR
RHFSVKGTPQQNGMAERMNRTLLEKGSMHEAVGRASKGILG

program written so far:

#!/usr/bin/perl
open(OUT1,">GENEID.out")or die "can not create new file";
open(OUT2,">PROTID.out")or die "can not create new file";
open(FILE,"input.txt")or die "can not open file";
while ($line=<FILE>){
    $hit1= $line=~ /^(>data_\d+\s+GENEID_\d+.*\n.*)/s;
    print OUT1 "$hit1\n";
    $hit2= $line=~ /^(>data_\d+\s+PROTID_\d+.*\n.*)/s;
    print OUT2 "$hit2\n";
}

desired output:

file GENEID.out:

>data_1 GENEID_8 1_exons 87028 - 87375 348 bp, chain -
ATGCCCAAATTAGTCAACATATTGATCACTACGGAGGAAATCTTGAAGAGTTCAAGGGGC
TGTCCATTTTACTTGAAGAGCCTAAAGATCAAAAAGGGTGATAATAAATCTTTAGAAGAT
ATGCTCATAATTGAATCTAACCTTACGATTTCTTCTACTTCTAATTGA
>data_2 GENEID_12 2_exons 121021 - 121590 486 bp, chain -
ATGTGGCACAACCGCCTAGGCCACATGGGTGACAAGGGGCTGAGGGAGTTGAGCAGGAGA
AGACACTTCTCAGTTAAGGGGACTCCACAGCAGAATGGGATGGCCGAGAGGATGAATAGA
ACACTTTTGGAAAAAGGCTCGATGCATGAGGCTGTAGGCAGAGCTTCCAAAGGCATTCTG
GGTTGA

file PROTID.out:

>data_1 PROTID_8 1_exons 87028 - 87375 115 aa, chain -
KLVNILITTEEILKSSRGIVLTVEQTSSIKRKFGWKKKKVKSAKKQKRESKPKKDGPK
AAEAKGKYFHYDADGHWRRNCPFYLKSLKIKKGDNKSLEDMLIIESNLTISSTSN
>data_2 PROTID_12 2_exons 121021 - 121590 161 aa, chain -
LVHTDIYFMREKSEVFTKFKIWRAEVEKEQGRSVKCLRSDNGREYTSREFQDYCEECGIR
RHFSVKGTPQQNGMAERMNRTLLEKGSMHEAVGRASKGILG

My results contain only the headers (the lines that start with >) and not the sequence strings that follow them. Can anyone please point out which line of my code is going wrong? Thank you.
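
For reference, here is a minimal sketch of one way the desired split might be done. It is not a definitive fix. The loop in the program above reads a single line per iteration, so a pattern that expects a header followed by `\n` and more text never sees the sequence lines. The sketch instead remembers which output handle the most recent header selected and copies every following line there. The subroutine name `split_records` and the use of a command-line argument for the input file name are assumptions made for illustration; they are not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy each record (a header line plus the sequence lines that follow it)
# to the handle registered for its ID type. %$out maps 'GENEID' and
# 'PROTID' to already-opened output filehandles.
# NOTE: split_records is a hypothetical helper name, not from the original post.
sub split_records {
    my ($in, $out) = @_;
    my $current;    # output handle for the record being read, if any
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            # A new header: choose the output handle from the ID type,
            # or none if the header matches neither type.
            $current = $line =~ /^>data_\d+\s+(GENEID|PROTID)_\d+/
                ? $out->{$1}
                : undef;
        }
        # Header and sequence lines alike go to the currently selected file.
        print {$current} $line if defined $current;
    }
    return;
}

# Run only when an input file name is given on the command line
# (an assumption; the original program hard-codes input.txt).
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $in,   '<', $file)        or die "cannot open $file: $!";
    open(my $gene, '>', 'GENEID.out') or die "cannot create GENEID.out: $!";
    open(my $prot, '>', 'PROTID.out') or die "cannot create PROTID.out: $!";
    split_records($in, { GENEID => $gene, PROTID => $prot });
    close($_) for $in, $gene, $prot;
}
```

Saved under a name of your choice and called with `input.txt` as its argument, this should write GENEID.out and PROTID.out in the current directory. Because lines are copied unchanged, the sequence line breaks of the input are preserved.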