Re: A program to extract the reads and modify the seq ID

Perhaps the following will be helpful:

use strict;
use warnings;
use autodie;

my ( %ids, $id );

open my $idFH, '<', 'sample_IDs.txt';
while (<$idFH>) {
    # Each line holds an ID, whitespace, then its value; store ID => value.
    $ids{$1} = $2 if /(.+)\s+(.+)/;
}
close $idFH;

open my $sampleFH, '<', 'sample_reads.fasta';
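while (<$sampleFH>) {
    # On a header line whose ID (with its leading '>') is a key in %ids,
    # append "_weight=<value>" just before the newline.
    s/\n/"_weight=$ids{$id}\n"/e
      if ($id) = /^(>\S+)/ and exists $ids{$id};
    print;
}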
close $sampleFH;

This uses autodie so that a failed open or close raises an exception, which is why there is no explicit "or die" after each call. Note also the use of lexical (my) filehandles instead of barewords.
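To make the autodie point concrete, here is a small sketch that compares a hand-written check with the exception autodie raises. The file name is hypothetical and is assumed not to exist, so both opens are expected to fail; nothing here comes from the original datasets.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist, so that open fails.
my $missing = 'no_such_file_for_demo.txt';

# Without autodie, open returns false on failure and the caller
# has to test the result itself.
my $manual_ok = open my $fh1, '<', $missing;
print "manual check: open failed: $!\n" unless $manual_ok;

# autodie is lexically scoped: inside this block a failed open
# throws an exception instead of returning false.
my $err;
{
    use autodie;
    eval { open my $fh2, '<', $missing; 1 } or $err = $@;
}
print "autodie: caught an exception\n" if defined $err;
```

Because autodie is lexical, placing it at the top of the script (as in the code above) covers every open and close that follows.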

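A regex captures the ID/value pairs from the IDs file and uses them to build a hash. As the sample file is read, the ID is captured from each header line (including its leading >). If that ID exists as a key in the hash, the newline at the end of the header is replaced with _weight= followed by the stored value and a newline. Every line is printed, whether or not it was modified.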
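The same lookup-and-append step can be tried without the input files. The following is only a sketch: the sub name add_weights and the IDs and weights are made up for illustration, and it assumes the hash keys carry the leading >, since the ID captured from the FASTA header includes it.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a hash ref of ID => weight plus a list of
# FASTA lines, and returns the lines with "_weight=N" appended to every
# header whose ID is a key in the hash.
sub add_weights {
    my ( $ids, @lines ) = @_;
    for my $line (@lines) {
        # Capture the ID (including the leading '>') from a header line.
        if ( my ($id) = $line =~ /^(>\S+)/ ) {
            $line =~ s/\n/_weight=$ids->{$id}\n/ if exists $ids->{$id};
        }
    }
    return @lines;
}

# Made-up data in the same shape as the real files.
my %ids   = ( '>seq1' => 41, '>seq3' => 96 );
my @fasta = (
    ">seq1 len=4\n", "ACGT\n",
    ">seq2 len=4\n", "TTTT\n",
    ">seq3 len=2\n", "GG\n",
);

print add_weights( \%ids, @fasta );
```

Here the headers for seq1 and seq3 gain a _weight suffix, while the seq2 header and all sequence lines pass through unchanged.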

Partial output on your datasets:

...

>comp10002_c0_seq3 len=99 path=[2446:0-34 1163:35-98]
TTTTTTGTGATATATTAAATAATATATAAAAATACTATGGCAGGAAGTTTAAATAAAGTC
TTATTAATAGGCCGTTTAGGCGCAGACCCAGATATAAAA
>comp10003_c0_seq1 len=166 path=[748:0-22 1004:23-46 2527:47-165]_weight=41
AAGTAGCCTATGCGCTACAGTAAGAAAGACAGGTGAAAAAATGGAAGTAAAACAATTAGA
TGACTACTTTGGATATACAGAAAAGGGCAGTTCCTTAGAGGGGGAATTACGAGCAGGACT
AACGACATTCTTGACAATGGCGTACATTCTGTTTGTGAACCCAGAC 
>comp10004_c0_seq1 len=143 path=[2167:0-44 2322:45-68 2508:69-142]_weight=25
AATCTTTAATTTAAACTTAAAAAAAATTAACTTTTGAAAGGAATTAAAATGGAAAAAGAA
ATGTTAGTAGTAGCTAAATTAAAAGAAGGTACATTTGAAAAATTTATGGGTTTCATGCAA
TCGCCTGAAGGTTTAGCAGAAAG 
>comp10005_c0_seq1 len=135 path=[2666:0-71 4268:72-134]_weight=96
AATATTACCAGAAGTTACAGGTGATGTGACTTATTTACATTGCTTCGGTGAGTGTTCAGG
TGATGGTACAGGTGAATGCCCAAGTGGCGCTGTAACATGGATGCTTACAATGACTGTAAA
TACTGCTAATATCAC 

...
