PerlMonks
substr is behaving differently with small vs large strings
by wadunn (Initiate)
on Jul 07, 2004 at 20:39 UTC ( [id://372538]=perlquestion )
wadunn has asked for the wisdom of the Perl Monks concerning the following question:
Hello all.
I am still a pretty green Perl user, but I have a few functional programs under my belt. I am using Perl to crunch DNA sequences for my lab. We have been manually looking for a group of genes in the African malaria mosquito and want to send the coordinates we discovered to the powers that be. I thought I would write a program that takes a giant sequence of DNA (called a contig) and extracts the genes from it based on the coordinates we found, in order to double-check each set of locations before we send them off.

For the non-biologists: the search file is basically a large text file. I want my program to go so many letters from the start, take a given number of letters after that, and place them in a variable (think of this as a word). Then it goes further downstream and picks another word, and then one more. Then I concatenate the words into a sentence (the final gene).

I was not anticipating a difficult program. I thought I would use ‘substr’ because it seems tailor-made for this. When I tried it on a test file (660 letters), using three calls to substr and placing the three ‘words’ into separate variables, everything behaved beautifully. Problems arose when I went to the actual search file (2,082,241 letters). With the exact same script, my extracted substrings were being pulled from the sequence AHEAD of the target sequence, as confirmed by manually locating the coordinates that the program was given.

Am I missing something about the behavior of substr that would make it act differently on a VERY large string? I know that gigantic strings are memory black holes, but at my current level of expertise I don’t know how to avoid loading the whole file. I don’t care how slow it runs; I can run the thing overnight (it will eventually be unleashed on about 150 genes). It just needs to work.

I know that your job is not to look over idiot programmers’ code, but I included mine in case it helps you see what I am trying to do or spot any mistake I may have overlooked.
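
To make the idea concrete, here is a minimal sketch of the kind of extraction described above. It is not the original script: the toy sequence, the coordinates, and the helper name `extract_gene` are all made up for illustration. It assumes the coordinates are 1-based and inclusive, and it strips the header line and line breaks before indexing, since any newline left in the string counts as a character to substr.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): pull several
# segments out of a sequence and join them into one string.
# Coordinates are assumed to be 1-based and inclusive: [start, end].
sub extract_gene {
    my ($contig, @segments) = @_;
    my $gene = '';
    for my $seg (@segments) {
        my ($start, $end) = @$seg;
        # substr offsets are 0-based, so shift the start by one;
        # the length of an inclusive range is end - start + 1.
        $gene .= substr($contig, $start - 1, $end - $start + 1);
    }
    return $gene;
}

# Made-up sequence standing in for the contig file contents,
# including a FASTA-style header line and line breaks.
my $raw = ">toy_contig\nAAACCCGGGT\nTTAAACCCGG\n";

# Drop the header line and all whitespace so that position N in the
# string is the Nth base; otherwise every newline shifts later offsets.
(my $contig = $raw) =~ s/^>.*\n//;
$contig =~ s/\s+//g;

# Three 'words' joined into one 'sentence'.
my $gene = extract_gene($contig, [1, 3], [7, 10], [16, 18]);
print "$gene\n";    # prints AAAGGGTCCC
```

Whether the real coordinates are 1-based and whether the real file has a header line are assumptions here; the sketch only shows where those two details enter the arithmetic.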
Thank you so much, and I will be praying to the Perl gods until your comments come.
Augustine
This is my test data that works: