PerlMonks
Re^2: DNA Pattern Matching by Speed_Freak (Sexton)
on Jul 20, 2017 at 13:40 UTC ( [id://1195610]=note )
Apologies, I saw you comment similarly on a thread about k-mers (BioMonks vs. PerlMonks). Is there a specific site or section for bio-based questions that I have missed?

Let me see if I can wrap my head around all of this and better explain myself... I'm going to walk through your reply out loud.

So $ppi_pm_seq is defined as an example sequence. (This would be filled in from each sequence in the group I have: 250k+ sequences.) $n is defined as 3, since we are after the middle base of the 7-mer sequence. (For me, $n would be set to 12, since all of my sequences are 25-mers.)

$my_extraction looks to capture the pattern of the sequence into three variables that, instead of being $1, $2, and $3 by default, are now $before, $mid, and $after (I didn't know that you could name the pattern variables!), containing the first 3 bases (any single character x 3, because of the . and $n), the middle base, and the last three bases? The \A restricts the first pattern to match only at the beginning of the string, while the \z restricts the third pattern to match only at the end of the string, correct? Is the xms required?

The next block is confusing to me, and that may be my fault. In my parent list, I have "Pair" IDs. Each pair ID corresponds to another pair of IDs, so one ID becomes two IDs, which become four IDs. Out of those final four IDs, two are targets and two are mismatches with the central base being different. That is where ppi_pm, ppi_mm, mpi_pm, and mpi_mm come into play. The pm and mm in the same pair differ only by the central base, while the ppi and mpi are different sequences entirely. (Not entirely true, they are complement sequences... but that's not important for this.) For simplicity, I'm guessing it is best to ignore the identified mm sequences in this code?
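For readers following along, the extraction step described above can be sketched roughly as below. This is not the code from the parent reply; the sequence value and the exact regex layout are assumptions made for illustration, and the sketch avoids anything that needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical example sequence; real input would be each 25-mer in turn.
my $seq = 'ACGTACGTACGTACGTACGTACGTA';   # 25 bases
my $n   = 12;                             # bases on each side of the center

# In list context a match returns its captures, so they can be assigned
# straight to named lexicals instead of being read from $1, $2, $3.
my ($before, $mid, $after) = $seq =~ m{
    \A          # anchor at the very start of the string
    (.{$n})     # the leading flank
    (.)         # the central base
    (.{$n})     # the trailing flank
    \z          # anchor at the very end of the string
}xms;

die "sequence is not 2*$n+1 bases long\n" unless defined $mid;

print "$before $mid $after\n";
```

In this layout only /x is strictly needed, because it permits the whitespace and comments inside the pattern. /m changes the meaning of ^ and $ (not \A or \z), and /s lets . match a newline, so neither alters the result for a single-line sequence; writing all three together is a common defensive habit.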
If I'm following along correctly, you are setting each of those variables to the result of evaluating the block (map {block}) with $before and $after, with the default variable $_ holding the center base in place of $mid? And then following it with a test where the center base is not equal to the target stored in $mid, for the list of A, T, C, and G? Is this just creating the possible permutations of the target string based on the possible outcomes?

The sequences are guaranteed to contain only A, T, C, and G. During my searches I found a couple of tools in 5.10+ that looked useful, but alas I am stuck with what I have for the time being. (We attempted running the overarching script group on a newer deployment and couldn't make it functional. Maybe a project for the future.)

Taking a crack at the code...
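As an illustration of the map-and-filter step being asked about (a minimal sketch, not the parent reply's code and not the poster's attempt), the mismatch variants of a sequence can be built from its two flanks and center base. The flank values and the names $pm and @mm are hypothetical, and grep is assumed as the filtering function.

```perl
use strict;
use warnings;

# Hypothetical flanks and center base, as an extraction step would produce.
my ($before, $mid, $after) = ('ACG', 'T', 'ACG');

# Perfect match: the original sequence put back together.
my $pm = $before . $mid . $after;

# Mismatches: the same flanks with each of the other three bases at the center.
# grep drops the original center base; map rebuilds a full sequence around $_.
my @mm = map  { $before . $_ . $after }
         grep { $_ ne $mid }
         qw(A T C G);

print "pm: $pm\n";
print "mm: $_\n" for @mm;
```

With four possible bases and one excluded, this always yields exactly three center-base variants per sequence, so these are single-position substitutions rather than permutations of the whole string.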