http://qs321.pair.com?node_id=853819

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I would be grateful if you could share your knowledge on parsing a blastn output file. I have a blastn output file that looks something like this:
ALIGNMENTS
>lcl|14079 ref|NC_000009.11|:4900000-5300000 Homo sapiens chromosome 9, GRCh37 primary reference assembly
Length=400001

 Score = 270 bits (146),  Expect = 2e-74
 Identities = 148/149 (99%), Gaps = 0/149 (0%)
 Strand=Plus/Minus

Query  1      TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACA  60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  48784  TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACA  48725

Query  61     AATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  120
              |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
Sbjct  48724  AATGGGATCTAATTCAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  48665

Query  121    ACAGGCAACCTACAGAATGGGAGAACATT  149
              |||||||||||||||||||||||||||||
Sbjct  48664  ACAGGCAACCTACAGAATGGGAGAACATT  48636

I would like to create a summary of the position of each mismatch and the type of variation (insertion/deletion) from the BLAST output. In this case, that would be position 75, with the alleles A-C.

I tried Bio::SearchIO, which parses the percentage and the start and end positions of the alignment. Obviously, I don't want those in my summary, but rather the coordinates of each mismatch and the type of variation. Does anyone know of a Perl module, or a simpler (or even harder) way, to find the position of a mismatch and the type of variation?
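For illustration, one possible module-free approach is to join the Query and Sbjct rows of a single alignment and walk them column by column. The sketch below is only that: the names parse_alignment and find_variants are hypothetical, it assumes the text holds exactly one HSP in the default pairwise layout, it infers the subject direction from the first Sbjct row, and it labels a gap in the subject as an "insertion" and a gap in the query as a "deletion" (a convention chosen here, not taken from BLAST).

```perl
use strict;
use warnings;
use 5.010;    # for the //= operator

# Join the Query/Sbjct rows of ONE alignment (assumption: a single HSP).
# Match lines and header lines are simply ignored.
sub parse_alignment {
    my ($text) = @_;
    my %aln = (qseq => '', sseq => '');
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*Query\s+(\d+)\s+([A-Za-z-]+)\s+\d+/) {
            $aln{qstart} //= $1;
            $aln{qseq}   .= $2;
        }
        elsif ($line =~ /^\s*Sbjct\s+(\d+)\s+([A-Za-z-]+)\s+(\d+)/) {
            $aln{sstart} //= $1;
            # Coordinates decrease on the minus strand (Strand=Plus/Minus).
            $aln{sstep}  //= ($3 >= $1 ? 1 : -1);
            $aln{sseq}   .= $2;
        }
    }
    return \%aln;
}

# Walk the aligned strings and collect every column that is not a match.
# For a gap, the reported position is that of the next base on the gapped side.
sub find_variants {
    my ($qseq, $sseq, $qstart, $sstart, $sstep) = @_;
    my ($qpos, $spos) = ($qstart, $sstart);
    my @variants;
    for my $i (0 .. length($qseq) - 1) {
        my $q = substr $qseq, $i, 1;
        my $s = substr $sseq, $i, 1;
        my $type = $q eq '-'          ? 'deletion'     # base missing from query
                 : $s eq '-'          ? 'insertion'    # extra base in query
                 : uc($q) ne uc($s)   ? 'mismatch'
                 :                      undef;
        push @variants, { type => $type, qpos => $qpos, spos => $spos,
                          query => $q, sbjct => $s } if defined $type;
        $qpos++          unless $q eq '-';
        $spos += $sstep  unless $s eq '-';
    }
    return @variants;
}

# The alignment rows from the report above (match lines omitted).
my $blast = <<'END';
Query  1      TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACA  60
Sbjct  48784  TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACA  48725
Query  61     AATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  120
Sbjct  48724  AATGGGATCTAATTCAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  48665
Query  121    ACAGGCAACCTACAGAATGGGAGAACATT  149
Sbjct  48664  ACAGGCAACCTACAGAATGGGAGAACATT  48636
END

my $aln = parse_alignment($blast);
for my $v (find_variants(@{$aln}{qw(qseq sseq qstart sstart sstep)})) {
    printf "%s at query %d (subject %d): %s-%s\n",
        @{$v}{qw(type qpos spos query sbjct)};
}
# prints: mismatch at query 75 (subject 48710): A-C
```

A report with several hits or HSPs would need to be split into one chunk per alignment before calling parse_alignment, and the subject coordinate here is relative to the subject sequence as numbered in the report.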