in reply to Fuzzy String matching with index?
The concept of an edit distance is useful if you assume that all symbols are equally probable and substitutable, and there are no context dependencies. Biological sequences are not strings, however, and if one is looking for something biological, it often behooves one to use a stochastic grammar rather than a regular one. This is why hidden Markov models have become so popular. In protein space, HMMER is a popular software package (also written in C). Not many people use HMMs in nucleotide space, but they should.
These books might help
"Biological Sequence Analysis: A Probabilistic Approach": Durbin, Eddy, Krogh, and Mitchison, Cambridge University Press
"BLAST": Korf, Yandell, and Bedell, O'Reilly & Associates
"Algorithms on Strings, Trees, and Sequences": Gusfield, Cambridge University Press