Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

Re: Fuzzy String matching with index?

by iankorf (Acolyte)
on Sep 06, 2006 at 23:27 UTC ( #571564=note: print w/replies, xml ) Need Help??

in reply to Fuzzy String matching with index?

Inexact string matching is still an active field of research both on the algorithmic and statistical fronts. Some of the more popular algorithms in bioinformatics index the strings and use heuristics to speed up the search. There are also optimizations tied to particular hardware features such as vector units and FPGAs. Most of the programs (eg. BLAST, BLAT, CrossMatch, Exonerate, FASTA, etc.) are written in C with a few exceptions (eg. PatternHunter uses Java).

The concept of an edit distance is useful if you assume that all symbols are equally probable and substitutable, and there are no context dependencies. Biological sequences are not strings, however, and if one is looking for something biological, it often behooves one to use a stochastic grammar rather than a regular one. This is why hidden Markov models have become so popular. In protein space, HMMER is a popular software package (also written in C). Not many people use HMMs in nucleotide space, but they should.

These books might help

"Biological Sequence Analysis: A Probabilistic Approach": Durbin, Eddy, Krogh, and Mitchison, Cambridge University Press
"BLAST": Korf, Yandell, and Bedell, O'Reilly & Associates
"Algorithms on Strings, Trees, and Sequences": Gusfield, Cambridge University Press

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://571564]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (2)
As of 2022-11-26 15:51 GMT
Find Nodes?
    Voting Booth?