XP is just a number | |
PerlMonks |
comment on |
( [id://3333]=superdoc: print w/replies, xml ) | Need Help?? |
If I understand correctly your updates, you are looking for approximate matching (aka fuzzy matching). This is much more complicated than just using regular expressions. I'll just throw here a few pointers. Fuzzy matching often implies computing the Hamming distance or the Levenshtein edit distance (look up these names), which both basically reflect the number of differences (or mismatches) between two strings.
You might also look for the Baeza-Yates and Gonnet algorithm (sometimes also called the Baeza-Yates k-mismatches algorithm. Another known algorithm of possible interest is the Wu-Mamber k-differences. You might also take a look at the longest common subsequence problem. The String::Approx CPAN module might be of some interest. Finally I would suggest you take a deep look at the http://www.bioperl.org/wiki/Main_Page/. I am not using it personally (as I said, I am not a biologist) and can't say much about it, but it is quite big and is likely to have implemented some of the things you are trying to do. Also the functionalities in there are likely to be optimized for alphabets of 4 letters such as DNA sequences. In reply to Re: Random shuffling
by Laurent_R
|
|