PerlMonks  

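If I understand your updates correctly, you are looking for approximate matching (also known as fuzzy matching). This is much more complicated than just using regular expressions, so I'll just throw out a few pointers here. Fuzzy matching usually involves computing the Hamming distance or the Levenshtein edit distance (look up these names). Both measure how different two strings are: the Hamming distance counts the positions at which two strings of equal length differ, while the Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.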
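To make the two distances concrete, here is a minimal sketch. It is written in Python only for brevity (the loops carry over to Perl almost line for line), and the function names `hamming` and `levenshtein` are my own choices for illustration, not taken from any particular library.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming,
    keeping only two rows of the table in memory)."""
    # prev[j] = distance between the empty prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]
```

For example, `hamming("GATTACA", "GACTATA")` gives 2 and `levenshtein("kitten", "sitting")` gives 3. The Levenshtein version runs in time proportional to the product of the two lengths, which may be too slow for long sequences; that is where the specialized algorithms come in.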

You might also look up the Baeza-Yates and Gonnet algorithm (sometimes also called the Baeza-Yates k-mismatches algorithm). Another well-known algorithm of possible interest is the Wu-Manber k-differences algorithm. You might also take a look at the longest common subsequence problem.
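To show what these problems actually ask for, here is a rough sketch of the k-mismatches search and of the longest common subsequence length. Note that the search below is the naive sliding-window baseline, not the Baeza-Yates and Gonnet algorithm itself, which is considerably faster. Again this is Python used only for illustration, and the names `k_mismatch_positions` and `lcs_length` are hypothetical.

```python
def k_mismatch_positions(text: str, pattern: str, k: int) -> list[int]:
    """Return every start offset where pattern matches text with at
    most k mismatched characters (substitutions only, no indels).
    Naive O(len(text) * len(pattern)) baseline."""
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        mismatches = 0
        for offset in range(m):
            if text[start + offset] != pattern[offset]:
                mismatches += 1
                if mismatches > k:
                    break  # too many differences, give up on this window
        if mismatches <= k:
            hits.append(start)
    return hits


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b
    (characters in the same order, not necessarily contiguous)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

With the text `"ACGTACGAACGT"` and the pattern `"ACGT"`, allowing one mismatch finds offsets 0, 4 and 8, while allowing none finds only 0 and 8. Allowing insertions and deletions as well (k-differences, the Wu-Manber case) needs an edit-distance style computation rather than this position-by-position comparison.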

The String::Approx module on CPAN, which performs approximate matching, might also be of some interest.

Finally, I would suggest you take a close look at BioPerl (http://www.bioperl.org/wiki/Main_Page/). I am not using it personally (as I said, I am not a biologist) and can't say much about it, but it is quite big and is likely to have implemented some of the things you are trying to do. Also, the functions in there are likely to be optimized for four-letter alphabets such as DNA sequences.


In reply to Re: Random shuffling by Laurent_R
in thread Random shuffling by onlyIDleft
