PerlMonks  

Instead of testing the string against every dictionary word, it would be better to search along the string for words that fit at the current position. Suppose the dictionary words have some minimum length n. Then at each position in the string, take the substring of length n and search your sorted dictionary for words beginning with that pattern; with a binary search this takes O(log N) string comparisons for a dictionary of N words. Then check each of the candidate words against the rest of the string. For a reasonable n, this will cut down search time drastically.
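Here is a rough sketch of that idea in plain Perl, with no modules needed. The names lower_bound and find_words are made up for illustration, and the code assumes the dictionary is already sorted with Perl's default string sort and that no word is shorter than n.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the first word in the sorted list @$dict
# that is ge $key (a standard lower-bound binary search).
sub lower_bound {
    my ($dict, $key) = @_;
    my ($lo, $hi) = (0, scalar @$dict);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($dict->[$mid] lt $key) { $lo = $mid + 1 }
        else                       { $hi = $mid }
    }
    return $lo;
}

# Hypothetical helper: return [position, word] pairs for every dictionary
# word found in $string. $n is the minimum word length in the dictionary.
sub find_words {
    my ($string, $dict, $n) = @_;
    my @found;
    for my $pos (0 .. length($string) - $n) {
        my $prefix = substr($string, $pos, $n);
        # Words sharing this prefix sit next to each other in the sorted list.
        for (my $i = lower_bound($dict, $prefix); $i < @$dict; $i++) {
            my $word = $dict->[$i];
            last if substr($word, 0, $n) ne $prefix;
            # Check the candidate against the rest of the string.
            push @found, [$pos, $word]
                if substr($string, $pos, length $word) eq $word;
        }
    }
    return @found;
}

my @dict = sort qw(cat cater dog the then);
my @hits = find_words("xthendogx", \@dict, 3);
print "$_->[1] at $_->[0]\n" for @hits;
```

The binary search only narrows things down to the words that share the first n characters; the inner loop still has to compare each of those against the string, so a larger n means fewer candidates per position.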

This basic idea, carried to its logical conclusion, forms the basis for using a trie data structure to find words fast. The module Tree::Trie implements this:

    use strict;
    use Tree::Trie;

    my $trie = Tree::Trie->new;
    $trie->add(qw[aeode calliope clio erato euterpe melete melpomene mneme
                  polymnia terpsichore thalia urania]);
    my @all = $trie->lookup("");    # every word in the trie
    my @ms  = $trie->lookup("m");   # only the words beginning with "m"
    $" = "--";                      # separator for interpolated lists
    print "All muses: @all\nMuses beginning with 'm': @ms\n";
    my @deleted = $trie->remove(qw[calliope thalia doc]);
    print "Deleted muses: @deleted\n";
A trie consisting of 250,000 words will take up a good deal of space, but even a truncated trie will speed things up.
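To show how a trie gets used for the original problem (finding words inside a longer string), here is a minimal hand-rolled version built from nested hashes. This is only a sketch: build_trie and scan are hypothetical names, it does not use Tree::Trie or reflect how that module stores its data, and it builds the full trie rather than a truncated one.

```perl
use strict;
use warnings;

# Build a trie as nested hashes, one level per character.
# The empty-string key "" marks a node where a complete word ends.
sub build_trie {
    my @words = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        $node = ($node->{$_} ||= {}) for split //, $word;
        $node->{""} = 1;
    }
    return \%trie;
}

# Return [position, word] pairs for every dictionary word in $string.
sub scan {
    my ($trie, $string) = @_;
    my @found;
    for my $pos (0 .. length($string) - 1) {
        my $node = $trie;
        # Walk down the trie one character at a time; stop as soon as
        # no dictionary word continues with the next character.
        for my $end ($pos .. length($string) - 1) {
            $node = $node->{ substr($string, $end, 1) } or last;
            push @found, [$pos, substr($string, $pos, $end - $pos + 1)]
                if $node->{""};
        }
    }
    return @found;
}

my $trie = build_trie(qw[aeode calliope clio erato euterpe melete
                         melpomene mneme polymnia terpsichore thalia urania]);
print "$_->[1] at $_->[0]\n" for scan($trie, "xxcliothaliaerato");
```

At each starting position the walk gives up at the first character that no word shares, so most positions cost only a step or two no matter how big the dictionary is.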

-Mark


In reply to Re: Finding dictionary words in a string. by kvale
in thread Finding dictionary words in a string. by ehdonhon
