This uses a slightly different technique that doesn't normalise anything.

The original word is read from the command line and its letters are used to build a regex based on a character class. The words in the dictionary files are processed one by one and matched against the regex. This acts as a filter that excludes any word not made up of letters from the original word plus an optional 'extra letter'.
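To see what this first filter lets through on its own, here is a minimal sketch. The rack 'tea', the sample words and the helper name passes_filter are made up for illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical rack, chosen only for illustration.
my $data  = 'tea';
my $regex = qr/^([$data]*)([^$data]?)([$data]*)$/i;

# True if the word uses only rack letters plus at most one other letter.
# Letter counts are NOT checked at this stage.
sub passes_filter {
    my ($word) = @_;
    return $word =~ $regex ? 1 : 0;
}

for my $word (qw(eat teak tatt kick)) {
    printf "%-5s %s\n", $word, passes_filter($word) ? 'passes' : 'rejected';
}
```

Note that 'tatt' gets through here even though it uses 't' three times; catching that is left to the frequency check.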

Words that pass the first filter are then frequency checked, so that any word overusing a letter is rejected before output. If the word did not need an 'extra letter', the frequency check allows one of the original letters to be repeated once, with the wildcard standing in for the repeat.
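The frequency check can also be tried in isolation. This is only a sketch: the helper name within_frequency and its three-argument interface are my own assumptions, and it uses the // operator, so it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $letters can be built from $rack, allowing
# up to $spare letters to be used once more than the rack supplies.
sub within_frequency {
    my ($rack, $letters, $spare) = @_;

    my %available;
    $available{$_}++ for split //, lc $rack;

    my %used;
    for my $letter (split //, lc $letters) {
        next if ++$used{$letter} <= ($available{$letter} // 0);
        return 0 if $spare-- <= 0;    # no wildcard left to cover the overuse
    }
    return 1;
}

# Pass a spare of 1 when the word matched no 'extra letter', 0 otherwise.
for my $case (['eat', 0], ['tate', 1], ['tate', 0], ['tatt', 1]) {
    my ($letters, $spare) = @$case;
    printf "%-5s spare=%d %s\n", $letters, $spare,
        within_frequency('tea', $letters, $spare) ? 'accepted' : 'rejected';
}
```

With the rack 'tea', 'tate' is accepted only when the wildcard is still free to cover the second 't', and 'tatt' is rejected either way because it needs two spare letters.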

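use strict;
use warnings;

# Letters to play with come from the command line;
# any remaining arguments are dictionary files read via <>.
my $data = lc shift @ARGV;

# Only letters from $data, plus at most one 'extra' letter anywhere in the word.
my $regex = qr/^([$data]*)([^$data]?)([$data]*)$/i;

my %letterfrequency;
$letterfrequency{$_}++ foreach split //, $data;

OUTER:
while (my $word = <>) {
    chomp $word;
    next unless $word =~ $regex;

    my %frequency;
    # If no extra letter was matched, the wildcard may cover one repeated letter.
    my $repeat = length($2) ? 0 : 1;
    foreach (split //, lc "$1$3") {
        if (++$frequency{$_} > $letterfrequency{$_}) {
            next OUTER unless $repeat--;
        }
    }
    print "$word\n";
}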
This worked 'pretty quickly' (TM) against a 200,000-line dictionary.

In reply to Re: literati cheat / finding words from scrambled letters by inman
in thread literati cheat / finding words from scrambled letters by sulfericacid
