
This seems to do the trick:

#!/usr/bin/perl -l
use strict;
use warnings;

# file storing all legitimate words, one per line
my $DICT = qq(/usr/share/dict/words);

# terms to test for validity
my @TEST = qw(
pizzamilkshake perlmonks pearlmonks hellothere
heytherebaby hey123 timexyz whereangelsare
);

# build dict hash
my %WORDS;
open my $dict_fh, '<', $DICT or die "Can't open $DICT: $!";
while (<$dict_fh>){
    chomp;              # remove the trailing newline
    $WORDS{uc $_}++;    # force UC, add key to %WORDS
}
close $dict_fh;

my (@subs, $giveup, $p1, $p2, $sub);

for my $test (@TEST){

    $test = uc $test;        # force word to UC at the beginning
    @subs = ();            # reset sub-match array
    $giveup = $p1 = $p2 = 0;    # not giving up, starting at the start

    while (
        !$giveup &&        # else we've thrown in the towel
        $p1 < length($test)    # else found match
    ){

        $p1++;
        $sub = substr $test, $p2, $p1-$p2;    # grab next substring
        #print STDERR "sub: $sub";

        if ($WORDS{$sub}){    # if it matches a legal word...
            #print STDERR "MATCH ($sub) in $test";
            push @subs, [ $p1, $p2 ];    # successful path, save it
            $p2 = $p1;    # advance p2 to the end of the current match
        } elsif ($p1 >= length($test)){ # at the end of the string with no match
            # if the entire string doesn't match a word or we have nowhere to
            # backtrack...
            if ($p2 == 0 || @subs == 0){
                #print STDERR "giving up on $test";
                $giveup++; # nowhere to go
            } else {
                #print STDERR "backtracking...";
                # reset p1 and p2 to the last saved state and try to get a longer match
                ($p1, $p2) = @{$subs[$#subs]};
                pop @subs; # delete last item... its path is a dead end
            }
        }
    }
    print "$test: " . ($giveup ? "NO" : "YES");
}
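The loop above tries substrings shortest-first and backtracks whenever it runs off the end, so it can revisit the same suffix many times (exponentially many in the worst case). A common alternative, swapped in here rather than being the poster's approach, is dynamic programming over prefixes: record for each position whether the text up to that point can be split into words. That takes O(n^2) hash lookups and also tells you how the word splits. This is a minimal sketch that assumes a tiny hard-coded word list as a hypothetical stand-in for /usr/share/dict/words. The `segment` sub and `@cut` array are illustrative names, not from the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the %WORDS hash built from /usr/share/dict/words;
# keys are upper-cased, as in the original script.
my %WORDS;
$WORDS{uc $_}++ for qw(
    pizza milk shake milkshake perl monks hello there
    hey baby where angels are
);

# segment($test): return the words $test splits into, or an empty list if
# no split exists. $cut[$i] is the start of the last word in some valid
# split of the first $i characters (undef when there is no such split).
sub segment {
    my ($test) = @_;
    $test = uc $test;
    my $n = length $test;
    return () if $n == 0;

    my @cut = (0);    # the empty prefix is trivially splittable
    for my $end (1 .. $n) {
        # scanning $start upward prefers the longest possible last word
        for my $start (0 .. $end - 1) {
            next unless defined $cut[$start];
            my $word = substr($test, $start, $end - $start);
            if ($WORDS{$word}) {
                $cut[$end] = $start;
                last;
            }
        }
    }
    return () unless defined $cut[$n];

    # walk the cut points back from the end to recover the words
    my @words;
    for (my $end = $n; $end > 0; $end = $cut[$end]) {
        unshift @words, substr($test, $cut[$end], $end - $cut[$end]);
    }
    return @words;
}

for my $test (qw(pizzamilkshake perlmonks pearlmonks hellothere hey123)) {
    my @words = segment($test);
    print uc($test), ': ', (@words ? "YES (@words)" : 'NO'), "\n";
}
```

Scanning `$start` upward means the longest final word wins, so pizzamilkshake comes out as PIZZA MILKSHAKE rather than PIZZA MILK SHAKE. Dropping in the %WORDS hash built from $DICT above should work unchanged, although the answers then depend on that file; if it contains "pearl" and "monks", pearlmonks will split as well.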

perl -MLWP::Simple -e'getprint "http://parseerror.com/p"' |less

In reply to Re: Junk NOT words by pizza_milkshake
in thread Junk NOT words by Anonymous Monk


