Try this. I project that it should complete your 410 billion comparisons in a little under 10 hours.

The main efficiency gain here comes from invoking the regex once per word, in global mode (/g), against a single large string containing all the words, so that one call returns every word that contains it. The inner loop then skips the matches covered by your specific exclusions: the word itself, its plural and its possessive.

#! perl -slw
use strict;

my @words = do{ local @ARGV = 'words.txt'; <> };
chomp @words;

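# Pad both ends so the first and last words are delimited too.
my $all = ' ' . join( ' ', @words ) . ' ';

my $start = time;
for my $i ( @words ) {

    # \Q..\E keeps any metacharacters in a word literal.
    # The lookahead leaves the trailing space unconsumed, so the
    # word directly after a match can still be matched.
    for my $j ( $all =~ m[ ([^ ]*\Q$i\E[^ ]*)(?= )]g ) {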
        next
            if $j eq $i
            or $j eq "${i}s"
            or $j eq "${i}'s";
#        print "$j contains $i";
    }
}

printf STDERR "Took %d seconds for %d words\n",
    time() - $start, scalar @words;
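
To see what the global match hands back, here is a minimal sketch of the same technique on a tiny in-memory list. The sample words and the helper name containing() are made up for illustration; they are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical sample list standing in for words.txt.
my @words = qw( cat cats catalog concat dog dogma hotdog );

# Pad both ends so every word has a space on each side.
my $all = ' ' . join( ' ', @words ) . ' ';

# Return every word containing $needle, minus the needle itself
# and its plural and possessive forms.
sub containing {
    my( $needle ) = @_;
    return grep {
            $_ ne $needle
        and $_ ne "${needle}s"
        and $_ ne "${needle}'s"
    } $all =~ m[ ([^ ]*\Q$needle\E[^ ]*)(?= )]g;
}

print "$_: @{[ containing( $_ ) ]}\n" for qw( cat dog );
```

For this list it prints "cat: catalog concat" and "dog: dogma hotdog". In list context the /g match returns all the captures at once, so each word costs one pass of the regex engine over the big string rather than one regex call per pair of words.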

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Words in Words by BrowserUk
in thread Words in Words by sarchasm



	PerlMonks